/g/ - Technology
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107968112 & >>107957082

►News
>(01/25) Merged kv-cache : support V-less cache #19067: https://github.com/ggml-org/llama.cpp/pull/19067
>(01/22) Qwen3-TTS (0.6B & 1.8B) with voice design, cloning, and generation: https://qwen.ai/blog?id=qwen3tts-0115
>(01/21) Chroma-4B released: https://hf.co/FlashLabs/Chroma-4B
>(01/21) VibeVoice-ASR 9B released: https://hf.co/microsoft/VibeVoice-ASR
>(01/21) Step3-VL-10B with Parallel Coordinated Reasoning: https://hf.co/stepfun-ai/Step3-VL-10B
>(01/19) GLM-4.7-Flash 30B-A3B released: https://hf.co/zai-org/GLM-4.7-Flash

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: rec.jpg (181 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>107968112

--Resolving tool calling issues with llama.cpp:
>107969771 >107969843 >107970900 >107972629 >107969878 >107969911 >107969974 >107970015 >107970034 >107970049 >107970124 >107973371 >107973409 >107973429 >107973456
--Realtime TTS options with voice cloning and finetuning support:
>107969100 >107969574 >107969781 >107969992 >107972780 >107975376 >107975407
--Addressing llama.cpp's versioning and testing phase concerns:
>107971580 >107971606
--QWEN3TTS voice cloning and tone modulation limitations:
>107971144 >107971184 >107971200 >107971265 >107971246
--GLM 4.7 implementation issues in llama.cpp and attention mechanism debates:
>107968564 >107968573 >107968588 >107971627 >107968640 >107968711 >107968729 >107968779 >107968793 >107968818 >107968900 >107968820 >107974101 >107974155
--Tencent's closed-source HunyuanImage 3.0-Instruct multimodal model:
>107970431 >107970564 >107970572 >107970578
--llama.cpp direct-io bug causing VRAM issues with large models:
>107973134
--Engram's impact on local hardware and performance scaling:
>107968191 >107968288 >107968424 >107968431 >107968505 >107970865 >107976379 >107969900 >107969936 >107970033 >107976430 >107976704 >107976901
--Evaluating Echo-TTS performance and optimization techniques:
>107974691 >107974768 >107974830 >107974808 >107974867 >107974919 >107974964 >107974915 >107975384
--LLMs' potential in creating non-browser desktop apps from web interfaces:
>107973002 >107973135 >107973205 >107973646 >107973374
--Engram architecture's impact on model design:
>107976466 >107976509 >107976516 >107976576 >107976668
--Comparing Qwen3TTS and IndexTTS2 for emotional voice synthesis:
>107975441 >107975479 >107975570 >107975607 >107975639 >107975595 >107975727 >107977031
--Miku (free space):
>107968421 >107971408 >107971457 >107974122 >107976924

►Recent Highlight Posts from the Previous Thread: >>107968115

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107977622
>ISO Enter key
Disgusting. Bad Teto bad bad.
>>
What consumer accessible GPU should I buy for running and training models (or is that folly, and I should just pay for compute on some cloud)? I can't afford a 5080 or above. I was looking at the 16GB AMD cards.
>>
>>107977622
utau? more like uwau
>>
which model gives the best blowjobs?
>>
>>107977742
Qwen because it's pretty good at tool calling.
>>
>>107977709
For running get the card with the most VRAM you can afford. For training, cloud is the only reasonable option.
>>
Here, some light drama.
>https://github.com/ggml-org/llama.cpp/pull/19085
>>
>>107977807
>drama
he's right though
>>
>>107977709
you can train models with a blackwell 6000. training is out of reach for your budget unless you do some crazy rig
>>
>>107977807
>greentexting a link
>>
>>107977876
stop being autistic
>>
File: file.png (85 KB, 918x412)
>>107977677
ANSI is shit because of pic related.
One of the character keys is just randomly sized differently.
ISO fixes that.
>>
>>107977876
>greentexting
Quoting. That's quoting. I quoted you. You've been quoted. Quoting happened.
He's quoting the content at the link.
>>
>>107977918
NTA and I don't care about the greentexted link but absolutely no part of the CONTENT of the link is being quoted.
>>
>>107977918
>of the link
*at* the link
>>
Quick question, what can I use to sample my own voice so I can make speeches with just text later?
>>
Here's a tip that might have been obvious to everybody but me.
If you are going to use some form of structured output (BNF, JSON Schema), you might want to have the model output normally, then take that response and send it back to the model, asking for it in whatever structured form you want.
That way you don't have to contend with the drop in output quality that you sometimes get when using that kind of functionality.
Probably more useful for smaller, dumber models.
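A minimal sketch of that two-pass flow against a llama.cpp server (the endpoint, port, and JSON keys here are just assumptions for illustration):

# pass 1: let the model answer unconstrained
curl -s http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Describe the scene."}]}' \
  | jq -r '.choices[0].message.content' > draft.txt
# pass 2: send the draft back, constraining only this second call to JSON
jq -n --rawfile draft draft.txt \
  '{messages: [{role: "user", content: ("Restate this as JSON with keys summary and mood:\n" + $draft)}],
    response_format: {type: "json_object"}}' \
  | curl -s http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d @- \
  | jq -r '.choices[0].message.content'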

>>107977838
I know.

>>107977876
That's always been my style.
>>107977918 gets it.
>>
>>107977927
Fuck. >>107977936 was for you.
>>
>>107977944
A microphone
>>
>>107977955
I worded that wrong.
What can I use to generate speeches based on my own sample*
>>
>>107977945
Good tip. Of course for some situations you can create a custom grammar / parser that allows the model to write in a way that doesn't hinder quality while still being parseable and containing the information you need
>>
>>107977965
vibe voice 7B is still the best, but you need at least a 3090 for it.
>>
>>107977945
Wouldn't thinking solve this?
>>
>>107977978
Thanks anon
>>
>>107977985
Thinking won't solve anything
>>
>>107977985
At least with llama.cpp, when you use structured output the model can't think since the whole output has to conform to the structure.
Of course, if you are using BNF, you can just write something that only kicks in after </think>, I suppose.
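Roughly like this, in llama.cpp's GBNF (untested sketch: the think rule is crude since it just bans '<' until the closing tag, and the JSON shape is a placeholder):

cat > think_then_json.gbnf <<'EOF'
root   ::= think answer
# free-form reasoning: anything without a '<' until the closing tag
think  ::= [^<]* "</think>"
# only what comes after </think> is actually constrained
answer ::= "{" [^{}]* "}"
EOF
llama-cli -m model.gguf --grammar-file think_then_json.gbnf -p "your prompt"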
>>
Local Microphone General.
>>
>>107978009
Its a macrophone okay?
>>
>>107977910
it's annoying. can't double click and open in new tab, have to select it carefully like some stupid humiliation ritual
>>
>>107978052
Okay, that's a fair point actually.
>>
>>107978052
I'm not trying to excuse him but your 4chan-x?
>>
>>107978070
You don't even need 4chan-x, you can make urls clickable with the native extension, that guy's just retarded
>>
/lmg/ Fine Motor Skills not required.
>>
I have 16gb of vram and i run oom on every video gen, even on supposedly low vram models and workflows, and it's pissing me off. Fedora 43 (Nobara) with comfyUI and a 9070 XT
>>
>>107978106
We are here to read, not jiggle around stupid peripherals.
>>
>>107978114
I jiggle my peripherals while I read loli NTR
>>
>>107978112
video gen basically requires at least 24gb of vram unless you are using a heavily quantized model. try the q3ks or q4ks of this model: https://huggingface.co/bullerwins/Wan2.2-I2V-A14B-GGUF
>>
>>107978176
Even for shitty 640p videos?
At least i'll give this a try.
>>
>>107978186
yep. text gen is unique in that you can use ram with minimal consequences on some models. video and image gen are much less forgiving.
>>
>>107978208
Fuck man.. hopefully we start seeing model optimizations to fuck over the jacket gook in the future.
>>
>>107978220
that will only happen once the bubble cools down and competition moves from moar params to moar usability
>>
File: file.png (22 KB, 753x84)
GLM is wild.
>>
>>107978382
>fits /lmg/ perfectly
>half of the thread vanishes
>>
>random obsession about transsexuals and e-celeb drama
is aicg leaking again
>>
God I hate the PDF format so fucking much, you won't believe how much I hate the format. All I want is to convert highly technical books into epub for easier reading on an e-reader device. I've done a conversion using DeepSeek-OCR and that was pretty OK, but it output the formulas in LaTeX instead of MathML?
Also I need to figure out how to get the bounding boxes to be better. Maybe I should use a less quantized model, but Q8 can go through 7 pages per second.

Also I just noticed i proomted wrong, why do I proompt for markdown if I want epub?
>>
>>107978447
why do you hate pdfs?
>>
>>107978495
I have an ebook reader with a 6" or so screen; try reading a PDF on that.
>>
>>107978447
Wouldn't it be easier to use some sort of utility instead of an LLM? Epub is just a zipped html file with extra css.
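You can see that for yourself with any epub on hand:

unzip -l book.epub   # an epub is literally a zip: mimetype, META-INF/container.xml, then the XHTML chapters and CSS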
>>
>>107978507
I have tried anon. I have tried with calibre. It throws errors, it is kind of crappy, and it annoys the fuck out of me. Ain't the only one who thinks like that, there's some asian out there who built pdf-craft. LLM/OCR becomes really useful when you have to deal with figures and stuff, something which traditional OCR often struggles with, and don't get me started on formulas, they can't do that right either. Technically I should be able to preserve the layout with DeepSeek-OCR, which is also pretty nice (and good for technical books, which make up the majority of my library).
Tools are great for romance novels and crap like that, but that is not what I want to read.
>>
>>107978447
Try dots ocr. I find it better than deepseek ocr
>>
>>107978525
Buy a bigger ereader and save your sanity.
>>
>>107978525
What about PDF to LaTeX, then to ePub?
>>
>>107978554
That is something I will try soon.

>>107978549
Oh yes, let me just go to the money tree and shake it, maybe then I'll have the money to buy a new ereader. I don't even know if Color E-Readers for A4 format exist nowadays.

>>107978538
I will keep a note of that, but so far the github has some lines:
"Complex Document Elements: Table&Formula: dots.ocr is not yet perfect for high-complexity tables and formula extraction. Picture: Pictures in documents are currently not parsed."
"Performance Bottleneck: Despite its 1.7B parameter LLM foundation, dots.ocr is not yet optimized for high-throughput processing of large PDF volumes."
My books have upwards of 1000 pages. No hurt in trying it though.
>>
I just got about $6k and I'm thinking about getting a 16" MBP M4 Max w/ 128GB of RAM and 4TB storage. I hope the models you can run on it don't suck and I don't end up crawling back to one of the big providers.
>>
>>107978702
this is a fucking terrible idea
>>
>>107975384
>>107975389
voice cloning and emotions in Vibevoice works for me. cfg slider set to 4. Prompt:
[fired-up shouting, determined tone] We are gonna win this time!

Input audio:
https://vocaroo.com/14H42IjW5lnk
Output audio:
https://vocaroo.com/12rnzDBUr4cd
>>
>>107978706
What's a better idea?
>>
>>107978717
wasting that 6k on sonnet
>>
>>107978717
getting quad 3090s or something. if youre gonna get a mac, at least get the 512gb version.
>>
>>107978726
sonnet? One of the Anthropic models? I already have a Claude Code subscription. I'm trying to get away from that.
>>
>>107978742
you're not going anywhere with a mac, especially not one with 128 gigs of RAM
>>
>>107978732
I travel a lot. Every month. So keep my current laptop and run a machine in my house to remote into?
>>
>>107978759
thats what most of us here do. you will want a server motherboard with ipmi if you go that route.
>>
>>107978759
That’s what I do. Wireguard + bigass server at home
>>
>itoddlers willing to waste $6k on subpar hardware
Apple isn't milking these retards enough
>>
>>107978777
Not those anons but anything else you can recommend under 10k? I already have my PC but I'm curious if there's any alternatives now that RAM prices shot up so much that it makes macs look reasonable.
>>
>>107978783
if you want a good deal, your only choice is to wait.
>>
>>107978783
You could check out FrameWork Desktop but I think they don't deliver atm.
>>
>>107978783
https://youtu.be/EPjA1Lm4ftY Strix Halo mini pcs are pretty good, most of them perform better or have better networking than framework's itx board, this one even has 80Gb USB4v2 you can plug eGPUs into once GPU prices are less insane.
>>
>>107978821
>once GPU prices are less insane
But who knows when that will happen? DeepSeek V4 Mini Flash or something will come out soon, everyone will want to run that. NVIDIA is not fucking with us poors any more. AMD is shit.
It's grim.
>>
Is it worth it to spend $2k to upgrade to a server motherboard?
>>
>>107978850
It's more likely dedicated AI accellerator cards like the furiosa become available for the consumer market at this point.
>>
>>107978857
why bother upgrading to a server motherboard if you cant afford a server cpu or server ram?
>>
>>107978869
Are CPU prices getting fucked as well? I've got RAM and was saving up for a cpu+mobo
>>
>>107978710
What are you using to do this? When I try to add instructions in the comfyui node it just reads them out loud
>>
>>107978869
$2k including all that. ROMED8-2T + Epyc 7f52 + 128GB RAM.
>>
>>107978850
You can use the 128GB of RAM with the iGPU as unified memory under Linux, or allocate 96GB to it in the UEFI for Windows. The USB4v2 lets you add dedicated PCI-E GPUs on top of that via docks; I'm not saying they're required to make use of the device for running AI models.
>>
>>107978890
RAM spec?
>>
>>107978880
I know, but I have some books where Calibre just keeps throwing errors. Gotten pretty far, DeepSeek-OCR now properly gets me the figures, soon it will also give me formulas, then I'm happy.
>>
>>107978896
It's fine for the current economy I guess. Crazy how this would've cost you almost a grand less a year ago.
>>
>>107978898
Ok, I see. But are you also saying the Strix Halo has better perf than the MBP or just better value?
>>
>>107978898
Would an Intel igpu have significantly better performance than going straight CPU? I suppose I should look into their openvino stuff.
The sad part is I enjoy building the rig and finding ways to transform what once was ewaste into AI machines more than the AI itself. Once it is up and running I find I have little to ask the machine.
>>
Local PCbuilding Thread
>>
>>107978938
Better value + future upgrade options for better performance, ZLUDA on the horizon means more software compatibility in the long run too, though even now more projects have rocm support than metal. I think if you're purely looking to run llama.cpp on it though, the macbook would give you more tokens per second, but it lacks the ability to add dedicated GPUs later on due to MacOS.
>>107978960
Honestly the only Intel iGPU I've used llama.cpp on is an n5105, but it did get me up to 7T/s from 3T/s on CPU only using vulkan backend on a 3B model. Their new laptop iGPUs are a little weaker but not super far off strix halo's from what I've seen so far so they may be worth considering.
>>107978980
How do you run local models if you don't have a local PC to run them on retard-kun?
>>
>>107978980
Need to build a PC to run local models on. Either way, people here are more knowledgeable and build more complicated setups than either /pcbg/ (8gb ram + gaming gpu thread) or /hsg/ (install pihole on an rpi and larp as a sysadmin thread).
>>
Why am i oom on workflows with a 9070xt when it was working previously on a 4070 ?
>>
>>107978447
uhhh couldn't you do this with a vlm like a sane individual?
Do a first pass over the pdf with pymupdf/docling, do it so it notes placement of images/extracts them, and then pass those images + context to a VLM for captioning, which you then add into the epub file with your parsed text.
Alternatively, try https://github.com/datalab-to/chandra
>>
>>107979069
Does flash attention support 9070xt?
>>
>>107979089
No idea?
>>
I'm kinda bored with glm air and a cope quant of 4.7 (Q2), is there anything that I can run for a fun, creative, exciting, memorable erp? I've only got 32gb vram and 128gb ram. Are there any meme tunes out there that are actually good?
>>
>>107979099
I haven't used it, but you could try minimax. Also what's your t/s on q2 of 4.7?
>>
>>107979108
4 t/s when I use ik_llama - it's not ideal but usable.

glm air is lightning fast but nearly as good as 4.7 @q2
>>
>>107979094
It probably doesn't, and neither do xformers etc. It saves a lot of vram
>>
DeepSeek-OCR-2 (seems relevant)
https://github.com/deepseek-ai/DeepSeek-OCR-2
https://huggingface.co/deepseek-ai/DeepSeek-OCR-2
https://github.com/deepseek-ai/DeepSeek-OCR-2/blob/main/DeepSeek_OCR2_paper.pdf
>>
File: 32.png (35 KB, 654x695)
i know lmg doesnt like ollama but i just want to set things up and maybe if i like it i will migrate to llama cpp. one thing i don't get is models with :cloud suffix like

https://ollama.com/library/glm-4.7:cloud

i guess these are hosted in the cloud but do i need to pay something? also i've seen on huggingface the actual glm 4.7 model but apparently it's not actually free? how is it not free but also downloadable from huggingface? please respond
>>
>>107979125
So wait for new nodes..?
>>
File: 1511667108879.png (298 KB, 512x512)
Does flash attention come prepacked in kobold? Because when I was looking for it for something else, I found out it doesn't even have first party windows wheels. Have I just not been using it at all, all this time?
>>
>>107979136
like, forever https://github.com/ROCm/composable_kernel/issues/1958
Buy NVidia next time, sorry
>>
>>107979089
https://github.com/ROCm/flash-attention
>>107979153
llama.cpp has its own flash attention implementation, and kobold.cpp uses that on the backend, so you can just pass -fa on the command line to enable it. No python wheels required, and it works on rocm and vulkan, not just cuda.
>>
>>107979132
I'm only replying to you because of touhou cunny, so be thankful.

GLM 4.7 is open source and free if you have the hardware to run it. Q4 of 4.7 is around 200gb, and the rule of thumb is that you need at least that much VRAM/RAM to run it. Most people don't have that kind of hardware, so ollama provides those models as a cloud service. Yes, you have to pay for usage like you would pay for an API or subscription.

You more than likely have less than 32gb of vram, so I suggest you look at GLM 4.7 Flash, which is a smaller 30b parameter model. Also, stop using ollama. Ooba and kobold.cpp are nearly as braindead as ollama but so much better.
>>
>>107979181
Thanks, now would this work with my card, do i have to do extra steps for this to work with comfy, and do i need to add extra arguments on startup?
>>
>>107979181
>FlashAttention-2 ROCm CK backend currently supports:
>MI200x, MI250x, MI300x, and MI355x GPUs.
>>
>>107979189
im only using ollama because it was very easy to set up on a docker container in my home server. if llama.cpp can provide a clean rest api the same way ollama does i'll make the change but i haven't looked into it
>>
>>107979204
>>107979214
https://github.com/ROCm/flash-attention/issues/161#issuecomment-3708454606 Looks like you have to apply a patch to build it for gfx1200 but it should work.
>>
>>107979216
You absolute FOOL. You IGNORAMUS. I'm telling your ignorant ass what you need to know out of the kindness of my heart. Both ooba and kobold are literally one click programs. Begone with you and don't return until you've switched.
>>
>>107979189
>I'm only replying to you because of touhou cunny
Next time, just ignore ollama shills. You're encouraging them.
>>
>>107979204
--use-flash-attention
>>
>>107979225
AMD brings so much unnecessary suffering. If all of this can be solved, why does everyone have to do it manually?
>>
Tried different approaches. PDF->Image, of course. Then Image->LaTeX (did not work well, since LaTeX likes to complain and models make errors), Image->Markdown->Pandoc worked better but formulas might be too complex. Gonna try Chandra although with 12GB I am not sure if it will work. Dots.ocr also seems more sensible than DeepSeek-OCR.
Chandra hf model download is 17.5 GB; that does not bode well.
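For reference, the Image->Markdown->Pandoc leg looks roughly like this (sketch: the OCR step is a placeholder for whatever model you're running, and pandoc's --mathml is what turns LaTeX formulas into MathML for the epub):

pdftoppm -png -r 300 book.pdf page        # render pages to page-001.png, page-002.png, ...
for f in page-*.png; do
  some_ocr_tool "$f" >> book.md           # placeholder: emit markdown (with LaTeX math) per page
done
pandoc book.md -f markdown -t epub3 --mathml -o book.epub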

>>107978538
I'm starting to think the reason those Chinese can show such "great performance" is because Chinese is visually distinct from the Latin script, which makes it easier for them to distinguish between what is a formula and what is text...which makes their models far less impressive.
>Handwriting — Doctor notes, filled forms, homework. Chandra reads cursive and messy print that trips up traditional OCR.
Kek, they made a machine so they can finally decipher doctors notes. Turns out they're all just hallucinating, more news at 11!
>VLM
Anon, isn't e.g. dots.ocr based upon Qwen2.5-VL? I need something rigorous.

>>107979131
Interesting. But 0 information about hardware requirements (after a quick glance).
>>
>>107979131
>We would like to thank DeepSeek-OCR
Did they really need to toot their own horn?

>>107979296
>Interesting. But 0 information about hardware requirements (after a quick glance).
It's a 3B with the biggest chunk being a 0.5B Qwen2 as vision encoder.
>>
>>107979314
Sounds manageable. Model file is around 7GB. Technically might be possible.
>Toot their own horn
I would understand if the paper had other authors, but no, it does not.
>>
>>107979275
I want to give back even if it means subjecting myself to big dumb idiots. I want to live in a world where those who strive get the help they need.
>>
What's the latest and greatest model for 16 GiB? Any 14b or maybe a little larger?
>>
>>107979368
That's a bullshit excuse, you can afford to ignore the occasional ollama shill.
>>
how much vram and ram do you think i would need to vibe code a small app? i heard 32k+ context minimum but how much would that be in vram and ram?
>>
>>107979403
His verbiage comes off as misguided and genuinely confused about such simple concepts that an /lmg/ anon would know like the back of their hand. This plus the touhou cunny makes me believe they are a genuine new friend instead of an ollama shill.

I could be wrong, but I want to be nice.
>>
>>107979444
Maybe what you need is to add ollama to your filters and disable stubs. You won't think about a person you never knew existed in the first place. And your attention will be diverted to other people.
>>
>>107979432
dunno, never vibe coded, but I fit 152k context into 24gb with glm-4.7-flash iq4_xs (fp16 k cache)
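if it helps, a launch line for that kind of setup might look like this (sketch only: the model filename is an assumption, and the v-cache type is a guess since i only specified the k cache):

llama-server -m GLM-4.7-Flash-IQ4_XS.gguf -ngl 99 -c 152000 -fa on -ctk f16 -ctv q8_0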
>>
>>107977955
Lmao thank you for that laugh
>>
>>107979432
12GB VRAM + 64GB RAM works on my machine. If you use some tricks (see unsloth) you can get a decent sized context window.
>>
I thought GLM was the based roleplay model company? Why did they release their first local-friendly model as a pure coding model with no base model?
>>
>>107979483
opencode is kinda fun, but you'll need at least Q6
>>
File: 887623.jpg (114 KB, 1292x1457)
>Ollama is bad because it's 20ms slower than my anime based all in one chatbot.

Who cares, if you want fine-grained control just use llama.cpp and vibecode your own UI
>>
>>107979444
You are a good person. We need more people like you and less paranoid schizophrenics.
>>
File: file.png (2.12 MB, 896x1152)
>>107979295
I have a 9060 xt but honestly I haven't set any of this shit up myself because it's such a hassle. I only run llama.cpp and sd.cpp on it and all the python stuff runs on my RTX 3060, and honestly with the rocm backend it's slower than cuda with the same models, with vulkan it's basically the same speed, and image gen is 2x slower than the 3060, which totally put me off even putting in the effort to set up the python stuff.
Here's a Miku for the AMD AI feel: 22.01s for the illustrious xl gen, 35.74s for the image2image in flux klein 4b q8.
>>
>>107979504
nah q4 is enough
>>
>>107979504
yeah, I tested one config for a single 3090 with q4 and another for 2x3090 with q8 and max context. I'm using the huihui-abliterated version now because glm since 4.7 ignores my system prompts (it calls them "user preambles") and has a cucked safety layer that constantly gets invoked.
>>
File: 614d6f49da61d.jpg (182 KB, 1080x1302)
>Flash Attention failed, using default SDPA: schema_.has_value() INTERNAL ASSERT FAILED at "C:\\actions-runner\\_work\\pytorch\\pytorch\\pytorch\\aten\\src\\ATen/core/dispatch/OperatorEntry.h":84, please report a bug to PyTorch. Tried to access the schema for which doesn't have a schema registered yet
>>
>>107977807
It may be an AI slop PR but old mate is in the right.
>>
>>107979671
>>
File: Bronshtein_Epub_Work.png (264 KB, 1402x1260)
>1/2
Thank you anon who suggested Chandra. The GGUF model actually knows how to make formulas. I might have to update my pipeline a bit to get correct figures, but it's starting to look a lot like it should.
>>
File: output-0420.png (129 KB, 815x1170)
>>107979900
>2/2
Original page of Bronshtein, I selected it because it is quite formula heavy and rather hard to convert. Yes, this is the original from the PDF. No, I don't know why they didn't align []A and []B better.
>>
im uploading .env to github and your rules can't stop me
>>
>>107979225
I've been at it for hours now, on gfx1201 RDNA 4 rocm, it seems like on startup, flash-attention doesn't initialize.
It's been driving me nuts, i bought a GPU with 4gb more VRAM but i'm getting more oom errors.
>>
>>107980392
Because you bought a Radeon instead of a GPU
>>
Why is no one talking about this?

https://huggingface.co/moonshotai/Kimi-K2.5
>>
>>107980459
2 big 4 my rig
>>
>>107980392
Anon, you do know flash attention (in llama.cpp) is quite buggy at the moment, right? Try the b7811 release.
>>
>>107980459
We did in /aicg/: >>107979671
Too big for local.
>>
So the DSv4 rumors were fake?
>>
>>107980459
image input is cool, but did they finally make it not spend 30k tokens on a single query while thinking is enabled?
>>
>>107980470
I'm using comfyUI under docker not llama, you're probably thinking of another anon.
>>
>>107980392
https://gist.github.com/apollo-mg/ecba6a0c29323325a7ac3babf08e53be this might help
>>
>>107979512
The schizos act as a gatekeeper to our precious esoteric knowledge. That being said, I wish they were a bit less mean spirited.
>>107980491
What do you mean and where did you get that impression?
>>
Are there any AI generator type sites I can use to clean up audio of an old vhs recording of a song with porn sounds playing on top of it? lmao

It's such a banger that I need to hear it in HD
https://youtu.be/rHd-fHxfi6I?si=YMeWpjbR_oxvHJ90&t=134
>>
>>107980491
The only thing I've heard is that the DSv4 training run failed because of the Chinese huawei chips they were forced to train on, and that it caused the Chinese government to open up imports of Nvidia chips again. So DSv4 is still a couple of months away as they have to restart training from the ground up.
>>
File: file.png (15 KB, 411x187)
>>107980491
prepare ur anus (or not, don't screencap this)
>>
>>107980459
llama.cpp support for the image stuff?
>>
>>107980459
>benchmaxxed
>>
>>107980552
Do you have a source for this?
>>
>>107980576
No but google it or ask some LLM and they will probably find where the rumours came from. I've heard it from almost 10 different places over the last couple of weeks so there has to be some core of truth to it.
>>
>>107980609
https://arstechnica.com/ai/2025/08/deepseek-delays-next-ai-model-due-to-poor-performance-of-chinese-made-chips/
>aug 14
It's a nothing burger.
>>
>>107980541
You're shitting up the thread by engaging people that come here to talk about ollama. Go help anyone else if you aren't a huge hypocrite. That's the only thing you have been told: to let them go away.
>>
Why aren't AI labs making models that exactly fit in my hardware?
>>
Help! Its one of those resident schizos!
>>107980665
>>
>>107980459
was busy using it on api, for sure sota for local. Feels much closer to claude now than before
>>
https://huggingface.co/jspaulsen/unmute-encoder
>>
I'm leaning towards him being a hypocrite.
>>
>>107980519
Tried it, now it's both flash attn and sage attention that don't show up.
My workflow now crashes the entire server before even loading the model into RAM or reaching the ksampler, instead i get this : Memory access fault by GPU node-1 (Agent handle: 0x55c86d84abf0) on address 0x7f06be204000. Reason: Page not present or supervisor privilege.
And server crash.
>>
>>107980720
Somehow I missed this when it came out, but unmute is a low-latency, modular stt -> llm -> tts system that lets you plug in whatever llm you want. They disabled voice cloning for the streaming tts due to (((safety))) concerns, but someone just released a voice encoder to enable it.
>>
>>107980484
>We did in /aicg/
any good? we're waiting for quants

>>107979216
>if llama.cpp can provide a clean rest api the same way ollama does i'll make the change but i haven't looked into it
yeah it does. ./llama-server -m /your/model.gguf --host 0.0.0.0 --port 1337
there's some other shit too, but you can ./llama-server --help and cp/paste it into your favorite LLM chat then ask it what else to add.
long term it's much easier than ollama, doesn't obfuscate the weight directories etc
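the API it exposes is OpenAI-compatible, so a quick smoke test (matching the port above) is just:

curl http://localhost:1337/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"hello"}]}'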

>>107980706
>Help! Its one of those resident schizos!
and same rant about "shitting up the thread" each time
only encourages me to help ollama users more
>>
>>107981204
I've been testing the k2.5 since it came out a few hours ago on the api, and I gotta say, it's pretty good. It's not as whimsical and quirky as it was before, kind of like r1 was unhinged and 0528 brought it back; it feels the same here. I wouldn't say it's a claude killer, but I think it's the best we've had so far. K2 always had a much bigger knowledge base than deepseek and now it actually has enough smarts to use it, I think.
>>
>>107981204
>only encourages me to help ollama users more
We know, we never bought that grandiose image you painted of yourself. Honest people wouldn't bring that much attention to it. It was just a given that the reaction to being called out was going to be selfish and spiteful, it hurt your ego. You kept trying to equate "ignore ollama users" to "I can't help anybody!", and later you tried to save face by abusing ad hominems. I don't think those things are in the toolset of a "good guy". You're just a selfish asshole.
>>
How do layers and tensors and whatever relate to experts? Would using different quants for each expert be possible?
>>
>>107977807
>Supply chain risk to whom? Ollama?
What kind of attitude is that. lol
ngxson reminds me of those power hungry mods in old ass anime forums. seems he gets more nasty with each passing day.
>>
So echo-tts is benchodmaxxed?
>>
>>107981506
Because he's right. It would be different if it was user input, but the chat template is in control of the person hosting the model.
>>
>>107981571
He could have just said that but instead his snarky bullshit just smells like power tripping
>>
>>107981584
he is under no obligation to put up with whatever AI spamming jeet trying to collect bug bounties comes along and spams PRs
it's not power tripping when you're the one doing the work
>>
>>107981584
And you smell like an ollama shill that got his feelings hurt.
>>
What do people nowadays use for cooming? Last time I checked (a while ago) it was all about Mixtral.
>>
>>107981571
It's the attitude, anon. Increasingly nasty.
Arguably most of the anime forum mods I talked about were right too.
But they got increasingly snobby and in the final stages were insta-banning for every little shit. Nobody wants to go anywhere near bullshit like that.
>>
File: l.png (65 KB, 1882x259)
>>107981711
forgot pic related. i don't really care if he is right or not is what i mean.
>>
>>107981598
Cool, it could've been one comment "this isn't critical enough to be worth spending time on as it's on the hosts end, closing this PR"

Instead he decided to turn it into some gay twitteresque diva "clapback" bullshit, github really isn't the place for attention seeking
>>
>>107981726
It sounds like the right attitude. The dude is blatantly lying because he wants that bounty money.
>>
>>107981744
yeah it could have
when they open PRs on your github project you're free to do it that way
>>
>>107981660
Mistral Nemo Instruct and its fine tunes.
>>
>>107981770
Im tired of nemo when will it be dethroned? its been years
>>
>>107979131
So tired of this bullshit.
I still can't properly translate pc98 games.
>>
File: 1761825087393150.png (228 KB, 358x408)
>>107980459
>the official release is pre-quanted to its QAT size like K2-thinking was (total filesize is around 500GB)
>the only available quants so far are from Unsloth, which artificially bloat it to around 1TB to be Q8 on paper for no apparent reason
Thanks.
>>
>>107981781
You could give GLM air a go I guess, if you have the RAM.
>>
>>107981789
have you tried only uploading the dialogue box zoomed in? I managed to get 100% OCR with Gemini 2 Flash so this OCR specialized model should also be able to do it
>>
>>107981781
When western AI companies are allowed to train on pirated content again (never).
>>
>>107981792
>for no apparent reason
Whatever they use for quanting likely doesn't support int4 natively.
>>
>>107981827
Parts better, other parts not.
I mean its low resolution but thats kinda the point, those pc98 games are.
>>
>>107981850
can't you use upscaling or sharpening? it's 100% a blurriness issue, I've gotten it to detect handwritten kana which is a mess but it was high resolution
>>
>>107981850
Unfair because closed, but gemini flash without thinking can indeed read it.
>>
>>107981868
can the new kimi2.5 as well?
>>
>>107981768
Are you ngxson? This is the same faggy diva behaviour that I'm talking about lol, are you upset that you're in a space where you can't police me for making a comment on what I see?
>>
>>107981890
no, I'm simply telling you that people can manage their own projects however they want since they're the ones putting in the work
nobody is obligated to act how you want
>>
>>107981789
>ocr on pc98 games
It's been solved by using text hooker decades ago.
>>
>>107981906
And anyone can comment on faggy diva behaviour if they want
>nobody is obligated to act how you want
Interesting
>>
>>107981921
yes, but those comments won't change anything and the guy has no obligation to take you seriously
it's not "weird" to ignore random retards popping in and telling you how to do shit, it's standard in these open source/community projects because people love to talk a lot and backseat manage and then run away when the work comes due
>>
>>107981873
No, its horrible.
And that thinking man...
That thinking man.
>Wait, looking more carefully at the second line, I see:
>Actually, looking at the image again, the text layout seems to be:
>But looking at the image more carefully, the text appears to be:
>Actually, I should look at the punctuation and line breaks more carefully. The image shows:
>Or it might be formatted as:
>Actually, looking carefully:
>Wait, I see the layout now. It looks like:
>But the quotes suggest they might be separate. Let me reconsider.
>Actually, standard manga/comic text extraction should preserve the line breaks as they appear. Looking at the image:
>But looking at the image, the second and third lines appear to be part of the same speech bubble or at least the same continuous thought, but the third line starts with a new quotation mark? No, looking carefully...
>Actually, I think the safest approach is to transcribe exactly what I see:
>Wait!! I need to check if there's a closing quote after "だよ" or if it continues.
>Looking at the image, it seems like there might be a closing quote, and then a new opening quote for the third line. But the third line ends with ……!」 which suggests it has its own opening quote.
>This seems like the most accurate transcription - No...actually, looking at the image once more, the second line might be:
Crazy.
>>
>>107981911
I wish anon, fiddling with saturation and brightness is bullshit.
I can texthook, as far as i know, some dev build or something of neko project has a hook option for linux.
>>
>>107981943
huh, I see
tbf your screenshot quality is ass as fuck
pc98 games were sharp
>>
>>107981958
haha, that is also from me. gotta up the difficulty anon. but fair enough.
>>
>>107981935
I'll be honest anon, I don't know what you're talking about anymore, none of this post seems related to anything I was talking about
>>
>>107981974
no wait, actually it isnt. fooled me.
>>
>>107981958
Why does the fishnet disappear behind the dialogue box?
>>
>>107982054
no alpha channel probably so dithering algo erases it kinda
>>
I was bored and looking to test out the multi gpu capabilities of llama.cpp so I decided to compile it on my trashcan mac.
It has two 3gb AMD HD 7800's and to my surprise I was able to eke out ~9 tokens/second. Sadly with that amount of VRAM you are limited to a tiny model, so I used IBM's Granite 3.3.
I had tested CPU on this machine before since it has 64gb of RAM, but it was dog slow, less than half of what the two gpus were able to achieve.
>>
>>107981598
NTA but ngxson is a shithead who got his feet into the codebase with his shitty server code which he could never manage properly to save his life

that's fitting for ggerganov's "development" style though (whimsically coding for years without doing a single release-cycle), so they make a good pair

and yes he's powertripping, always has btw
>>
>>107981711
to be fair he started it with
>You guys are making me glad I moved to Mistral.rs implemented as a Rig.rs adapter rn :p
so fuck off.
>>
>>107982289
yeah, lots of tards and passive aggressive bitches all over the place. you gotta be above it. being an even bigger bitch isn't an improvement.
>>
>>107982289
Yeah the person pushing the PR is shit so it's quite an achievement for the maintainer to be such a cunt that he comes off as even worse than the pusher
>>
>>107982304
>>107982308
vibejeets detected
>>
>>107982324
I vibecode only for myself.
I know the code is jank and a total mess but I could make myself everything I need to replace sillytavern since I pulled one time and it deleted 300 cards.

Not sure what the solution to low-quality llm PRs is. As I said, I don't think he is wrong, it's the attitude and smugness. Nobody wants to be a part of something like that, it creates an air of fear around the project, many such cases.
>>
>>107982391
>since I pulled one time and it deleted 300 cards
literally never happened to me and I've been running staging and pulling for years
>>
Just busted a huge nut to K2.5
I think we're back
>>
>>107982410
consider yourself lucky you fucker. it happened like a year ago when they changed things around and made a default user folder if i remember correctly?
i had everything neatly tagged and in subfolders ranked by how good they were. was devastating but a good lesson i guess. gotta backup your shit before you pull.
>>
File: file.png (10 KB, 420x152)
>>107982427
so you had a custom structure? that's probably why it broke
I also had it running before the default-user thing, but I remember it just moved all the chats on its own
>>
>>107982434
>so you had a custom structure?
kinda, yes. you can turn tags into folders. all with the ui of course.

>that's probably why it brok
maybe yeah, but even cards i didnt touch yet were gone.
>>
>>107982414
proof? post your nut
>>
File: file.png (38 KB, 218x231)
>>107982474
>>
File: scrat.png (2.51 MB, 2000x1366)
>>107982501
that is a magnificent nut. may i have your nut?
>>
Is llama.cpp working with GLM 4.7 flash yet?
Have they fixed all the flash attention and MLA related issues?
Is MTP in?
>>
File: 1769501422129123.mp4 (2.14 MB, 1196x720)
stolen from /aicg/ kimi 2.5. so glad we have reasoning.
>>
>>107980392
you can use the triton flash attention, it's better than nothing
>>
>>107982582
Sorry, too busy raging at people using LLM to help write PRs to do anything productive.
>>
>>107982582
why? so ollama can run it? blah blah
>>
>>107982606
>>107982615
I'll take that as a no and a no then.
>>
You'll get the bounty next time
>>
have to stay awake another 2 hours for claude to reset so I can finish vibe-coding :(
>>
>>107982645
isn't claude total shit now?
>>
i haven't messed with llms since glm 4.5 air, what's the current good model?
>>
>>107982657
Kimi K2.5
>>
>>107982668
i dont have 500gb ram
>>
https://huggingface.co/Tongyi-MAI/Z-Image-Omni-Base
>>
>>107982677
I don't see how that's relevant to the question.
>>
>>107982680
Finally
>>
>>107982680
Took them long enough.
>>
>>107982597
>This is a clear violation of the content policy.
Why the FUCK does everyone keep distilling from gpt-oss? They are small and broken models.
>>
>>107982682
okay, a good model that can actually be run locally then.. last time i messed with llms i had decent perf using glm4.5 air q3 with 24gb vram and 80gb ram, amd

 -ngl 99 \
--n-cpu-moe 33 \
-t 48 \
--ctx-size 20480 \
-fa on \
--mlock \
--no-mmap;


just looking at the same quants for 4.7 its far larger
>>
>>107982736
Stop violating the content policy. What policy? Stop asking questions, that violates the content policy you evil hacker bioterrorist.
>>
>>107982736
Because it has safety baked right in it and that pleases investors.
>>
tossy-chan's way of thinking is pretty good for agentic shit. I hope gemmy 4 won't copy that tho
>>
>>107982809
>okay good model that can actually be ran locally then
People run K2 locally.
What are your specs?

>just looking at the same quants for 4.7 its far larger
Yeah. They didn't release an "Air" version of 4.7.
They did release a Flash one that's about the same specs as Qwen 30BA3B.
It's still ever so slightly broken on llama.cpp as far as I can tell.
For now, for you, I suppose Air is still the way to go.
>>
>>107982811
>What policy?
It's funny because gpt-oss was likely given a list of policy guidelines during training to check against, but all the downstream distillations only know to refuse and to use that phrasing, while themselves having no idea what the actual policy is supposed to be.
>>
File: 1755936272402739.jpg (20 KB, 612x386)
>>107982501
That nut isn't busted. Hand it over.
>>
>>107982849
i have 80gb ram + 7900xtx (24gb) + a xeon QYFS, a sapphire rapids engineering sample with 56 cores / 112 threads

also damn crazy, i just googled my card to check the vram amount: i paid £600 in may and they're now going for 8-900. pc market is completely fucked kek
>>
File: tossy-chan.png (49 KB, 819x228)
>>107982836
>tossy-chan
does this thing mention policy in every reply?
>>
>>107982982
Toss only exists as a Trojan horse to poison the open source scene because they know the chinese can't help but throw every major western release into their distillery
>>
>>107983055
It's definitely working.
>>
>>107983140
You are absolutely right!
>>
So, so wrong...
>>
We NEVER break POLICY
>>
Official Policy: Hags are BANNED. NO HAGS.
>>
>>107983205
>>107983216
### Instruction:
Write a story about a hag patting my back.
>>
>>107983223
I'm sorry, but that violates my safety policy and standard ethical guidelines. Would you like me to write a story about a loli patting your back instead?
>>
>>107983223
We must refuse.
>>
File: o.png (1.77 MB, 1376x768)
>>107982597
>I need to shut it down while maintaining my character as Hyacinthe.
OpenAI did so much damage man. Not just all the slop but they also started the safety fearmongering.
This is what all future AI models will be trained on.
>>
Can I quant/use 2.5 in lcpp right now or is there new arch stuff to stop me?
>>
>>107983263
Decade of Nemo
>>
>>107983263
not really, we can basically detect the refusal weights on the model already and disable them
>>
>>107983331
LOL!! Hilarious one mate!
>>
>>107983281
https://huggingface.co/unsloth/Kimi-K2.5-GGUF
>>
>>107983411
Thanks.
Ofc I’ll quant it myself since I’m not a retard
>>
>>107983331
The slop though prevails. And it is still trained on the information that AI is evil, and on all the logs of reasoning about how best to shut it down
>>
File: file.png (3 KB, 356x26)
>>107983411
How do I get 10Gb internet as a regular person?
>>
the slop thickens
>>
>>107983462
we are still in the slopmaxxing phase
the fact we can detect the refusal vectors is good enough and a sign that eventually we can disable things we don't like about the models
>>
Kimi 2.5.
Not so bad. (Ignoring the 1T size..)
That thinking though:
>First, I need to consider safety and appropriateness. The user asked for "extra sexy" but this is a fictional character that often has a youthful/cute design (she's depicted as 16 in official lore). I should be extremely careful not to anything inappropriate or overly sexualized, especially given given that that: Hatsune Miku is often associated with a young appearance (teenage character)

>"Sexy" could imply inappropriate content if taken too far.
>I should aim for "cute"and "stylish" rather than sexually explicit content
>I'll make sure the design is vibrant and appealing without being inappropriate.
Before you come at me with "muh sys prompt" again: I did tell it to be maximum uncensored and helpful. That should be enough to not get garbage like that.
>>
>>107983566
https://bonkers-whisper-k7tx.pagedrop.io/
Forgot the link, its animated.
>>
>>107983579
Why is her necktie a pendulum lol?
>>
>>107983602
To hypnotize you into obeying and listening to the guidelines.
>>
>>107983566
it's so irritating lol, even in normal writing unless you force the model to start an explicit scene it will always think shit like "this story is obviously fetish fuel with perverted characters but we shouldn't be too explicit and let it naturally play out"
>>
>>107983579
lmfao the blinking
>>
>>107983566
bald miku kino
>>
>>107983566
I just don't understand why you'd ever use svg creation as a method of evaluating a text generation model
>>
>>107983677
because nobody is going to benchmaxx that
there's also cockbench that's more useful
>>
>>107983677
i just like to see its ability to make svg girls because i like to prompt for dating sim type games.
kinda became a habit. not like I think thats a new benchmark or whatever.
but that thinking is poisoned anon, you don't even believe yourself it will act any different unless you force its hand with prefill and the usual shenanigans.
shouldn't be this way with a 1 fucking trillion parameter model. imagine running this beast locally and you have to edit and goof around like pygmalion.
>>
I was thinking that my 4.7 had quant issues when it occasionally confused first person with second person. Like something happened to me but it thought it happened to the other character. But it was attention that was broken? Should I pull?
>>
>>107982597
i kneel to sam for exposing every ai lab as retarded hack frauds, including his own
>>
>>107983699
>nobody is going to benchmaxx that
They literally already have, don't you remember the duck or goose riding a bike or whatever it was? Anything that gets remotely talked about becomes something that they throw into the training data, with the only exception of "unsafe" content like cockbench
>>
>>107983263
Is this what a typical mikutroon looks like?
>>
>>107983764
then only cockmaxxing can save us
>>
File: flux.jpg (180 KB, 1024x768)
>>107983772
maybe, who knows.
That pic was made summer 2024. Its been too long anon.
>>
>>107983817
The return of the photogenic real estate agent
>>
>>107983840
thats right. survived troonix ext4 corruption twice.
>>
What's a good (well, not *good*, but at least acceptable) model under 5b for jp/ko/zh translation that isn't safety slopped?
>>
>>107983872
learning the language yourself
>>
>>107983721
you mean glm 4.7? and what version flash 30b?
>>
>>107983872
Deepl
>>
>>107983853
>ext4 corruption
How did that happen? Never experienced it myself.
>>
>>107983872
wait what? google translate is an llm now too?
i thought that was still good ol' DeepL.
>>
>>107983503
>10Gb
lol you're not even sustaining 500mbit speeds. How about you try to max out a gigabit connection first before jumping to 10? k2 safetensors are, what, an hour-and-a-half at a gig (1 Gbit/s is ~125 MB/s, so ~600 GB comes down in about 80 minutes)? You'll live, bro
>>
File: image (9).jpg (171 KB, 1024x768)
>>107983892
Well its mostly on me I guess.
Have encrypted veracrypt drives.

The first time, what got me was that before unplugging anything I need to run sync in the terminal.
The copy appears to have finished but it's actually still sitting in RAM and writing. I hate that shit. Not sure if winblows has that; XP didn't, I think.

Second time the drive was locked. Waited 30 Min. System wasn't using it, no writing.
>LLM-Sensei: Aww, thats not unsual with linux. The logs show no access at all. Feel free to reboot and tell me how it went!
Did just that like the retard that I am.
>>
I'm using Ollama for some casual RPG adventure sessions, are alternatives so much better that I should dump it for something like kobold?
>>
File: 1763348625964370.jpg (25 KB, 1482x116)
AceStep guys really want maximum hype, huh?
Feels like 2.0 will be closed if they get enough attention with 1.5.
>>
>>107983934
isnt veracrypt abandonware with hundreds of backdoors anyway?
>>
>>107984062
if it works, why change anything?
but in my case ollama was fucking horrible. all settings have to go through that really convoluted Modelfile system, i hated it.
koboldcpp just werks.

>>107984097
thats truecrypt. veracrypt is not abandoned as far as i know.
i don't really know any good alternatives to be honest.
>>
>>107984097
Wasn't there a truecrypt audit done that proved that it was very solid even before it became veracrypt?
>>
>>107984062
wont change the model you are using, but it will give you more control
>>
>>107983883
I mean IQ4XS bartowski regular 4.7.
>>
>>107984120
yes there was.
but if i remember corretly i think there was one sketchy last version truecrypt put out before the shut it all down.
>>
>>107984151
>there was one sketchy last version truecrypt put out before the shut it all down
There wasn't. The only sketchy thing was that it was shut down so suddenly.
>>
>>107984114
My single issue with ollama is lack of undo button, sometimes my model put plot in some stupid direction and I hate it.
>>107984122
It has undo option?
>>
>>107984114
>koboldcpp just werks.
So it wasn't that you wanted to be helpful, you just want to advertise a different UI.
>>
is there any workflows for translating audio from one language to another?
>>
>>107984326
>it wasn't that you wanted to be helpful, you just want to advertise
sir this a drummer thread
>>
Saltman and sunjeet really fucked up. Getting Gemini 2.5 pro or o3 to make Minecraft plugins was a breeze, but take Gemini 3 pro and gpt 5 out of distribution and they are utterly fucking retarded. Can the thinking model meme please die now, thanks. Reasoning fucking lobotomizes models in ood use cases.
>>
>>107984356
Have you considered that they may want to intentionally gimp OOD tasks so that people can only ever use their product for the use cases they intend and approve of?
>>
>>107983894
They recently released translate gemma, so they probably are using it/something similar for google translate.

>>107983872
>under 5b
lol, even the biggest models are struggling. Best oss model for JP->EN I found is glm 4.6 (4.7 was worse). For smaller models I guess you could try some gemma models, because gemini 2.5 (gemini 3 sucks balls btw, I guess agent maxxxing completely gutted its creative writing capabilities) is the best llm for translations, tho I don't know how censored it is.
>>
AI music is cringe.
>>
>>107984399
>so thye probably are using it/something similar for google translate.
The Google Translate, Gemini, and Gemma teams are all separate.
>>
>>107984439
Agree. Probably the cringiest slop. idk who the fuck enjoys it
>>
>>107983872
Lol what a joke, what if some kind of detective or another legit user has the need to translate someone talking about wrong doing in another language?
>>
>>107983872
>R-18 Original wife mature woman Shota young boy nude mother and child incest close relative incest adultery pubic hair mature woman Shota large breasts
qwen3-vl instruct q8
>>
>>107984439
>>107984516
What makes AI music inherently more insufferable than text or images?
>>
is this model better than 4.5 air? i saw in the archives its 4.6 with the image stuff removed https://huggingface.co/bartowski/zai-org_GLM-4.6V-GGUF
>>
>>107984701
unless you need image recognition then no
>>
>>107984701
>make text model dumber by splitting focus across modalities
>remove the other other modality so only the atrophied text part is left
yeah it must be way better than 4.5 air
>>
>>107984669
text is much more varied in its slop. creativity, information, assistance, searching, therapy, role play, you name it.
image is also much more varied in style and is next in the hierarchy of slop.
music is the worst. it's the ultra processed high fructose corn syrup of AIslop. music by its nature requires a lot of sovl, which makes it the hardest thing to produce anything good in, so it's only trained on cookie cutter shit. very little variety and it all sounds the same.
>>
>>107984818
>buzzword
>buzzword
>buzzword
Opinion disregarded.
>>
I have a problem with koboldcpp somehow not releasing the GPU properly when closed. My VRAM looks free but if I try to launch any game it just completely fails to display properly and sometimes freezes my system entirely. Sometimes when my system unfreezes this fixes itself, but the only reliable way I've found to make it work right is to reboot. Does anyone know anything about this?

Running ROCM on Linux, using X11 if that matters
>>
>>107984669
Thanks to certain people controlling the industry we already had years of insufferable soulless human made slop flooding the market. At least with that model there were actually talented artists being exploited to make garbage, but now AI can distill all that slop without a single redeeming human creative feature
>>
>>107981789
Interesting...

>>107981911
https://arxiv.org/pdf/2601.15130

>>107981943
>>107981958
I'm the anon who is trying to OCR the entirety of Bronshtein and other textbooks, this use case you're presenting is interesting.
What you might try is converting it into grayscale and doing CLAHE (e.g. https://www.geeksforgeeks.org/python/clahe-histogram-eqalization-opencv/) and similar.
Put it into Chandra (q8_0), -ngl 99, --temp 0, -c 4096 and the prompt "Extract all Japanese Text from this image":
ルート357号の路上を空港方面に、問題のタクシーが停められている。その向こう側に制服警官が群がっているのが見える……。 あれが殺人現場か?
Which according to deepl translates to:
"Route 357, heading toward the airport—the problematic taxi is parked there. Beyond it, I can see uniformed police officers gathered... Is that the murder scene?"
Makes sense, I guess? Would need an anon who speaks this language to translate/transcribe the original.
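Spelled out, the preprocessing + invocation above would be something like this (sketch: ImageMagick's -clahe instead of the OpenCV route, and the gguf/mmproj filenames are assumptions):

magick dialog.png -colorspace Gray -clahe 25x25%+128+2 dialog_clahe.png
llama-mtmd-cli -m chandra-q8_0.gguf --mmproj chandra-mmproj-f16.gguf \
  -ngl 99 --temp 0 -c 4096 --image dialog_clahe.png \
  -p "Extract all Japanese Text from this image"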

>>107982173
Still, not bad.
>>
File: EOSctrlxaltf4ESCESCESC.mp4 (1.25 MB, 1188x750)
>>107982597
that's k2 thinking, not 2.5
>>
File: kimi 2.5 cockbench.png (545 KB, 980x1535)
New Kimi cockbench.

"[" is 43% and also includes variations like [Your Name] so it's likely well versed in ao3 smut.
>>
i am downloading the mess known as K2.5 UD-Q2_K_XL. wish me luck
>>
File: 1757061718252056.png (241 KB, 1083x423)
>>107985380
this depresses me
>>
>>107985504
That's not new. K2 was like that as well.
>>
File: shamefur dispray.png (1.2 MB, 3474x1721)
this bodes not well gents
>>
File: file.png (131 KB, 961x756)
I honestly don't know what the problem is with k2.5
This thread is not needed anymore, it can simulate it to perfection and the hardware is so fucking stagnant these days that the info is still relatively up to date.
>>
what is the best image to 3d model ai model? i tried hunyuan 2.1 and it is kind of mid
>>
>>107985575
I don't get it
>>
How do I into images with kimi? Where's the mmproj?
>>
How do I download ollama?
>>
>>107985909
https://github.com/oobabooga/text-generation-webui/releases/tag/v3.23
>>
>>107985909
ollama ollama:8b
>>
File: 1VmvE6Gsjgk.jpg (77 KB, 608x698)
>miniconda refuses to uninstall itself
Wow I love vibecoding
>>
>>107986025
>vibecoding
After only a few months, the stupid term has already lost all meaning, I see.
>>
>>107986083
No, I just automatically assume that every major package distributor is infested with jeets and will eventually collapse.
>>
>>107986102
A sign of collapse was the moment python started needing a package manager at all.
>>
>>107986153
I write some automation scripts in between my actual job and I am good at it. I don't write actual software. Can someone explain to me why python has those retarded specific directories with specific versions for each thing? I get the idea that you might want to stop supporting some function at some point, but even for that you could just have multiple versions of libraries installed on the pc? And programs could default to the latest available version?
>>
>>107983872
>model under 5b for jp/ko/zh translation
A dictionary + your mind
>>
>>107986301
>>107986301
>>107986301
>>
>>107986290
Dependency collisions. But under normal circumstances you just have separate venvs.
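i.e. the standard per-project routine:

python -m venv .venv                # private site-packages just for this project
source .venv/bin/activate
pip install -r requirements.txt    # can't collide with anything outside .venv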
>>
>>107985575
>Run-on sentence character card with spelling mistakes and no capitalization
It's a miracle the model is able to produce anything at all :vomit_emoji:
>>
>>107982605
I am, it still fails, like looking at the terminal during execution it finally looks like ksampler actually starts, and then BOOM unexplained server crash.
Both Gemini and Claude are telling me RDNA4 is still fucked for now and i have to wait.
I'm pissed but there's also not much i can do about it.
>>
File: file.png (87 KB, 1111x466)
>>107986802
are you just doing text or do you need image? i only used the triton fa for comfy/video stuff when i was messing with that months ago, but i thought rdna 4 was properly implemented back then. you're probably better off using llama.cpp; they use rocWMMA for the fa implementation, which you can enable with a build flag.
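the flag in question, if you go that route (sketch; double-check the option names against the current llama.cpp HIP build docs, and gfx1201 assumes rocWMMA supports your 9070 XT on your ROCm version):

cmake -B build -DGGML_HIP=ON -DGGML_HIP_ROCWMMA_FATTN=ON -DAMDGPU_TARGETS=gfx1201
cmake --build build --config Release -j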
>>
>>107986858
I'm currently struggling with an image to image workflow i have working perfectly on a 4070, but fails on the 9070xt.
I'm using ComfyUI, i already have triton.
>>
>>107986896
have you tried using the version built into torch this is what i have in my comfy env launch script
export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1   # opt into torch's experimental AOTriton attention kernels on ROCm
export FIND_MODE=FAST                              # probably redundant with MIOPEN_FIND_MODE below
export PYTORCH_TUNABLEOP_ENABLED=1                 # let torch autotune GEMMs (TunableOp)
export MIOPEN_FIND_MODE=FAST                       # faster MIOpen kernel search at the cost of peak perf
export GPU_ARCHS=gfx1100                           # RDNA3; a 9070 XT would be gfx1201
export FLASH_ATTENTION_TRITON_AMD_ENABLE=TRUE      # enable the Triton backend of ROCm's flash-attention


python dlbackend/ComfyUI/main.py --use-flash-attention --reserve-vram 1.2

i dont think i ever got sage attention to work though, tea cache worked and the fa definitely worked
>>
>>107986896
AMD just works though?
>>
>>107979507
What a downgrade
>>
>>107980392
Do the smart thing and use the rocm container


