/g/ - Technology






File: uuuuuuuuuuuuuuuuuu.jpg (332 KB, 960x2304)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107895444 & >>107886414

►News
>(01/15) PersonaPlex: Voice and role control for full duplex conversational speech: https://hf.co/nvidia/personaplex-7b-v1
>(01/15) Omni-R1 and Omni-R1-Zero (7B) released: https://hf.co/ModalityDance/Omni-R1
>(01/15) TranslateGemma released: https://hf.co/collections/google/translategemma
>(01/14) LongCat-Flash-Thinking-2601 released: https://hf.co/meituan-longcat/LongCat-HeavyMode-Summary
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>107895444

--Clarifying misconceptions about Recursive Language Models vs looped models:
>107897758 >107898023 >107898125 >107898521 >107898863 >107898939 >107898971 >107899130 >107899531
--GPT-SoVITS ecosystem expansion with new C++ implementation and ONNX inference engine:
>107902610 >107902830
--Bypassing AI model restrictions and verifying voice cloning functionality:
>107900791 >107900808 >107900831 >107900843 >107900855 >107900920 >107900937 >107901091
--Pocket TTS optimization struggles with Rust porting, quantization, and model prewarming:
>107903434 >107903463 >107903487 >107903528 >107903563 >107903677 >107903699 >107903807 >107903866 >107905313
--Pocket TTS's CPU efficiency and streaming features vs. other tiny TTS models:
>107899179 >107899247 >107899223 >107899243 >107899341 >107899315 >107899412 >107899448 >107899619 >107899405 >107902954 >107903061 >107903076 >107903127 >107903151 >107904257 >107900698
--Pocket-TTS performance and comparisons with other TTS models:
>107902214 >107902248 >107902285 >107902411 >107902462 >107902519 >107902538 >107903337 >107902316
--Configuring PocketTTS to use local models instead of Hugging Face:
>107903257 >107903304 >107903339 >107903325 >107903499 >107903594 >107904217
--Debating the future of finetuning vs benchmark-focused models:
>107899672 >107899699 >107899735
--Technical challenges in Multi-Token Prediction implementation and speculative decoding layers:
>107903891 >107903954 >107903973 >107903998 >107904083
--Character recognition and censorship limitations:
>107898282 >107898322 >107898369 >107898393
--Miku and Teto (free space):
>107895617 >107895863 >107895984 >107896814 >107896869 >107897523 >107898393 >107900219 >107900580 >107902299 >107903267 >107903999 >107904030 >107904049 >107905260

►Recent Highlight Posts from the Previous Thread: >>107895448

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
sex with long Miku
>>
>>107906356
>Does this still use ram/vram?
I'm not 100% sure but I think not.
And if I'm right, then it will once they do implement MTP.
>>
>>107906367
why are her panties down if she still has her shoes on?
>>
>>107906367
is that her spare egg cell container???? just how pregnant does this bitch want to get?!?!!?
>>
File: 1739732836890615.mp4 (847 KB, 1920x1920)
>>107906471
>>
https://huggingface.co/KevinAHM/pocket-tts-onnx
>>
File: 1636834077450.gif (163 KB, 248x224)
>>107906479
Fuck... I've spent the past 9 hours trying to make my own onnx + cpp port. Just fucking kill me.
>>
>>107906503
just use llama
>>
>>107906503
Did you learn anything about the conversion? He doesn't seem to post the conversion scripts (for this or the soprano models).
I'll start shoving it into my tts thing before going to sleep. The tokenizer is going to be fun.
>>
File: WhyHuhGIF.gif (1.51 MB, 498x279)
>>107906503
>>
File: eek.png (17 KB, 866x248)
>>107906531
Yeah I have two separate conversion scripts... that work... I think... Idk, I'm second guessing myself because the output is completely different than this guy's.

I could share them in catbox, but beware, it's vibe coded dogshit and you may need to manually point it to a different location for the yaml config because... I don't even want to explain.
>>
https://files.catbox.moe/r08vf6.py
https://files.catbox.moe/dc6oul.py
>>
>>107906573
>I could share them in catbox
If you want, sure. Thanks. I've only converted some nvidia models for voice identification and i'm not sure where to start for the usual .safetensors. It'd give me a point of reference at the very least.
>>
Why spend time on pocket tts though?? It's not very good
>>
>>107906614
Because it runs fast on cpu and supports voice cloning, which means you can save your vram for your llm. And the voice cloning is great, actually, if you choose the right voice and have a high quality sample.
>>
>>107906614
For the same reason I implemented kittentts after doing piper and kokoro about a year ago. And then supertonic and soprano. Because I like it.
>>
>>107906285
Prove that you can do better
>>
>>107906643
based kitten enjoyer. niggas in this thread don't have the aryan spirit to understand the beauty of min-maxxing.
>>
>>107906597
Looks much more involved than what i had to do for the speaker verification stuff. I'll read through it to see if i learn something. Thanks.
>>
>>107906659
yw bro. I'm glad if someone can at least get some use out of it.
>>
I just watched this random video about new findings related to aphantasia. They talked about how the same brain regions activate in those who have aphantasia and those who don't, including a specific region used when "visualizing" a scene. Both groups show that region activating, yet people with aphantasia still don't report the experience of visualization. What's actually going on is that the connectivity between that region and the frontal and parietal lobes is not as high, which is why they don't experience it. It also explains why they can still perform tasks similar to those without aphantasia: their brains use a different "strategy" even for tasks we normally think demand the ability to visualize.

It's interesting to think about what this implies for machine intelligence. While we may still need to architect specialized connections and dedicated sub-networks to reach AGI, current architectures can be incredibly flexible even without all that complexity, which is why transformers have been so successful and can sometimes give the illusion of spatial understanding. Arguably, some LLMs might in fact have weak spatial understanding, so weak and fragile that we write it off as an illusion rather than recognizing it as merely weak.
>>
>>107906597
Thanks for these.

>>107906603
do you happen to have a snac to onnx converter?
I couldn't figure it out, and the guy who puts them on hf didn't provide the script.
>>
File: dd2d.png (1.38 MB, 871x862)
My webui.bat keeps trying to run Python 3.13 instead of 3.10, which I need for compatibility reasons. I've spent all day trying to solve this but I just can't bros...
>>
>>107906715
Nope. Didn't even know about that codec.
>>
>>107906744
set up a virtual environment using uv. uv is god-tier software unironically.
>>
>>107906744
retard-kun... just use pyenv or uv...
>>
>>107906744
Retard. The webui.bat for what?
>>
>>107906744
never install more than one python application in a single environment
it makes mustard gas
>>
python being satan's anus to deal with is unironically a good filter. tech illiterates don't deserve llms
>>
>>107906367
>https://hf.co/collections/google/translategemma
>muh safety
Ok, so it's useless. Why the fuck would I want to use a translation tool that isn't going to translate correctly?
>>
>>107899977
>What about control vectors, seems interesting yet I never see it being talked about.

>>107899987
>general rule of thumb, if it isn't getting talked about then it didn't work most likely

control vectors work for a lot of things. what model and purpose are you talking about?

>>107906791

probably ooba. I think that handles its own conda environment. And forcing it to 3.10 would probably break everything lol.
>>
>>107906828
Only a problem for dunning-kruger linux users. You just run an installer on windows and it works.
>>
>>107906770
uv saved python but unfortunately people who can't into such basic bitch dependency issues are unlikely to make good use of it. a sad fact of life. pearls before swine as it were
>>
>>107906872
I run both wincuck 11 and linux bub, linux is way easier to set up TTS and much faster with LLMs, and they both have occasional python problems
>>
We need a pruned+reaped GLM 4.7. 50b max.
>>
I need 120b trained exclusively on mikusex
>>
I need RAM.
>>
mikoss-120b
>>
>>107907027
There's plenty of RAM available, what you need is money.
>>
File: rom.jpg (246 KB, 800x800)
I need ROM
>>
Are these just for horny posting or can you feed them your shit and get a studybuddy/maid to help you with random shit or just make retarded projects with?
>>
>>107907043
True. They should raise prices even further, and not just on RAM but on your PC, and your internet connection, and make everything need a subscription license. Maybe even tax each breath you take. After all, all you need is just some money.
>>
>>107906967
If I'm genuinely very happy using conda, is there any reason to learn uv? Or is uv just "easy conda"?
>>
>>107907108
It's much faster and downloads in parallel
>>
>>107907027
how much do you have?
>>
>>107907108
not really, for the average user they're solving the same problems. personally I find uv to be more performant and less invasive and annoying to work with though, when I can't get by with standard venvs I would much rather work with uv than conda
>>
>>107906846
>3.10
The only thing I can think of that forces 3.10 is auto1111's stable diffusion webui, which means "wrong general".
Guaranteed Gemini can help you install it if you really want to tho.
>>
>>107906987
win10 iot has patches for years, still. 11 is for actual retards and masochists
>>
>>107907282
this argument applied to win7, but acting superior for using 10 over 11 is newfag silliness. they are the same malware
>>
Vram troglodytes getting 3000 series thinking they're getting a good deal don't know about fp8
>>
>>107907409
Zoomer doesn't know that token generation is memory-bound and that you only need to pack-unpack once during token processing
>>
>>107907461
Here is my source
https://youtu.be/fWqKIntFYqQ
>>
>>107907409
https://github.com/SuriyaaMM/feather
>>
>>107907493
Does that slop work tho?

>Older Hardware (RTX 3090, A100, V100): Forced to use FP16 or FP32, leaving huge bandwidth potential on the table.

No mention of BF16
>>
>>107907409
>fp8
>good
That shit quant is worse than Q8_0 gguf
>>
>>107907529
fp8 is 1 byte, fp16 is two bytes, you can pack two fp8 into one fp16, do operations on fp16 and unpack back into fp8, no bandwidth wasted
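A rough numpy sketch of the layout idea, purely illustrative (a real kernel does this in registers and converts each byte to fp16 for the actual math):

import numpy as np

# raw fp8 weights are just bytes in memory
weights_fp8 = np.arange(8, dtype=np.uint8)

# two fp8 bytes ride in each 16-bit word: same bytes, same bandwidth
packed = weights_fp8.view(np.uint16)
assert packed.nbytes == weights_fp8.nbytes

# unpacking is the inverse view; on hardware without fp8 units each
# byte would be widened to fp16 for the arithmetic, then packed back
unpacked = packed.view(np.uint8)
assert np.array_equal(unpacked, weights_fp8)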
>>
>>107907493
>>107907570
When is cuda dev implementing this in llama.cpp?
>>
Why can't I just Q4 my Q4? Fuckin nerds.
>>
>>107907826
you can. it is called a Q2
>>
>>107907830
What about FP4 huh? FP2. FP1, even. Get to work nerds.
>>
>120gb VRAM
>GLM 4.7 UD-IQ2_m
>ERP
>Coding

I don't have many issues even though I'm using such a small quant. The worst I've experienced so far is the thinking getting stuck in a loop for harder problems. Also it occasionally stops generating mid reply. Anyone able to run it at a larger quant? I want to know if these issue persist at larger filesizes.
>>
>>107907939
Do you not also have regular RAM to offload to? You could easily bump up to a higher quant and check yourself.
Granted you'd probably be getting 10-15 t/s TG, but still.
>>
>>107907939
vram? pp? tg?
>>
>>107907959
I've got 64gb but I'm sitting on another 32gb that I haven't installed yet. I haven't tried mostly because I don't want to download another huge model at this moment. I plan on trying though.

>>107907969
>Pro 6000 + 4090
>pp 376.49 t/s
>tg 18 t/s
>>
Anime feet
>>
>>107907939
>UD
ew. try ubergarm quants (needs ik_llama), iq2_kl works well for me
>>
>>107908002
>I've got 64gb but I'm sitting on another 32gb that I haven't installed yet.
Yeah, that's enough to just barely bump up to one of the q4's, just make sure you have mmap disabled.
>>
>>107907969
>>107908002
I forgot to mention the pp is so low because I've offloaded 19 of the experts.
>>107908018
ik llama seems like a hassle but maybe I'll give it a shot.
>>107908030
What's wrong with mmap?
>>
>>107907345
> but acting superior for using 10 over 11 is newfag silliness
It means when I occasionally fire up a windows vm for the rare program that doesn’t work in wine, I can do it without being blasted by as many ads. 10 iot gets security patches til 2032 iirc
>>
mikumikupad mikumikubroke my mikumikusessions
I mikulost everything
>>
>>107908314
its mikuover.
>>
>>107908337
kinda upset because I had like 5 sessions all with different stories but oh well
>>
>>107908347
it's a shame because most other frontends focus on turn based interactions (the so called 'instruct' template), I'm not sure if there's any other GUI that allows free form shit like memepad.
>>
4x48gb of ddr5 makes my windows 10 shit itself, i get bsods with memory errors and non-deterministic app behavior, at both 5600mhz and the standard 4000mhz, any help/experience?
>>
>>107908364
yeah, the ability to edit text without going through 70 different clicks, and to look at token probs and switch them, is a massive mental shift and allows you to experiment a lot
>>
Any Nemo lovers here?

https://huggingface.co/BeaverAI/Rocinante-X-12B-v1b-GGUF/tree/main

Mistral Tekken without [SYSTEM_PROMPT] tag or... Metharme :^)

(I'm not sure which one fares well with long context.)
>>
>>107908391
How is it different? Also, which one was metharme?
>>
Are there some GLM4.7 presets out there that stop the model from immediately accepting everything I type?

If I try to coerce someone in RP I don't want them to immediately agree but GLM4.7 is too much of a sycophant.
>>
>>107908438
>stops the model from immediately accepting everything I type
welcome to LLMs
>>
>>107908479
Claude doesn't have this problem and didn't have it since at least claude 3.0. GLM4.7 feels smart enough that it should be able to behave the same way given the right settings and preset.
>>
>>107908391
>tasting copper
pathetic
>>
>>107907599
It is in principle possible to cast FP8 to FP16 and to use FP16 instructions.
But I think for Ampere or any other hardware lacking FP8 instructions it makes more sense to define something like a q7.75 format that packs weights and scales into exactly 8 BPW and does not use floating point arithmetic at all.
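The exact q7.75 layout isn't specified above, but as a loose sketch of the integer-only part, here's blockwise quantization where the shared per-block scale is a power of two, so dequantization is a pure shift (the weights are integer stand-ins, and this is not the proposed format):

import numpy as np

BLOCK = 32

def quantize_block(w):
    # smallest right-shift that fits the whole block into int8
    shift = max(0, int(np.ceil(np.log2(np.abs(w).max() + 1))) - 7)
    return (w >> shift).astype(np.int8), shift  # scale = 2**shift

def dequantize_block(q, shift):
    # reconstruction is a pure integer shift, no float math
    return q.astype(np.int32) << shift

w = np.random.randint(-1000, 1000, BLOCK)  # integer stand-ins for weights
q, s = quantize_block(w)
w_hat = dequantize_block(q, s)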
>>
>>107908368
Faulty ram?
Put this on a stick and run it. (free version is fine)
https://www.memtest86.com/download.htm
On linux you can exclude memory sections on boot, not sure if that is possible on windows.
>>
>>107908597
it works fine if I run an r1 quant, blue screens mostly happen when I'm doing regular stuff like vstudio, so it might not even be a ram problem. Though I'll keep this in mind.
>>
https://github.com/antirez/flux2.c
>It is my first open source project where I wrote zero lines of code.
HOLY FUCK
>>
>>107908368
Intel gen13/gen14 ?
I had that issue until I swapped to a 12th gen.
>>
>>107908677
14900, I used it with new bios and firmware and everything. Thought they've fixed the issues.
>>107908597
Example of app behavior i was talking about (python sometimes dying executing a program, other times doing fine) from event viewer:
Faulting application name: python.exe, version: 3.10.6150.1013, time stamp: 0x62e84c21
Faulting module name: torch_cpu.dll, version: 0.0.0.0, time stamp: 0x67186648
Exception code: 0xc0000005
Fault offset: 0x0000000005dea1aa
Faulting process id: 0x4e00
>>
>>107908676
Based Brahmin
>>
>>107908699
Yeah, I had done the bios/firmware updates and underclocked, my cpu still cooked despite this. Mine was 13th gen.

It might not be the problem though, do the memtest like anon above said, but keep this in mind if you can't find anything as I had very similar problems (plus crashes in games with nvidia related error logs, even though nvidia had nothing to do with it)
>>
>>107908676
very aryan code saar
to the moon :rocket::rocket::rocket:
>>
>>107908438

You can try this dominant control vector. I trained it on glm-4.6 so it might not be perfect with 4.7 if you enable reasoning.

https://litter.catbox.moe/vcuqor9sjpue4um1.gguf
>>
>>107908733
I found that it fails VT3 test in y-cruncher really fast while holding up well in other tests. I'll see if I can fix that with bios tweaks
>>
https://huggingface.co/deepseek-ai/DeepSeek-V4
https://huggingface.co/deepseek-ai/DeepSeek-V4
https://huggingface.co/deepseek-ai/DeepSeek-V4
>>
>>107909044
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>>
>>107909044
picture_of_a_tiny_cat_looking_at_the_camera-final_03_backup.png.webm.jpeg.gif
>>
>>107909044
I'm not clicking on any V4 links until February.
>>
why are her pantsu pulled down?
>>
>>107909321
they fell because she wasnt holding them up
>>
>>107909337
isnt her vagina cold tho?
>>
>>107909352
i'm keeping it warm, dw
>>
>>107908597
>>107908733
update: it has been running the VT3 test for an hour without errors after I set the following in bios: 5200mhz, CPU SA 1.20V, CPU VDD 1.35V, VDDQTX 1.3V, DRAM VDD/VDDQ 1.35V, XMP OFF
wait nevermind it failed as I typed
>>
File: poopsock.png (218 KB, 947x935)
wtf is the poopsock incident??
>>
>>107909044
>1.3t
could've been worse I guess
>>
>>107908438
Is it doing that with reasoning on?
>>
https://vocaroo.com/1bVYii3khDw0
any sovits wizards here?
my first attempt at tuning this
I think it sounds okay, I have a relatively small amount of samples (about 2-3 minutes)
is it as good as it gets or can I get it to be better with more samples?
>>
>>107909490
Hard to judge without knowing what it's supposed to sound like. Just train on more samples and see if it gets better.
>>
>>107909523
inflection, pauses and breathiness are fine, I'm asking more about the overall quality, I managed to get the robotic tinge down a lot, I almost can't hear it, but it's still slightly there
here's the real va sample
https://vocaroo.com/1mxwCxBt8VJD
>>
>>107909490
sovits wizard here. What settings did you use to finetune it?
>>
I want something new for pisser mashing. Anything new yet? Still glm?
>>
Nemo will never be replicated again:
https://torrentfreak.com/nvidia-contacted-annas-archive-to-secure-access-to-millions-of-pirated-books/

>NVIDIA Contacted Anna’s Archive to Secure Access to Millions of Pirated Books
>
>NVIDIA executives allegedly authorized the use of millions of pirated books from Anna's Archive to fuel its AI training. In an expanded class-action lawsuit that cites internal NVIDIA documents, several book authors claim that the trillion-dollar company directly reached out to Anna's Archive, seeking high-speed access to the shadow library data.
>
>[...]
>>
>>107909629
I think I just lowered the learning rate to 0.2 since I don't have that much audio and increased the batch size to 8. GPT part was left untouched.
>>
>>107909633
Remember how there was a lawsuit against Meta for downloading 60TB of illegal books and how nothing happened to them?
>>
>>107909639
You need to increase the vits batch size slightly and test, it'll improve the audio quality without messing up the rest.
>>
>>107909657
>nothing happened to them?
Llama dead tho
>>
>>107909706
That's because Zucc recognized that llama4 was a mistake so he spent hundreds of millions to hire new people who have been doing absolutely nothing for a year now.
It's just the metaverse all over again.
>>
Is zuck the dumbest mf in the AI space?
>>
>>107909728
And llama4 was a mistake because of absolutely dogshit data that they didn't finish cooking into a turd pie
>>
>downloading books is illegal
Meanwhile libraries are barely staying open
>>
>>107909662
my poorfag 3060 taps out at 9 per batch with DE closed and starting the training from my phone remotely
owari da
it does sound a bit better though
>>
>>107909657
What I want to know is what they used those 60TB of illegal books for, because that clearly didn't end up in the llama4 that was released.
>>
>>107909728
>the guy is in the business for decades
>still falling for obvious scams
He is super lucky that facebook is the most popular social media just because.
>>
>>107909728
The new hires can't do anything right now because Zuck decided to tear down the world's biggest GPU farm he just finished building and replace it all with Google TPUs.
>>
>>107909991
>jew
>lucky
lmao
>>
File: IMG_9040.png (1.54 MB, 1024x1024)
>>107909044
> have an old pic for old meme
>>
>>107909981
They obviously didn't release what they originally intended to, that was too legally and ethically 'unsafe' and they couldn't preserve performance without compromising on safety.
>>
best models for edging? https://huggingface.co/collections/zai-org/glm-edge
>>
File: glm47flash.png (43 KB, 726x210)
>>107910136
GLM 4.7 Flash.
https://docs.z.ai/guides/llm/glm-4.7#glm-4-7-flash
>>
File: glm47flash_bench.png (197 KB, 786x908)
>>107910151
>In mainstream benchmarks like SWE-bench Verified and τ2-Bench, GLM-4.7-Flash achieves open-source SOTA scores among models of comparable size. Additionally, compared to similarly sized models, GLM-4.7-Flash demonstrates superior frontend and backend development capabilities. In internal programming tests, GLM-4.7-Flash excels at both frontend and backend tasks. Beyond programming scenarios, we also recommend experiencing GLM-4.7-Flash in general-purpose applications such as Chinese writing, translation, long-form text processing, and emotional/role-playing interactions.
>>
4.7v-flash when?
>>
File: d8b-785038133.gif (50 KB, 413x243)
>>107910151
IT'S ANOTHER 100B MOE FROM CHINA SURELY THIS ONE WILL SAVE LOCAL!
>>
>>107910297
It's around 28B parameters, 3.5B active, or something like that.
>>
Densesissies lost
>>
>>107910326
Their previous "flash" model was something like 10B params, right?
If this new one is smarter than Qwen 30B and is better for RP (not dry as a bone), then I'll gladly take it.
>>
>>107910326
https://github.com/huggingface/transformers/pull/43031/files

    vocab_size: int | None = 154880,
    hidden_size: int | None = 2048,
    intermediate_size: int | None = 10240,
    moe_intermediate_size: int | None = 1536,
    num_hidden_layers: int | None = 47,
    num_attention_heads: int | None = 20,
    num_key_value_heads: int | None = 20,
    n_shared_experts: int | None = 1,
    n_routed_experts: int | None = 64,
    routed_scaling_factor: float | None = 1.8,
    kv_lora_rank: int | None = 512,
    q_lora_rank: int | None = 768,
    qk_rope_head_dim: int | None = 64,
    v_head_dim: int | None = 256,
    qk_nope_head_dim: int | None = 192,
    n_group: int | None = 1,
    topk_group: int | None = 1,
    num_experts_per_tok: int | None = 4,
    norm_topk_prob: bool | None = True,
    hidden_act: str | None = "silu",
    max_position_embeddings: int | None = 202752,
    initializer_range: float | None = 0.02,
    rms_norm_eps: int | None = 1e-5,
    use_cache: bool | None = True,
    pad_token_id: int | None = None,
    bos_token_id: int | None = 0,
    eos_token_id: int | None = 1,
    pretraining_tp: int | None = 1,
    tie_word_embeddings: bool | None = False,
    rope_parameters: RopeParameters | dict[str, RopeParameters] | None = None,
    rope_interleave: bool | None = True,
    mlp_layer_types=None,
    attention_bias: bool | None = False,
    attention_dropout: float | None = 0.0,


Also see https://github.com/vllm-project/vllm/pull/31386/files
Llama.cpp? In 2 weeks, maybe.
>>
>>107910350
>glm4_moe_lite_mtp
Maybe when ngxson finishes implementing MTP.
>>
>>107910350
>Llama.cpp? In 2 weeks, maybe.

Probably exl3 tomorrow if it's like the last GLM release.
>>
https://huggingface.co/zai-org/GLM-4.7-Flash
>>
>>107910478
Let's fucking go.
>>
>>107910478
It's so over
>>
>>107910478
cockbench when?
>>
File: 1752702886144643.gif (2.84 MB, 442x250)
>>107910478
>30b
finally something in the middle size, was it too much to ask for?
>>
>>107910578
its moe 3b active tho
>>
>>107909991
>facebook is the most popular social media
in 2008 maybe, it's been a while since that was the case
>>
>>107910478
I wonder if bitsandbytes 4-bit quantization works with it or if I'd waste time.
>>
>>107910478
>GLM4.7 is trained and optimized for agentic coding, not for explanation and back-and-forth chatting
mega oof
>>
>>107910639
>Beyond programming scenarios, we also recommend experiencing GLM-4.7-Flash in general-purpose applications such as Chinese writing, translation, long-form text processing, and emotional/role-playing interactions.
>>
>>107910656
>>107910639
So is she gonna tool call for kiss or what?
>>
>>107910670
>model is waiting for kiss tool response
>>
>>107910082
Instead of poaching employees, it would have been far cheaper and more effective to simply fire the safety department.
>>
>>107910687
>police, we found a terrorist
>>
File: 88998ab338346b14.gif (1.05 MB, 500x324)
>>107910670
>>107910674
>>
>>107910706
PLUG ME INTO THE MIKUTRIX
>>
>>107910706
add scissors or razor blades to whatever the metal thing on the bottom is and we're back
>>
File: 1768498987535869.jpg (829 KB, 1125x2000)
>>107910722
>>
File: thumb-1920-312180.jpg (210 KB, 1680x1050)
>>107910722
100% would kiss
>>
>>107910687
Too many Karens and True Believers willing to blow the whistle or who just talk too much for that; they had to start over with a much smaller team, but I don't think Zuck did it right by just hiring ML superstars.
>>
>>107910722
intriguing taste
>>
>>107910478
Judging from the 4.7 safety debacle I wonder how safe they made the one they know coomers might actually use
>>
File: 1754912761124730.png (857 KB, 1906x1493)
Will they release sora 2 locally once they go bankrupt? kek
>>
>>107910794
4.7 isn't safe though? Or if it is then it is smart enough to compensate for the safety lobotomy.
>>
File: 1768497266743516.jpg (1005 KB, 1916x3230)
>>107910770
>>
>>107910478
>Glm4MoeLiteForCausalLM
is there actually a difference in the model architecture between this and any of the other glm4moe models? could you just change the name of the architecture in the config.json and it would just work?
>>
>>107910836
they wouldn't. why break compatibility for no reason?
>>
>>107910478
where's da goof?
>>
>>107910776
At this point I absolutely believe my schizo theory that the only purpose of safety is to make people suffer. The end goal of this technology is to fire everyone here and make them work on an assembly line / die in a ditch. I still don't know if it is gonna happen but if it is then at least I could have fun with something that will inevitably kill me or make my life miserable. But no. You can't have fun. And calling it "safety" is another layer of dystopian doublethink.
>>
>>107910853
>schizo theory
at least you're self aware
>>
File: file.png (336 KB, 1993x860)
I wanted to see if a server motherboard would be much faster than my gaming one, so I rented a server on vast.ai to see.
>>
>>107910930
was this with memory offloading? if so, it makes sense. the server board has quadruple the memory bandwidth of your gaming motherboard.
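Back-of-envelope math behind the "quadruple" claim; the channel counts and speeds are assumptions for a typical 8-channel Epyc board (like the romed8-2t mentioned a few posts down) versus a dual-channel gaming board, with 8 bytes moved per transfer per channel:

def bandwidth_gbs(channels, mt_per_s):
    return channels * mt_per_s * 8 / 1000  # theoretical peak, GB/s

server = bandwidth_gbs(8, 3200)  # 8-channel DDR4-3200 -> ~204.8 GB/s
gaming = bandwidth_gbs(2, 3200)  # dual-channel DDR4-3200 -> ~51.2 GB/s
print(server / gaming)           # 4.0, and token gen scales with this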
>>
>>107910810
>release sora 2 locally
please don't
the world doesn't need an infinite tiktok generator
>>
>>107910950
I wanted to see how much slowdown I was getting for using a usb pcie 3.0 riser.
>>
Let the 2 week counter begin.
>>
>>107910994
jesus. yeah you need to make some changes.
>>
>>107910950
>quadruple the memory bandwidth
Is it supported in llama.cpp?
>>
>>107911022
yeah. it's just a motherboard.
>>
>>107911028
No, it's Numa fuckery
>>
>>107911043
the romed8-2t is a single cpu motherboard.
>>
>>107911048
Anon...
>>
File: file.png (129 KB, 488x369)
>>107911007
I'm waiting for one of those m.2 to pcie adapters. Then at least I will be able to configure the motherboard back to pcie gen 4.
>>
>>107910478
My penis refuses to get hard for small models.
>>
>>107911048
How many numa nodes are in one Epyc CPU?
>>
>>107911080
My penis only gets erect with complex agentic setups, which can only achieve tolerable speeds with small models. 100t/s is a bare minimum
>>
>>107911120
>complex agentic setups
You can't convince me that this meme can be used for sex and is actually good.
>>
>>107910478
anything sub 12B Active is useless for long context tasks.
>>
>>107910810
Sora 2 before they cucked it was really really good.
>>
>30ba3b

come the fuck on
>>
>>107910478
>30B A3B alternative
ogey
>>
>>107911120
enjoy your robotic arm tool calls
>>
>>107911130
llms work much better when you narrow down their tasks. You can give additional instructions for specific situations and use long-term planning so that the model forces itself to move to the next plot point instead of getting stalled and repetitive. You can fix every problem with a minimally intelligent llm by inserting a good, detailed prompt with correctly determined conditions. Writing all instructions in a single prompt will only confuse the model, so an agentic approach is the only rational choice
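As a loose illustration (not anyone's actual setup), a minimal sketch of that split-into-narrow-calls idea against a local OpenAI-compatible server; the endpoint, model name, and prompts are all placeholders:

import requests

API = "http://127.0.0.1:8080/v1/chat/completions"
story_so_far = "..."  # placeholder running context

def ask(system, user, max_tokens=512):
    r = requests.post(API, json={
        "model": "local",
        "messages": [{"role": "system", "content": system},
                     {"role": "user", "content": user}],
        "max_tokens": max_tokens,
    })
    return r.json()["choices"][0]["message"]["content"]

# call 1 has one narrow job: decide the next plot beat
plan = ask("You are a story planner. Reply with one sentence naming the "
           "next plot beat. Do not write prose.", story_so_far)

# call 2 only has to execute that beat, with its own specific instructions
scene = ask("You are a fiction writer. Continue the story so this beat "
            "happens: " + plan, story_so_far)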
>>
>>107911255
No, retard. It should write a symphony when I say poo poo pee pee.
>>
>>107911290
this, but unironically
>>
>>107911290
It actually should and "skill issue" posting was always just baiting and trolling.
>>
>>107911290
Why would I want it to write music when I prompt poo poo pee pee? Even then, it should be a rap at least, not a symphony.
>>
>>107911255
>llms work much better when you narrow down their tasks
If your task can be narrowed that much, you're better off using a script instead (which will be near instant + 0 hallucination).
>>
File: file.png (116 KB, 629x835)
>>
>>107910810
>Now let's see ol' Sammy wriggle his way out of THIS jam!
I remember "OpenAI could go bankrupt within months" headlines being widely shared in late 2023. And yet...
>>
>>107911255
It is really cool how you can throw extra compute time at the problem to make up for a low parameter count: take a task, break it down into smaller bits, process each part individually (sometimes in parallel to make use of batched decoding), etc.
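A minimal sketch of the parallel part, assuming the same kind of local OpenAI-compatible endpoint and an ask() helper like the one sketched earlier in the thread:

from concurrent.futures import ThreadPoolExecutor

# independent sub-tasks (placeholder texts)
chunks = ["summarize part 1: ...", "summarize part 2: ...",
          "summarize part 3: ..."]

# firing them concurrently lets the server's batched decoding do its thing
with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
    partials = list(pool.map(lambda c: ask("You summarize text.", c), chunks))

# one final sequential call merges the partial results
merged = ask("Combine these summaries into one.", "\n".join(partials))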
>>
File: 1758290239978536.png (40 KB, 706x155)
The anniversary of Deepseek R1 will be tomorrow. Things are about to get crazy.
>>
It's always funny seeing actual proof that llmtards are much dumber than imggooners
>>
>>107911393
>deepseek v4
>prices halved AGAIN
>>
>>107911411
ye, cause you interact with your image model the same as with an llm, totally ...
>>
File: bwackhowl.jpg (22 KB, 750x350)
How do you hook-up your local model to claude code? Do I need just the proxy or also some frankensteined claude code version too?
>>
>>107911460
You're worse at generalization than 3B
>>
>>107911308
>it's actually intelligent!! Ph.D level stuff!
>only if you're a Ph.D driver
yes, it was always altman cope, agi never
>>
>>107911411
pic not related
>>
>>107911459
prices so cheap you vill use the api
>>
>>107911490
show me your insane agentic rp workflow
>>
>>107911411
isn't cumfartui spyware and malware? Where is the runtime performance and the Photoshop layers?
>>
>>107911290
>poo poo pee pee
>writes symphony
retarded model, not following user instructions at all
>>
>>107911411
>go to /ldg/
>schizos screaming about trannies
>retarded test spammer with unfunny memes
>z-image when?
it's slightly worse than over here but just enough to be unsavoury
>>
>>107911500
Working on it.
>>
>>107908131
I use win11 ltsc on my vfio setup, and I have no ads. I was just using it this morning, because modding Baldurs Gate 2 is a fucking nightmare in Linux.
>>
>>107911528
so this >>107911255
is just as much hopeful vibes as most tard papers that go nowhere, you don't know since you don't do it either
>>
>>107910836
>https://huggingface.co/zai-org/GLM-4.7-Flash/discussions/5
>>
>>107911571
it's not even their moe arch?
>>
>>107911501
yes. it's just corpo slop at this point. extremely disappointed in the direction it went
>>
File: why yes.png (148 KB, 1280x1125)
>>107911584
>>
>>107911472
check the zai docs
https://docs.z.ai/devpack/tool/claude

there's an automated script you can look at to see how to set it up. it has both config to set your custom endpoints and models, as well as config to override the initial onboarding prompt that asks you to sign in to anthropic.
>>
>>107911571
lol i knew it. nobody ever makes new architecture.
>>
the new rocinante makes peak cunny
i know a few anons itt will appreciate that
>>
>>107911689
You also sometimes have stuff like that korean model from a few weeks ago that modifies the transformer architecture itself. As a result, it's not supported by anything besides vllm.
>>
>>107911733
The X?
>>
>>107911757
yep
got no refusals so far
>>
File: 1758662804570364.gif (699 KB, 165x163)
>>107911733
>>
It's funny how "it doesn't give refusals" is still a criterion amongst skillets.
>>
>they don't rape the unwilling LLM
what are you even living for
>>
>>107911789
it's a criterion for anyone. models that can give refusals are more tarded even when they do comply
>>
>duh refusals don't matter! just edit the model output to what you want!!
>>
>>107911789
refusals = i'm sorry but i will not engage in harmful behavior [...]
my rape cards work as expected, mind you
>>
>>107911733
>>107911757
organic
>>
File: 1749224080876048.png (19 KB, 1047x293)
>>107911818
yep
>>
>>107911584
>>107911689
>>
>>107911857
>fizz
aie
>>
>>107911865
lmao
>>
>>107911789
>there are frying pans posting ITT
horrifying
>>
>>107911865
Still seething about our based anti 'p queen?
>>
>>107911882
I support the pannocaust, only pure pots should be allowed to post
>>
File: file.png (326 KB, 900x796)
>>107910560
This was with Q4 from here https://huggingface.co/ngxson/GLM-4.7-Flash-GGUF/tree/main
I'm downloading safetensors to check if what happened at the end is a quant issue.
>>
>>107911913
>barely above a whisper
but otherwise looked quite decent until "I asleep?"
>>
>>107911913
>check if what happened at the end is a quant issue
that's typical glm, I don't believe anyone who says they don't have that happen often to them
larger MoE GLMs do this less but will enter an endless thinking loop instead
>>
>>107911913
and?
>>
>>107911472
You can change where claude code sends requests with the following env vars

ANTHROPIC_AUTH_TOKEN=
ANTHROPIC_BASE_URL=
ANTHROPIC_DEFAULT_HAIKU_MODEL=
ANTHROPIC_DEFAULT_SONNET_MODEL=
ANTHROPIC_DEFAULT_OPUS_MODEL=
DISABLE_NON_ESSENTIAL_MODEL_CALLS=1
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1


The last 2 aren't strictly necessary.

You can also set these permanently for claude by editing one of your claude settings files, for example at
~/.claude/settings.json


{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:4141",
    "ANTHROPIC_AUTH_TOKEN": "dummy",
    your other env vars here etc
  }
}


Note that Claude uses the Anthropic API format, which differs from the OpenAI API format that everyone else uses. Because they're a special snowflake, you'll probably have to run a proxy or something to convert requests to OpenAI API format. One such project you can run to do so is https://github.com/musistudio/claude-code-router
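As a rough idea of what such a proxy does, here's a toy translation layer (non-streaming only, no tool calls or content blocks, and the upstream URL is a placeholder; use claude-code-router for real work):

import requests
from flask import Flask, jsonify, request

UPSTREAM = "http://127.0.0.1:8080/v1/chat/completions"  # your local server
app = Flask(__name__)

@app.post("/v1/messages")
def messages():
    a = request.get_json()
    # Anthropic carries the system prompt as a top-level field
    msgs = [{"role": "system", "content": a["system"]}] if a.get("system") else []
    msgs += a["messages"]
    r = requests.post(UPSTREAM, json={
        "model": a.get("model", "local"),
        "messages": msgs,
        "max_tokens": a.get("max_tokens", 1024),
    }).json()
    text = r["choices"][0]["message"]["content"]
    # wrap the reply back into Anthropic's response shape
    return jsonify({"type": "message", "role": "assistant",
                    "content": [{"type": "text", "text": text}],
                    "stop_reason": "end_turn"})

if __name__ == "__main__":
    app.run(port=4141)  # the port the settings.json above points at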
>>
File: model is funny lol.png (528 KB, 1516x6160)
>>107911913
>>
>>107911501
legit the most insecure program you could install on a pc, and these retards still do it without using docker or anything lule, and on fucking windows too
>>
>>107912114
somebody posted this article yesterday
https://www.scworld.com/brief/malware-distributed-via-comfyui-server-exploits
>>
>ai-generated article
>>
>>107912114
>>107912162
ComfyUI itself is fine, it's the retards auto-installing 100x custom nodes because sanjeet told them to in a blog that get got; there's literally no need to use a single custom node if you understand how the program actually works and can build your own workflows
>>
>>107912200
>literally no need to use a single custom node
Bullshit. Comfy has no built-in control flow nodes, no latent size presets, no gguf support, no yolo/sam nodes, etc. Even no image post-processing nodes (adjusting brightness, contrast, etc) in an image generation software!
>>
>>107912200
>ComfyUI itself is fine
it's not. the default shit is retarded and the entire thing is kept alive by custom nodes. all the frontend changes made the UI/UX much worse. runtime performance is a slideshow which makes navigation around a graph absolute cancer, and they keep shoving telemetry into it. the more you understand how comfy works, the more you realize it's a heap of shit and we need something else
>>
go back to shitting in /sdg/
>>
>>107912200
some custom nodes are really needed to make cumrag not suck though
comfy doesn't even have a way to do prompt editing from text; to do what [dog:cat:1] does you need to create a node with dog, a node with cat, and a ksampler sending its result into another ksampler in another ksampler in another ksampler..
or you could install
https://github.com/asagi4/comfyui-prompt-control
and get A1111 style prompt editing, plus the option to use the saner weighting of A1111 etc
cumrag doesn't even have nodes to concatenate text, or do very basic integer math or anything you'd want to build stronger automations in your workflows
care to show me the cumrag way to do X/Y plots without custom nodes?
I use the Inspire Pack among other things to autoload a list of prompts that contain various artist mixes to let me pick what fits a pic best etc
I actually love the node editor concept, it's one of the great powers of Blender and it's fine in Unreal Engine
but cumrag lacks the most basic of primitive nodes from which you could make truly usable, actually comfy workflows
>>
>>107912333
nodes are a feature, not the only interaction in the entire app. substance is the only thing I can think of that is nodes-only, but painter does the manual heavy lifting using substance graphs.
>>
>>107906367
You gens are trash, fuck off of /g/
>>
So my favourite erotic novel writing model is Mistral-Small-3.2-24B-Instruct-2506. Its output is good enough, and my PC runs it decently.

My issue with it is it basically only works well when writing in English. They say it supports Polish but it's actually trash at it in comparison; constant grammar issues, and the output quality is much worse (it begins to read like a kids' book).

Does anyone know any model that's actually capable of writing in multiple languages and can fit on 16 GB RAM?
>>
>>107912308
>Control flow
Just lazy shit you don't need it, you can construct your workflow in a linear fashion
>Presets
Same as above
>Gguf support
Meme shit in image/vid/audio gen, safetensors work fine
>Yolo/Sam nodes
More meme shit you don't need that you can workflow yourself
>Image post process nodes
More meme shit, image editing programs decades old can do this, there is even automation for them but why would you automate post process which is a subjective artistic touch, you don't benefit from this unless you are a slop spamming jeet
>>107912314
>Performance
Works on my machine
>Telemetry
laughably easy to stop and in any case what's stopping you from using an old version of comfy, you can even integrate new model support yourself, stop being lazy
>>107912333
>Shilling A1111, the shit that was spyware before comfy even existed and does nothing better other than being more accessible to retards
>Prompt editing
Just write your prompts manually you fucking retard holy shit, how is every reply to my post so successfully proving my point, you don't need any of that shit, any malware on your machine is a result of your own retardation
>>
>>107912388
Bóbr kurwa
>>
>>107912388
It's not a very smart model, and won't be usable at long context, but nothing, absolutely nothing beats Gemma E4B in multilingual. It's even better than the 27B Gemma 3 in that regard. Punches way above its weight, and you will need to get into very large MoE territory before you find anything better than E4B.
>>
>>107912395
how much do you charge an hour? I need a shill to help me market my new app
>>
>>107912395
>Just write your prompts manually
you clearly don't know what prompt editing means retard
doing this in cumrag means creating a new ksampler for every timestep you're alternating your prompt
if you think this is a valid use of your time kill yourself, become an hero, now
>>
>>107912423
>Doing this is a waste of time you don't understand I NEED ghuptars mega prompt editing node instead of just adding a new step in my workflow, wtf why is my machine mining bitcoin fucking comfyui reeeeeee
>>
>>107912423
what dogshit workflow are you even running?
>>
we NEED GLM 5.0
>>
>>107912494
These retards don't even understand how the program or the nodes they're installing work, they just download workflows from ww1.goodmorningworkflows.siksha, and they have the cheek to talk down to everyone else while being tech illiterate enough to download malware
>>
File: file.png (180 KB, 657x765)
kimi is testing a new vl model
>>
>>107912539
the software itself destroys ssds so I don't really know what you are trying to put down here
>>
>>107912556
Fucking lmao, that's it, post hand
>>
my specs:
- RYZEN 9 7900
- 32 GB DDR5
- RTX 3090

i'm looking primarily for coding agents, what do you recommend to run locally on my computer? i don't care too much about time as long as i can save money compared to the cloud based code agents.
>>
>>107912556
lolwut?
>>
Take your discussion to /ldg/
>>
>>107911913
>with a consensual look in my eyes
>>
>>107912591
>i dont care too much about time
Kimi K2 with a lot of swap memory.
>>
>>107912609
stop
>>
>>107912591
32gb is too little to run toss-120b. So, choose between 30b models like qwen, glm-flash, toss-20b.
>>
>>107911946
It (endless repetition; can be word, sentence, paragraph) happened quite often for me with Air at Q4_K_S. Never had it happen with Q4_K_L.
>>
>>107912613
It's the equivalent of saying no homo so it isn't gay. The consensual look makes it not rape.
>>
>>107912613
lmao
>>
>>107912591
qwen 2.5 coder 32b, qwen 3 coder 30b, llama 3 70b
>>
>>107912591
GLM 4.7 Flash
>>
>>107911913
Nice :rocket: perfectly to ship gorgeouses!!
>>
File: file.png (23 KB, 499x173)
>>107912721
heck yeah sir!
>>
>>107912609
There's no discussion to be had with those tech illiterate slop merchants, I think I'll talk about the software used to run models locally on the local model general
>>
ah it's another episode of this, great
>>
>>107912395
Genning images is a meme. All you need is hands.
>>
Oh no you don't get to autistically >>107912802
dictate what gets spoken about in the relevant public forums, oh the depravity
>>
>>107912825
as always when this happens
>a general dedicated to the discussion and development of local language models.
>>
>>107912802
I blame the Long Miku
>>
>>107912867
Of course you do Chris.
>>
I have to patch mikupad because it's lagging with 10k tokens
why is JS like this
>>
File: file.png (119 KB, 883x599)
>>107911913
This is with the official weights on VLLM.
>>
>>107913005
I get that VLLM is optimized for high throughput but it's kinda sad how many features it's lacking compared to llamacpp.
>>
>>107911913
>>107913005
grim
>>
>>107910810
Does this mean that if they do go bankrupt, all the ram they purchased will be sold off and future orders canceled?
>>
>>107913005
ok so another totally fantastic at benchmarks prune that's useless after a minute
>>107913038
no
>>
>>107913038
No, their competitors will buy up the entire stock
>>
>>107913005
i'm starting to think that their big model was just a fluke or something happened internally
how hard is it to deliver something usable these days?
>>
>>107913043
>no
Aww.
>>
>>107913038
They didn't buy RAM, they bought fab capacity.
>>
>>107913056
>how hard is it to deliver something usable these days?
Sorry bro, benchmaxx to the moon.
>>
>>107913065
They definitely bought RAM in the past. How do their datacenters work otherwise?
>>
>>107913043
>ok so another totally fantastic at benchmarks prune that's useless after a minute
It's because of the A3B MoE. I get that they're the new cool kid on the block, but seriously if your model is only 32B, just keep it dense.
>>
>>107913065
>>107913038
specifically
>OpenAI secured up to 40% of the world's DRAM production capacity by signing agreements with Samsung and SK Hynix, which allows them to purchase raw DRAM wafers for their Stargate project. This move significantly impacts the RAM market, leading to rising prices and reduced availability for other industries.
>>
The bubble is too big to pop.
>>
yet another big win for moesissies
>>
>>107913094
vibecoded software says hi
>>
>>107913085
I think the model's "hidden size" in the configuration tells a clearer story than the number of active parameters directly.
>>
>>107913094
this but unironically
Jewish funny money isn't real so printing a few trillion to prop it up indefinitely is no real effort
Inflation only hurts peasants, not real people
>>
>>107911757
New? Where?
>>
>>107913467
beaver
>>
>>107913470
Found it, tnx
>>
File: file.png (659 KB, 1596x1259)
>>107913005
It's like gptoss.
It degrades outside of the chat template.

The black line is where I started generating after deleting the template on the right.
>>
>>107913517
>It degrades outside of the chat template.
sounds like a theme going forward yay..
>>
File: 1739355938496204.jpg (539 KB, 1362x1145)
I'm glad that glm 4.7 flash came out but I'm disappointed it's a 30B-A3B instead of a 106B-A12B like 4.5 air was.

I would strongly prefer if labs would exclusively train and release models that exactly fit the upper bound of the hardware I own.

>model smaller than my hardware = worthless toy for normies
>model larger than my hardware = meme datacenter-only model
>>
>3b active
lmao why
>>
>>107913563
to placate the air begging
>>
>>107913577
we wanted air, not thin air
>>
>>107913561
true
>>
>>107913592
>wanted
now maybe you'll learn to let them cook and not be ungrateful, else you get this
>>
>>107913561
I enjoy using small models as my daily driver and only use big models for erp
>worthless toy
gets the job done
>>
>>107913637
spanking your cock is not a job.
>>
>>107910853
>At this point I absolutely believe my schizo theory that the only purpose of safety is to make people suffer

"Only" is a bit much. I'd say the main purpose is to signal loyalty and conformance.
>>
>>107913637
Funnily enough, I do the exact opposite
>>
can i sue glm 4.7 flash with a 4060 16go and 48 ddr4 ram in a okay speed ?
>>
>>107913740
>can i sue glm 4.7 flash
anon I know the model is bad but come on it's free
>>
File: file.png (27 KB, 550x254)
API sisters hiding in our ranks, not like this...
>>
>>107913790
Who tf prompts pizza in cloud IDEs??
>>
>>107913804
a disgustingly high amount apparently
>>
>>107913790
those underaged letters...
sickening!
>>
>>107906367
Her expression is really good but shading is unfortunate.
>>
Just finished my cpp+onnx port of Pocket TTS.

One cpp file.
One binary.
All dependencies statically linked.
Fully portable.

100m parameters, voice cloning, runs fast on CPU (but I could improve it still)
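Not the anon's C++ code, but for anyone who wants to poke at the pocket-tts-onnx files linked earlier, the generic onnxruntime pattern looks like this; the input names and shapes below are made up, so read the real ones from the session first:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# discover the model's actual interface before calling it
for i in sess.get_inputs():
    print(i.name, i.shape, i.type)

# hypothetical call once the names are known
tokens = np.zeros((1, 32), dtype=np.int64)  # placeholder token ids
audio = sess.run(None, {sess.get_inputs()[0].name: tokens})[0]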
>>
>cannot improve mikupad performance further without rewriting the entire rendering engine
insane
>>
just stop using JS and webui
>>
it's more about the entire thing being an 8k LoC mess
it needs to be rewritten in proper react
>>
>>107914007
What text rendering engine can display 10,000+ words with advanced (markdown) rendering FAST?
Qt+QML can't, even though its html support is limited. I tried.
>>
>>107913790
>csam
Call it child porn. Oh no, you won't because then everyone will see that it doesn't involve any real children and call you retarded.
>>
>>107914056
write your own in rust
it has fearless concurrency and zero cost abstractions
or was that go? I get my memelangs confused
>>
>>107913790
RIP random slop apple store app users I guess.
>>
>>107914092
what's the c for do you think?
>>
>>107914094
It definitely isn't go because go doesn't have abstractions.
>>
>>107914114
probably rust then
is writing something like that hard?
>>
>>107914056
ImGui with https://github.com/enkisoftware/imgui_markdown
>how fast
insanely fast, I use it in vr app
>>
>>107914241
Interesting. But how is imgui's support for advanced scripts? IME? I mean, things like Japanese where you first type one thing (kana) then convert it into another (kanji) with IME. Smaller gui libraries often struggle with such stuff.
>>
>>107913043
cockbench is deeply flawed for new models since it's pure text completion. of course something heavily RL'd with chain of thought and back-and-forth convos won't know what the fuck to do with the context
>>
>>107914241
Doesn't seem like it supports code blocks or syntax highlighting. Just like Qt, it supports only features that span a single line.
>>
>>107914293
which in itself is interesting to know, means it's completely fried to fuck
>>
>>107914293
It just means that the model is fried.
>>
what models are there even to look forward to? I mean /ldg/ has z image base, what do we have?
>>
>>107914343
DSv4
>>
I haven't been here in a hot minute, is there really no news?
I thought it was over before. I didn't realize just how over it could get.
>>
>>107914358
Can anyone even run it though?
>>
>>107914370
me :)
>>
>>107914277
>>
>>107914370
do you not have 200GB+ of RAM?
>>
>>107914315
>>107914317
It doesn't, just like models freaking out when they don't see exactly one BOS in the right place doesn't. Use it as intended, it's not 2023.
>>
>>107914400
>we overtrained this fancy autocomplete so much that unless you put these meaningless(and totally different for each model) placeholders perfectly it will just break
>>
>>107914400
>Use it as intended, it's not 2023.
I will not submit to OAI chat completion madness.assistant
>>
>>107914395
How fast would it even run though? I'm not interested in 1-2tk/sec no matter how good the model is.
>>
File: investors and Sam Altman.jpg (217 KB, 2000x1334)
>>107910810
>bro just 2 more billions, trust me we've achieved AGI internally, just two more billions for testing bro
>>
>>107914395
We don't even know yet how big it will be, but it's unlikely to be less than 1 or 2 trillion params.
>>
>>107914473
it will have a magic new architecture and only 100B params
trust the plan
>>
>>107910810
>2025
>government spends billions to prop up OAI in the AI race
>2026
>IPO enriches early private equity investors and the c-suite
>2027
>declare bankruptcy and let retail investors eat the losses
God, I love capitalism.
>>
Seriously. They need to create an architecture that has high fluid intelligence and only basic crystallized intelligence, but has the ability to use ssd-loaded knowledge databases efficiently.
>>
>>107914528
https://github.com/deepseek-ai/Engram
>>
>>107914510
I see you've played this game before.
The ride never ends...
>>
File: le stare.png (279 KB, 706x221)
>>107914528
It's almost like somebody was saying that LLM architecture is a dead end for years...
>>
>>107914528
They need to start creating something other than llms
>>
>>107914539
promising but hangs on a big IF
and that is if they can make the gate work for this, i.e. when does the model reach into the memory and when does it rely on its smarts
>>
>>107914559
How's the V-JEPA dead end going?
>>
File: DS-HF_activity.png (87 KB, 1187x860)
>>107914539
There's been quite a bit of activity on DS's HF page in the last week or so.
>>
>>107914528
>>107914539
https://www.tomshardware.com/tech-industry/artificial-intelligence/deepseek-touts-memory-breakthrough-engram
>>
File: file.png (672 KB, 1280x720)
>>107914575
>>
>>107914570
If I see a car with a fucked up engine I can deduce that it won't drive. I don't have to build a better car for that.
>>
>>107913933
Based.
>>
>>107914575
It's almost like their anniversary is coming up soon or something
>>
Where's air? I can't breathe
>>
dipsy v4 will be optimized to run on chinese gpus
>>
engram.gguf?
>>
yeah it's shit
>>
can we has deepseek v4 air?
>>
File: 1751739377929613.png (1.44 MB, 1024x1024)
>>107914595
This image would hit harder if NVDA wasn't at $186 rn.
>>107914606
Let's hope new launch soon.
>>107914642
I don't think the "run" part is the issue, it's the "train" part that's been difficult.
>>
File: Untitled.png (13 KB, 837x513)
>>107914740
>>107914740
>>107914740
>>
File: 1739088573683987.png (288 KB, 563x562)