/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103644379 & >>103638036

►News
>(12/25) DeepSeek-V3-Base 685B released: https://hf.co/deepseek-ai/DeepSeek-V3-Base
>(12/24) QVQ: 72B visual reasoning model released: https://qwenlm.github.io/blog/qvq-72b-preview
>(12/24) Infinity 2B, bitwise autoregressive text-to-image model: https://hf.co/FoundationVision/Infinity
>(12/20) RWKV-7 released: https://hf.co/BlinkDL/rwkv-7-world
>(12/19) Finally, a Replacement for BERT: https://hf.co/blog/modernbert

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>103644379

--Local vs cloud computing, API usage, and the future of model deployment:
>103647690 >103647701 >103647797 >103647883 >103647912 >103647967 >103648025 >103648067 >103648113 >103648126
--Testing Deepseek's capabilities and limitations, comparing to Claude:
>103644658 >103644698 >103644738 >103644754 >103644779 >103644816 >103644876 >103644835 >103644896 >103644923
--Discussion of language models and their capabilities:
>103648286 >103648296 >103648315 >103648321 >103648330 >103648355 >103648385
--DeepSeek V3 effectiveness and limitations:
>103644403 >103644423 >103644470 >103644559 >103644512 >103644543 >103644590
--Optimizing DeepSeek V3 performance and reducing repetition:
>103648755 >103648783 >103648811 >103648830 >103648857 >103648874 >103648883 >103649020
--Bots stopping mid-sentence during generation and potential solutions:
>103644937 >103644996 >103645178 >103645349 >103645723
--Anons discuss and compare SSDs for DeepSeeKV3:
>103644429 >103644507 >103644539 >103644691 >103644757 >103644630
--Anon weighs GPU options for coding use case:
>103646018 >103646042 >103646043 >103646067 >103646151
--Anon asks for AI model to tag large video collection:
>103644683 >103644708 >103644740 >103644751 >103645219
--Qwen 1.5 MoE vs non-MoE model comparison:
>103648423 >103648429
--DeepSeek V3 discussion and dragon story example:
>103646053 >103646086 >103646125 >103646138 >103646161 >103646193 >103646163 >103646192 >103646197 >103646201
--DeepSeek V3 usage and troubleshooting discussion:
>103646967 >103646992 >103647158 >103647221 >103648463 >103648469 >103648480 >103649124
--Anon wants to merge Constitutions using an LLM:
>103648078 >103648136 >103648165 >103648932
--Miku (free space):
>103644553 >103644661 >103644732 >103644887 >103644895 >103644911 >103646906 >103647932

►Recent Highlight Posts from the Previous Thread: >>103644382

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Let's take a moment to remember Qwen and how badly they got dunked on by Deepseek.
>>
>>103649789
>let's take a moment to remember chinks and chinkshit
Let's not.
>>
>>103649789
https://www.chinatalk.media/p/deepseek-ceo-interview-with-chinas
CEO seems based at least. Interesting interview. Gonna post a couple excerpts.

>China should gradually become a contributor instead of freeriding.
>In the past 30+ years of the IT wave, we basically didn’t participate in real technological innovation. We’re used to Moore’s Law falling out of the sky, lying at home waiting 18 months for better hardware and software to emerge.
>That’s how the Scaling Law is being treated.
>But in fact, this is something that has been created through the tireless efforts of generations of Western-led tech communities.
>It’s just because we weren’t previously involved in this process that we’ve ignored its existence.
>What we see is that Chinese AI can’t be in the position of following forever. We often say that there is a gap of one or two years between Chinese AI and the United States, but the real gap is the difference between originality and imitation. If this doesn’t change, China will always be only a follower — so some exploration is inescapable.

>Q:But you’re ultimately a business organization, not a public-interest research institution — so where do you build your moat when you choose to innovate and then open source your innovations?
>A:In the face of disruptive technologies, moats created by closed source are temporary. Even OpenAI’s closed source approach can’t prevent others from catching up. So we anchor our value in our team — our colleagues grow through this process, accumulate know-how, and form an organization and culture capable of innovation. That’s our moat.
>Open source, publishing papers, in fact, do not cost us anything. For technical talent, having others follow your innovation gives a great sense of accomplishment. In fact, open source is more of a cultural behavior than a commercial one, and contributing to it earns us respect. There is also a cultural attraction for a company to do this.
>>
>>103649824
Highly based.
>>
How does the cache work? Like, if I delete 90% of a chat and start over, is the cache still infecting its outputs?
Seems like if the slop infects your API key you need to get a new one.
>>
File: chinesemantyping.jpg (131 KB, 1255x837)
>Highly based
>>
>was going to get 2 5090 for 72b/123b models
>now debating for cpu maxxing instead
please tell me all the deepseek posts are just shills or a meme
>>
>>103649866
They're slant-eyed shills.
>>
>>103649866
they're state-operated shill agents
>>
>>103649866
CPUMAXXing is futureproof and there's nothing stopping you from slapping 3-4 5090s onto your dual socket Genoa-X board later when you need them.
>>
>>103649851
Yeah, I mean he could be lying of course, but it's a very surprising insight into Chinese AI.

>Why is Silicon Valley so innovative? Because they dare to do things. When ChatGPT came out, the tech community in China lacked confidence in frontier innovation. From investors to big tech, they all thought that the gap was too big and opted to focus on applications instead. But innovation starts with confidence, which we often see more from young people.
>Our hiring standard has always been passion and curiosity. Many of our team members have unusual experiences, and that is very interesting. Their desire to do research often comes before making money.
>DeepSeek is entirely bottom-up. We generally don’t predefine roles; instead, the division of labor occurs naturally. Everyone has their own unique journey, and they bring ideas with them, so there’s no need to push anyone.

>Q:Many LLM companies are obsessed with recruiting talents from overseas, and it’s often said that the top 50 talents in this field might not even be working for Chinese companies. Where are your team members from?
>A:There are no wizards. We are mostly fresh graduates from top universities, PhD candidates in their fourth or fifth year, and some young people who graduated just a few years ago.
>The team behind the V2 model doesn’t include anyone returning to China from overseas — they are all local. The top 50 experts might not be in China, but perhaps we can train such talents ourselves.

>Q:Once DeepSeek lowered its prices, ByteDance followed suit, which shows that they feel a certain level of threat. How do you view new approaches to competition between startups and big firms?
>A:Honestly, we don’t really care, because it was just something we did along the way. Providing cloud services isn’t our main goal. Our ultimate goal is still to achieve AGI.
>Big firms have existing customers, but their cash-flow businesses are also their burden, and this makes them vulnerable to disruption at any time.
>>
File: WEBP_Player.png (700 KB, 777x630)
I'm starting to get used to the AI-generated look and feel.
>>
>>103649866
You'll get fucked either way. Why not watch some TV shows, read a book, play some games? LLMs and hardware are in a weird spot right now; I'll just wait until the dust settles and then scoop up some cheap hardware
>>
>using UnslopNemo 4.1
>mention {{user}} has long, sharp, pointy, claw-like toenails
>well over 100 messages later
>{{user}} makes {{char}} lick her feet
>model mentions that the toenails are painful on her tongue
Whoa.
This is after the model demonstrated significant spatial awareness the other day.
Good model for a 13B. Continues to impress me in little ways like this that other models of the same size, or even Mixtrals back in the day, have not.
Drummer completely fucked up the Metharme implementation but the model is quite good with Mistral templates.
>>
>>103649893
Yeah I feel like local image gen is in a much better place right now than local LLMs.
Still flawed, but you can get great results with a bit of shooping and inpainting, and that feels satisfying, whereas editing prompts and messages feels more like tard wrangling.
>>
>>103649866
Qwen and DeepSeek are too dry and positivity-pozzed, unfortunately.
We have Mistral for RP and that's it. And even those get worse the bigger the B.
For coding I wouldn't use local. But if you must, I'd use Qwen Coder, and that's 32B.
Hope you at least tried out the big 70B+ models somewhere before buying. I was disappointed.
>>
Gemini is great.
>>
>>103649932
Which one? From my testing they all kinda seemed dumb.
>>
It's been a while since I used llama.cpp, as I'm an exl2 user.
I see that it now requires CMake to compile instead of "make".
It's painfully slow.

I used to run:
make clean && time GGML_CUDA=1 make -j$(nproc)

and it was faster. Is there a faster alternative to:
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release

I have an EPYC system.
>>
All Chinese LLMs are going to be CPUmaxxed from now on because they are sanctioned to death by the US and the sanctions are going to get worse under Trump.

With CPU inference they can at least skirt some of the requirements and make optimum use of available GPU power by putting 100% of the GPUs in the country to use purely for training.
>>
>>103649866
The approach is to wait and see what hardware and models will come out in the coming year. Everything moves too fast to just sink a ton of cash into something and regret it a few months later.
>>
>>103649926
I agree, even the biggest txt2img models can be run on a single xx90, but that's entry level for average intelligence LLMs. So much for "a picture says more than a thousand words"
Images are also far easier to understand and compare
>>
>>103649946
Check the docs https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md#cuda
Also use -j [threads] when compiling; it helps a lot, especially when you compile all the FA quants
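For example, something like this should be roughly equivalent to the old parallel make line (assuming CMake 3.12+ so that --build accepts -j; the configure step only runs once, rebuilds are incremental):

cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j $(nproc)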
>>
>>103649956
something something "hurr durr poorfag cope"
>>
>>103649973
would using GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 be better than just offloading more layers to RAM?
>>
>>103649947
That's not so bad I guess.
Wasn't there a similar situation with the IBM monopoly in the 80s which forced optimizations?
A $2xxx Nvidia 5090 with 600 watts and 32GB is crazy.
>>
what kills me most about the current crop of models is that no matter how smart, they'll all fall apart eventually just far enough into the context. It is inevitable.
>>
>>103649928
>For coding I wouldn't use local.
Isn't the new DeepSeek competitive with the best models while being a lot cheaper?
>>
>>103649996
Oh yeah, don't enable that, at least not on Windows.
It tricks your GPU into thinking it has more VRAM, and when it spills into RAM the performance is going to tank HARD.
Regular layer offloading is much faster than unified memory
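A rough sketch of the two approaches (model path and layer count are placeholders, and the binary name assumes a recent llama.cpp build where it's called llama-cli):

# explicit offload: keep 30 layers on the GPU, the rest run from system RAM
./llama-cli -m model.gguf -ngl 30 -p "..."

# unified memory: let CUDA page VRAM out to RAM on demand (tends to tank hard once it spills)
GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 ./llama-cli -m model.gguf -ngl 99 -p "..."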
>>
>>103650022
Yeah. But I still wouldn't make the jump.
Even Sonnet 3.5 sometimes fucks up hard, especially with loops. I'd rather take 3x if/else. You can't trust the output.
I wouldn't accept anything but the best. If it's a hobby project or something, it's more than enough though.
>>
>>103649926
flux is boring. I started shitposting here because there's only so much you can do unless you want to inpaint for hours. Let's face it, at the end of the day txt2img is just a 1girl generator and not much more
>>
has anyone been able to quantize deepseek v3 to GGUF? I think DeepseekV3ForCausalLM is not supported?
>>
>>103650092
doesn't quantizing sparse moe models fuck up their performance?
the original is fp8 already so there's not that much to trim down anyway
>>
File: 1707832964593536.jpg (94 KB, 1050x618)
>Performs on par with Claude 3.6 Sonnet as a web agent while having only 9B params
https://huggingface.co/THUDM/cogagent-9b-20241220
https://github.com/THUDM/CogAgent
Holy fuuuuuuuck. China won again!!!
>>
>>103650134
is there anything like this but for RP?
>>
>>103650124
Well, I just wanted to be able to try it. I have 4x3090 + 512GB RAM, so I'm not sure I'm able to load the original model even at the native FP8.
>>
>>103650134
those benchmarks are always cope
>>
>>103650156
Could've paid for billions of tokens with that investment
>>
>>103650134
(on a very very narrow task)

Small models are retarded. Always has been, always will be.
>>
>>103650181
Go away Sam, no one wants your oversized models
>>
>>103650162
I use the homelab for more stuff though, not only LLMs + it's still local
>>
>>103650046
Flux's big thing is the natural-language way of instructing it, but it only works if you hit something it actually knows, which is complete guesswork. The worst part is that, because of the nature of natural text, you have to guess how to write the prompt so it actually pays attention to all the parts, and then figure out whether it even really understands all of it. Tags are easier. I found SDXL with ControlNets to be much more versatile, personally
>>
>>103650195
Now I'm curious, what do you use it for?
>>
>>103650013
Well yeah but most of them are stable up till 32k context. Last year we were glad when we reached 8k max context on models. Keep in mind how quickly we're making progress.

The only thing that changed is that model makers straight up started lying about the max context the models can handle.
>>
>tts sucks
>every 3d model has hard requirements to cuda libraries that won't work with amd
/lmg/ was mistake
>>
>>103650248
>cuda libraries that won't work with amd
That's kind of a you problem though.
>>
>>103650215
VMs with multiple GPUs for passthrough, as a middle server for livestreaming with OBS using SRT to help with received packets since the source is on cellular 5G, then from the server to YouTube over RTMP on a wired gigabit connection.
Also remote gaming with Parsec or Moonlight. It's also handy having 4x Stable Diffusion Forge or ComfyUI loaded with Flux for parallel outputs
>>
File: Pro WS WRX90E-SAGE SE.png (699 KB, 692x692)
Should I get a Threadripper Pro or go straight for a server board?
>>
they uploaded the Deepseek v3 model card and paper:

https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf
https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/README.md

It has distilled reasoning capabilities:

"Post-Training: Knowledge Distillation from DeepSeek-R1

We introduce an innovative methodology to distill reasoning capabilities from the long-Chain-of-Thought (CoT) model, specifically from one of the DeepSeek R1 series models, into standard LLMs, particularly DeepSeek-V3. Our pipeline elegantly incorporates the verification and reflection patterns of R1 into DeepSeek-V3 and notably improves its reasoning performance. Meanwhile, we also maintain a control over the output style and length of DeepSeek-V3.
"
>>
>>103650306
big ass board lmfao
>>
>>103650306
>7 pcie slots
b r u h
>>
>>103649926
I kind of feel the opposite to this. Maybe I'm just retarded but imggen prompting and building refinement pipelines are a lot harder to intuit than tard wrangling LLMs imo. Competent image generation looks almost indistinguishable from the real deal, but it's a lot more finicky and getting there is harder than it is to get a sufficiently large LLM to match the quality of an average slop novel.
The main advantage of imggen models is that they're significantly easier to run on consumer hardware.
>>103650306
>7 slots
Why would anyone do this?
>>
File: file.png (1.17 MB, 2400x2400)
>>103650427
>Why would anyone do this?
You want them to just not connect all those pcie lanes the cpu has?
>>
Johnny Dep's Speed 3
>>
File: file.jpg (614 KB, 1376x2012)
>>103650427
>Why would anyone do this?
https://youtu.be/-nb_DZAH-TM?t=993
Chinese richfag mikubox
>>
>>103650393
That's the same number of slots as the server mainboards that CUDAdev and some others use in their mining-rig builds. It's probably the maximum that's supported in terms of PCI-E x16 lanes with these CPUs.
>>
File: 2024-12-26_04-34-18.png (8 KB, 618x49)
>>103650316
>ssd maxxing dream doa
so much for my panic shilling oh well time to look into cpumaxxing
>>
>>103650555
manic* fuck
>>
>github copilot has been out for 3 years
>free tier available now
>no open source/local alternatives
Why can't local keep up with proprietary shit?
>>
>deep repeat 3
>>
>>103650586
>why can't my home pc keep up with an entire industrial datacenter
gee I dunno anon
>>
>>103650586
This is the smartest kind of shitpost because it invites know-it-alls and shills to come defend their turf by giving an actual answer.
A truly efficient way to get a proper answer amidst the shit flinging.
>>
>>103650316
>FP8 native weight training: We design an FP8 mixed precision training framework and, for the first time, validate the feasibility and effectiveness of FP8 training on an extremely large-scale model.
>AMD GPU: Enables running the DeepSeek-V3 model on AMD GPUs via SGLang in both BF16 and FP8 modes.
>Huawei Ascend NPU: Supports running DeepSeek-V3 on Huawei Ascend devices.
>We investigate a Multi-Token Prediction (MTP) objective and prove it beneficial to model performance. It can also be used for speculative decoding for inference acceleration.
cont
NOTE: The total size of DeepSeek-V3 models on HuggingFace is 685B, which includes 671B of the Main Model weights and 14B of the Multi-Token Prediction (MTP) Module weights.
>>
>>103650612
yeah well fuck you
>>
>>103650611
>entire industrial datacenter
They offer it for free, they're running this shit on raspberry pis and we can't even match it with the free 405B parameter models
>>
>>103650631
True
>>
>>103650555
What is ssdmaxxing? I've been away for a while.
>>
>>103649866
I got a 4090 for gaming and then got into AI. Get a 5090 for gaming if you game. If not, then none of this shit is worth it.
>>
>>103650659
The idea that you could run a moe model on pcie 5 ssds.
>>
Man, this necessity to have the exact same vocab between the main model and the draft model fucking sucks.
qwen2-57b-a14b-instruct isn't compatible with qwen2-0_5b-instruct as a draft model? What the hell.
What are some 40 to 60B-ish MoEs out there that I can use for my tests?
Mistral doesn't have a tiny v1 model that would be compatible with the original Mixtral; their smallest one is 7B, right?
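For reference, the llama.cpp pairing looks roughly like this (paths are placeholders; it refuses to start when the target and draft vocabs don't match, which is exactly the complaint):

./llama-speculative -m big-target.gguf -md small-draft.gguf -ngl 99 -p "..."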
>>
>>103650659
Level 2 cope
Level 1 cope is cpumaxxing because you're too poor for H100maxxing
Level 0 cope is pretending that current tech LLMs will magically stop being retarded after a magic parameter number while your house burns down around you because you forgot to install a better breaker
>>
>ITT promplets getting BTFO
It doesn't matter how many params you have if you don't know how to write.
>>
>>103650788
>uhm sweaty you are using it wrong!
*yawn*
>>
>>103649789
Releasing a 685b model that defeats existing smaller param models isn't impressive
It's like when Grok 1 (314B) was released and claimed it scores better on benchmarks than Llama 70B. Like yeah it does, but fuck off.
>>
>>103650815
A 685B sparse MoE isn't the exact same thing, but I get what you're saying. Still, the barrier to entry to run that model is way lower than for a 314B non-MoE model.
>>
>>103650586
continue.dev+LM studio, but /g/ will say LM studio is a proprietary solution or something.
>>
>>103650860
I like aider better. Works with any editor.
>>
>>103650860
Yeah retard, no need for another reskinned llama.cpp
>>
Am I the only one who's not excited about the new deepseek?
It's just more of the same just scaled up.
At least the QwQ meme was something different.
>>
>>103650870
>double click and run?
>no retard, you need to install Gentoo and get llama.cpp working or you're not running efficiently enough!!!
Thanks.
>>
>>103650860
It's proprietary and brings nothing to the table.
>>
>>103650871
I dread the implications of the fact that its reasoning is distilled from the new R1 model. If DeepSeek V3 is this big, how big is R1?
>>
Just messing around with Mistral Large after I got a new GPU. Is there a "recommended" token budget for the system instructions (I'm making my own) and character card together that you should not go above?
>>
>>103650631
>They offer it for free
no, you're testing their model for free
>>
Got myself a second 3060 so I have 24GB VRAM. Are there any good MoEs that work at this size or am I still a vramlet?
>>
>>103650918
1.4T dense, local will be saved!
Jokes aside, didn't people claim that it was <50B a few weeks ago?
>>
>>103650985
>didn't people claim
Guess it depends who those people were.
>>
>>103650983
did you actually get a 12GB 3060
also why would you buy 3060 when 3090 exists
>>
>>103650983
MOEs are optimized for cpu inference, but 24GB lets you run pretty much anything up to 30B at near-lossless quants and above reading speeds
70B at 3-4bpw is also doable if you don't mind waiting, with a decent draft model it'll be even better
Honestly, just wait until we get better, more efficient shit
>>
ok im going to sacrifice myself
im going to seduce m zuckerberg
>>
>>103650993
A few anons speculated as much since it was really fast (please don't tell me it's an even bigger MOE)
>>
>>103651013
>please don't tell me it's an even bigger MOE
okay, I won't
>>
>>103650973
>can use it for free (0 dollars)
>>um actually, if it's free then you're the product!
>>
>>103650998
>did you actually get a 12GB 3060
>also why would you buy 3060 when 3090 exists
$400 vs more than $400 for a toy, hmm hard choice
>>
>>103650918
R1 is probably 236B
Just because it can into CoT that deserves being distilled into V3 doesn't mean it's bigger than V3. This is the logic for non-reasoner distillations. Think of reasoners more like you think of reward models. QwQ is 32B and beats llama-405B on a ton of benchmarks after all.
>>
>>103651094
>236B
Oh yeah a measly 236B
>>
>>103651094
Nothingburger then
>>
>>103651099
>>103651111
you're hard to please digits guys
>>
>>103651094
"Distilling" a small model to create a big one is an oxymoron. R1 is bigger than V3.
>>
>>103651178
"Distilling" is literally just "tutoring" rebranded after retards tutoring Llama-1-7B on GPT-4 outputs gave it a bad name.
>>
>>103651178
puerile semantics
What matters is the quality of data. Small reasoners can generate data that large non-reasoners cannot. They explicitly say that R1 is kind of retarded, so they do a ton of work to rectify that; they use V2.5 a lot too. We have no reason to think R1 is bigger.
>>
>>103651062
>he paid $400 for a 3060
bro...
>>
>>103651062
You played yourself.
>>
>>103651229
AUD
>>
File: 1709058286818351.png (32 KB, 827x122)
>>103651196
>>103651214
So this is a different kind of process than the one that was used to "distill" LLaMA3.1 405B into the 3.1 70B and 8B variants?
https://ai.meta.com/research/publications/the-llama-3-herd-of-models/
>>
>>103651256
This is a colloquial use of the term "distillation"; they don't match logprobs, they just train on outputs.
>>
>>103651246
How much is a used 3090 in aussieland?
>>
deepseek is really nice, finally a local that might be worth using over claude through open router
>>
It's a shame no one makes a local-first model: a small-active-set MoE with a TurboSparse-type predictor for further sparsity.

SSD-maxxing a large model designed to run on GPU clusters is going to work poorly at best.
>>
I think deepseek v3 is the first model I feel like I can wrangle into being a proper AI Dungeon Master with some prompting magic. It barely needs lorebooks to be fed information about the 3.5e version of the game, and there's a lot of information.
This is pretty fucking cool man.
Even Claude and GPT-4o would hallucinate Tome of Battle maneuver names.
>>
File: kill me baby;yasuna;.png (178 KB, 500x500)
I did some research on LLM support on a Linux phone (Oneplus 6, sdm845, Adreno 630, mesa 24.3).
>llama.cpp CPU
works without issues
>llama.cpp GPU - Vulkan
Mesa Freedreno Turnip doesn't support 16-bit storage on Adreno 630 (present only on Adreno 650+ for now), which is required by the Vulkan backend in llama.cpp
>llama.cpp GPU - experimental Qualcomm OpenCL
Mesa Freedreno Rusticl doesn't support subgroups yet, which are required by the OpenCL backend in llama.cpp
>mlc-llm GPU - Vulkan
From what I understand, same as llama.cpp: 16-bit storage is required
>mlc-llm GPU - OpenCL
f32 models run, but slow and output garbage

So far I'm stuck with the CPU; I'm wondering how performant the Adreno 630 can be once it supports the required features (I believe it has the hardware for it, but it's not implemented in Freedreno yet?).
Tale as old as time: GPUs other than Nvidia have dogshit support, while CPU always works.
Thanks for reading my blog.
>>
Looks like 3200MHz RAM would only get more like 2-3 t/s out of this. You would need a more recent server for good speeds. The price just doubled lol.
>>
So cpumaxxer was getting like 8t/s with v2.5 which has 21b active parameters, now v3 has 37b active parameters.
Remember that he has 12-channel ddr5 memory.
My mental calculations say that an 8-channel ddr4 machine would get about ~2t/s or less.
I'd say wait for people doing tests before buying a ddr4 server.
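Back-of-envelope behind that, assuming generation stays purely memory-bandwidth-bound and the quant is the same:
8 t/s x (21B / 37B) ≈ 4.5 t/s on the same 12-channel DDR5 box
4.5 t/s x (~205 GB/s for 8-channel DDR4-3200 ÷ ~460 GB/s for 12-channel DDR5-4800) ≈ 2 t/s
Those are nominal peak bandwidths, so real numbers will land a bit lower.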
>>
Personally I'm just waiting for a new architecture so I can finally run a non-lobotomized quant of a good model on my shit consumer hardware
>>
>>103651853
Didn't they release small v2 and v1 models back in the day?
I wonder if those could be used for speculative decoding to squeeze a couple more t/s.
>>
Apparently it was trained on only 2,000 H800s for less than 2 months, costing $5.6M. Why can't we crowdfund something like this?
>>
>37b
I thought it uses 8 experts by default so isn't it 16B + 21B constant and you can just load those 21B into vram?
>>
>>103651874
Different tokenizers.
>>
>>103651853
buying a ddr4 server seems super scuffed - it's old shit, so cheap but also slow and it's practically useless outside of big MoE models
>>
>>103651971
With 37B active you're gonna end up wanting a DDR5 server anyway for decent speeds. Though perhaps you could pair a DDR4 board with a 48GB card?
>>
>>103650998
>>103651062
Bro.... I paid $300 for my 3090
>>
>>103651853
>So cpumaxxer was getting like 8t/s with v2.5 which has 21b active parameters, now v3 has 37b active parameters.
>Remember that he has 12-channel ddr5 memory.
I just got https://github.com/kvcache-ai/ktransformers working and I'm getting 5 t/s with 2 channels. I have a 4090 and 192GB ram.
>>
>>103652084
? Does ktransformers support v3 already or do you mean 2.5?
>>
Can someone explain the Test Time Compute thing to a retard? How does it differ from CoT with the ability to see prior mistakes? Jewgle Flash Thinking's thought process doesn't seem that helpful for coding; it's just doing a similar thing to what older models were doing, just a bit longer.
>>
>>103652112
2.5
>>
>>103651897
Even if that was a realistic thing to say, I doubt we would get gold on our first try.
>>
>>103652117
So you would get about half that with v3 if you had enough ram. Might be very doable then with 8 channels
>>
>>103652114
The other difference is that Test Time Compute uses RL to pick next steps to consider
>>
>>103652131
I don't know how viable RAM overclocking is on 8-channel boards.
I have 4 sticks and I'm running them at 4800mhz because that's what worked with zero manual tweaking.
With only two sticks it goes up to 6400mhz. If those frequencies are possible on 8 channel boards then it would be even faster.
>>
No but really, is the SSD thing impossible with this? I would assume, like in >>103651928, you would need the 21B + context in your GPU and regular RAM constantly, and then you would load experts from the SSD, each still being ~2B, so ~1GB at 4bpw?
>>
>>103649764
electro magnet snow board is probably a cool product
>>
>>103652214
The big question that probably nobody here knows the answer to is how often do the experts change from one token to the next.
>>
>>103652190
You can get 5600mhz for.. well not cheap
>>
>>103652114
It's essentially a new type of finetune. Like how we have base models and then trained on instruction following made them instruct models. And then training on question/answering interactions made them into chatbots.

Now they also added Reinforcement Learning to train the models to "pick the best route" out of multiple different options.

Then what the "CoT" does is essentially make a couple of short drafts within its CoT and the RL training makes the model pick the best of these drafts and finish them up. So constantly while thinking the model makes a lot of "branches" where it can go and the RL finetuning then makes the model decide which of the branches is more likely to lead to a correct answer.

o3 is rumored to make 1000 branches, pick the best one and at every pivotal reasoning step again make 1000 branches.

It's extremely wasteful in terms of tokens and I think we will change how it's done severely. The way we do it today is extremely hacky with a lot of wasted tokens.
>>
>>103652214
You'd load 8 (it was 8 experts, right?) of the experts per token. Given how many total experts there are, you'll probably load new ones most of the time. At 4bpw, you'll load about 8gb per token. How long does it take you to load 8gb?
>>
>>103652283
The fastest SSDs can do about 14 GB/s sequential, so it might be doable
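Quick division for the best case: 14 GB/s ÷ ~8 GB of experts per token ≈ 1.75 t/s ceiling, before counting the always-active shared/attention weights, compute, or any read overhead.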
>>
DeepSeek v3 feels like it finally hits the ideal intelligence level I've been waiting for
The only thing now is optimizing that shit into a size that doesn't require a dedicated RAM farm
>>
Btw, for deepseek use 1.7 temp. It gets fun then. 1.8 starts making mistakes.
>>
>>103652299
Though that would pretty much bottleneck it to 2 tks or a bit less.
>>
>>103652299
If you have the fastest, sure. still, that'd limit your maximum possible generation to < 2t/s
>>
>>103652340
Maybe it is not that retarded if you run it at 4 experts instead of 8.
>>
>>103652282
Close, but afaik it doesn't generate the actual drafts. Otherwise, the Flash 2.0 thoughts wouldn't be generated token by token. The goal of RL is to effectively prune the search by training it on multiple branches during training time. When the time comes to do inference, RL enables the model to pick the most probable option without expanding the others, which is faster, but results in a loss of accuracy / thoroughness
>>
>>103652349
2 tks for like $200 is not bad at all for that intelligence though.
>>
>>103652320
That's not a very big workable range.
>>
>>103652368
We don't know how it actually works. I think o3 uses actual written-out drafts because of how many tokens it used to answer a single ARC-AGI question (110 MILLION tokens).
>>
>>103652371
MAXIMUM. That's just loading the experts under optimal circumstances and absolutely nothing else. You still need to run the ~20B params and deal with the overhead of swapping experts and all that. I won't speculate about the realistic speed.
>>
>>103651897
That amount of money could be crowd-sourced (though it wouldn't be easy) but
- you'd have to have knowledgeable people who aren't grifters heading the project
- you'd have to avoid breaking muh copyright in blatant ways when it comes to collecting training data - easy to do when you're a chink who doesn't have to care, but not so in le free west (otherwise ambulance chasing lawyers will eat all your crowdsourced money instead of training models)
>>
File: huh.png (370 KB, 1159x1037)
>>103652320
>>103652387
That's a creative / sanity balancing line for people who thought the model was too dry. This model goes almost crazy (but still perfectly coherent / without anatomical mistakes) around 1.7. This scene was not supposed to be sexual and look at where it took it. Wheeze...
>>
After really giving deepseek v3 a go, here's my review.
Works great if you have one question that needs an answer.
Once you do RP with this thing it shits the bed. It seems VERY sure of what the next token should be; like seemingly all Chinese models, once it is on a set direction, it cannot change course.
It has horrid repetition issues. After a few back and forth messages, it will just copy segments of previous messages wholesale, completely out of context.
It seems to get even weirder at higher contexts, which isn't really unexpected, but the purported 163840 context limit is being generous.
So yeah, not that great for RP.
But that's all modern models. Model creators are obsessed with benchmarks, math and programming. At this point, a model's ability to converse feels vestigial.

You want my opinion, the glory days of chat models have been over for months. Anything coming out now will be a sidegrade at best.
>>
>14.8 trillion tokens
Scaling is dead
>>
>>103650622
Wait you're telling me that one Meta paper is now in a production model? That's cool.
>>
>>103652406
I feel pretty confident that o1 doesn't generate the branches. While we don't technically have concrete proof of that, we have a good idea of what Google does given that they show the thoughts rather than hiding them, and it gives performance reasonably close to o1 preview with their Flash model, which, presumably, is not nearly the best they have
o3 is likely something entirely different from Flash and o1 which OP was talking about. My guess is also that it's actually doing a more thorough search in a tree-like fashion and evaluating paths somehow. OpenAI obviously won't divulge any details because they're thirsty for their moat, but I'd expect Google (or somebody else) will release something similar relatively soon which should give us a good baseline
>>
Drummer's models are dogshit.
>>
>>103652511
>horrid repetition issues
Still never ran into this. What's your setup?
>>
>>103652511
0.15 rep pen and I have not had any rep issues since. Make sure you have your formatting right as well. Use 1.2-1.7 temp for creative stuff. Yea, it seems very sure of itself but that is a sign of a smart model. I have had no issues up to about 50k context. And I would say some smut tunes probably still write better smut, but this model is the only option now if you want actually intelligent RP / writing.
>>
>>103652511
Oh, and don't use OpenRouter. Apparently you will get 2.5 half of the time, which is retarded compared to V3.
>>
>>103652540
>0.15 rep pen
Doesn't that make repetition more likely?
>>
>>103652554
No, it starts penalizing tokens based upon them already being in the context
>>
>>103652564
That's presence penalty
>>
I tried deepseek v3 in (((the cloud))) and it seems more of a leaderboard whore than a model you'd actually use
it's not anything groundbreaking even in coding, supposedly its strong suit
>>
>>103652540
>Yea, it seems very sure of itself but that is a sign of a smart model
Overcooked
>>
Considering training knowledge domains: how far is the domain of long multiturn dicksucking in multiple varied interesting ways, from the domain of answering a single turn safe question / riddle / coding problem with one objective truth?
>>
>>103652569
In ST I'm using frequency penalty; I'm just used to calling it rep pen
>>
>>103652595
Yeah, it's unfortunate since despite the similar names and functions, they gave them different scales for whatever reason
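For reference, the OpenAI-style definitions (which DeepSeek's OpenAI-compatible API presumably follows) are roughly:
frequency penalty: logit[t] -= alpha_freq * count(t), i.e. it scales with how many times t has already appeared
presence penalty: logit[t] -= alpha_pres * (1 if count(t) > 0 else 0), i.e. a flat hit once t has appeared at all
The classic llama.cpp rep pen instead divides positive logits (and multiplies negative ones) by the penalty value, which is why the usable ranges look so different (~1.0-1.3 vs ~0-2).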
>>
>>103652590
About 3.5.
>>
How does the new DeepSeek model compare to the Hunyuan-Large A52B MoE? Why are we getting hopeful over this when A52B still hasn't even gotten a q4 quant, despite being smaller than DeepSeek V3?
>>
Okay, anybody used DeepSeek V3 for incest smut and is it worth spending money on?
>>
What models do the best with large context windows for parsing long documents? I hear gemini apparently excels at the task but I'd rather not use it for obvious reasons.
>>
>>103652511
Hate to be that guy, but are you SURE you're using V3?
>>
>>103652620
Because no one can try Hunyuan-Large anywhere.
>>
>>103652244
>The big question that probably nobody here knows the answer to is how often do the experts change from one token to the next.
Just like quantization there's not even a real answer. The mixture of cache paper used only the top 2 experts for certain and the rest only if they were in cache. Unfortunately they didn't do any hitrate experiments with a small cache with that strategy.

A local model could be trained specifically for caching, loading only max 1 new expert per token with say 8GB worth of expert cache.
>>
>>103652717
I wonder how sensitive these models are to expert selection. What would happen if you picked, say, k arbitrary experts and just used those without swapping between them? Do individual experts have enough "general" capabilities to give decent outputs even if they aren't the best choice?
>>
>>103652702
If people aren't running A52B now, I don't see a chance for Deepseek V3 being local either.
>>
>>103652785
From what I remember Hunyuan-Large massively underperformed for its size and so was never mentioned again even by the creators.
>>
>>103652785
>>103652797
Also 52B active would start getting to the point where cpu only inference is just not gonna do it. 37B is more reasonable. Reading speed might be doable.
>>
>>103652692
>those cope benchmarks where qwen and 2.5 beats sonnet and 4o
what even is the target audience for it when it's so obviously bullshit
>>
Anyone know if I can run deepseek with 350GB RAM and 96GB VRAM?
>>
>>103652846
4 bit. Be sure to pass on the performance you get.
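Rough arithmetic, assuming a ~4.5 bit-per-weight quant (Q4_K_M-ish) once GGUFs exist: 671B x 4.5 / 8 ≈ 380GB of weights, plus KV cache and buffers, against your 350GB RAM + 96GB VRAM ≈ 446GB total, so it should just about fit.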
>>
>>103652846
I know.
>>
anyone tried the instructions to git clone https://github.com/deepseek-ai/DeepSeek-V3.git and run inference with torchrun?
>>
>>103652842
Kicks their asses on Livebench too from my understanding, and that one has a closed test set that is updated every few months
Next question
>>
>>103650612
back in my days this was called "bait" and you'd post an image of a fish and a hook to signify that the post was bait
>>
>>103652785
If the model was an actual breakthrough then people would build for it.
In reality it seems decent but not great, and the special requirements for it mean few people will bother
>>
>>103652858
I can only find the FP8 model. Can I make it 4bit?
>>
>>103652905
They say how to on the page.
https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/README.md
>>
File: 1393574362984.jpg (5 KB, 200x200)
>>103652900
>>
>>103652905
>>103652916
That or wait. I'm sure the usual suspects will have quants eventually. Being 700B, it's prob gonna take days for them to quant and then upload it.
>>
>>103652688
how do you know you're using V3 through the api? the model always answers that it's a GPT model.
>>
>>103652916
>>103652941
Yeah I would wait for gguf. INT4 quantization is an extremely naive lobotomy quant.
>>
>>103652943
Ask it "what model of deekseek are you?"
>>
>>103652943
For what it's worth, it might work now since they went and disabled Hyperbolic in the API on OR
>>
You do realize that this 685B model is 256 experts, 8 of which is active, so only 21B activated parameters over 8 experts, roughly 3B per expert. How expert can a 3B model be for the area of expertise? It is amazing that the coding performance is almost o1 level, but you do realize that all china AI models are nothing but stolen synthetics data from OpenAI/Anthropic, they just shove it into a model able to hold that many tokens.

OpenAI, Grok, even later LLAMA 4/5 are going for 2Trillion parameters (dense or activated ?) by next year. There is a huge difference when your model doesn't have free stolen synthetics from others that you have to do the grunt work to actually think inside the model instead of being told.
>>
>>103653000
This poster is a chinese marketer pretending to be a retarded anti chinese poster.
>>
>>103653000
1. holy esl
2. that is not how moes work
3. pretty sure that is every model since gpt4, are you really gonna cry that everyone is stealing from openai? fuck them.
>>
>>103652951
thanks, but it doesn't seem to work. it's strange because i'm using the documentation curl

> I am an instance of OpenAI's language model, specifically **GPT-4**. My design is based on the GPT (Generative Pre-trained Transformer) architecture
>>
>>103653000
Anon, for fuck's sake, learn how MoEs work next time you decide to make another retarded post.
>>
File: bait.png (93 KB, 625x626)
>>103653000
>>
>>103653029
alright i made it work. I had to add a system message saying "You are a model by deepseek"
>>
File: deekseek2.png (69 KB, 1290x322)
>>103653029
Odd, that is with top k 1 or temp 0? It does it for me. Then maybe OR is fucked atm. Maybe they are in the process of changing to it and are using some fill in model for now? I have no clue.
>>
>>103653000
Do you realize that deepseek literally said that they are using 37b active parameters?
>>
>>103653000
If it's so easy to make a good model with synthetic data, why aren't Meta/OpenAI/etc doing it with their own "pure models"?
DS3 is revolutionary; it's the first open model we've gotten that's close to Sonnet 3.5
>>
>>103653029
That's the same response you get in their web chat interface after activating the web search.
>>
>Tfw they're falling for a Deepseek post made by Deepseek
>>
Guess im waiting for quants then.
>>
>>103653047
Wait, you have to TELL it it's a deepseek model, and you're trusting that it answers its version correctly? anon...
>>
>>103653048
maybe sillytavern adds some contextual info as input?

according to their documentation, you can't change top_k, only top_p, and it doesn't seem to affect the result. Anyway i'll leave the curl here

just changing the system prompt to
>You are a model by deepseek
seems to work

> I’m DeepSeek-V3, an artificial intelligence model created by DeepSeek
but sometimes it says it's just deepseek-chat.

curl --request POST \
  --url https://api.deepseek.com/chat/completions \
  --header 'authorization: Bearer KEY' \
  --header 'content-type: application/json' \
  --data '{
    "model": "deepseek-chat",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "what model of deekseek are you?"
      }
    ],
    "stream": false
  }'
>>
>>103653116
no, it's still fishy
>>
>>103653116
It's roleplaying.
>>
File: ds_v3_benchmark_table_en.jpg (178 KB, 1280x1091)
I know the official API is V3 at least; they even say so now: https://api-docs.deepseek.com/ under the first API call

Also another benchmark
>>
>>103653205
Also qwen2.5 still standing strong. Makes me wonder what qwen3 will be like. If they manage to pass this with 72B...
>>
>>103652842
>4o
Not hard to beat that one but yeah codeforces is a giant meme
>>
>>103650134
>Claude 3.6 Sonnet
3.6?
>>
>>103653233
It's a nickname
>>
>>103653205
Kinda wild how this model / pricepoint pretty much completely steamrolls every paid API out there except maybe Gemini and the CoT meme models that are too expensive and useless for most day-to-day things
I think I see why OpenAI and Anthropic were so afraid now
>>
File: 1726380837764390.png (844 KB, 800x582)
>>103653205
>Western cucks like Meta: "Noooo we can't give you our base model anymore it's too dangerous for the goyims!!"
>Based chinks: "Hey, you want a 680b base model? It's yours now"
>>
>>103653205
>Big shit beats small shit
damn I love Machine Learning!
>>
>>103653227
Livebench isn't and is closed. It seems to be beating everything else on there.
>>
>>103653277
That's why communism is superior to the oppressive capitalism.
>>
>>103653205
>>103653277
I should start learning Mandarin.
>>
>>103653289
Correction - council of retards beats shit way bigger than them
>>
>>103653289
Sonnet is very likely a big MoE as well, but with more active params, and so is more expensive to run ($15 per million output tokens compared to $1.10)
>>
>>103653277
you add the fully uncensored video model (Hunyuan) and you realize that the chinks are actually the good guys in this era
>>
>>103652553
I thought you were joking because OR says v3 but then I tested it and got picrel
>>
>>103653310
Still trying to wrap my head around how basically every Western tech company has become the nationalistic censorship ridden clusterfuck we feared China would be, and China straight up doesn't give a fuck
>>
>>103653329
I'm thinking OR just has some substitute model and hasn't swapped them yet. I could be wrong, but on the official API it told me it was DeepSeek.
>>
>>103653310
Elon will ban proprietary models and force OpenAI to release o3 Open Source. The west has a chance.
>>
>>103653349
>o3
>Thousand dollars per task
No thank you.
>>
>>103653349
Ummmmmm no
>>
>>103653332
they're waiting for the daddy state to give them candy and protect them
>>
>>103653277
Wrong company to serve as example. Meta is more like Mistral where they release some things and don't release others. Meanwhile OpenAI releases fucking nothing. Not an Instruct, not a base, not experimental research models. Just nothing.
>>
>>103653369
Will the AI at least learn how to suck dick before it genocides all humans?
>>
>>103653385
anon.. sucking dick IS genocide
- big tech
>>
>>103653375
Meta generally releases most shit of value though. In my opinion, the hierarchy goes something like
>Generally releases models
>Qwen, DeepSeek, Meta
>Releases some shit, keeps the best shit for themselves
>Mistral, XAI
>Releases research, gives us some model table scraps if we ask really, really nicely
>Google
>Completely closed models and generally closed research, wants other people to do the same
>Anthropic, OpenAI
>>
>>103653337
I simultaneously hope it's both OR's and the model's fault
OR's because the model didn't really impress me when compared to every other model (though it didn't make any logical mistakes during my short chat)
But I also hope it's the model because it'd be the final nail in the coffin for me, confirming that either I'm a retarded ESL incel freak piece of shit slop magnet (doubtful) or that LLMs are just convincing illusions that quickly break down when you probe them for a bit
>>
>>103653422
>>103653386
>>
>>103653421
What is Mistral keeping to themselves?
>>
>>103653444
desu I don't really care what Mistral is keeping to themselves, they don't release good models anymore
>>
>>103653294
china hasn't been communist in 40 years
>>
>>103652515
>Secondly, DeepSeek-V3 employs a multi-token prediction training objective, which we have observed to enhance the overall performance on evaluation benchmarks.
No, they seem to have only used it during training for fuck knows what, but it seems to have improved the model. They do mention that it should improve the speed of speculative decoding somehow.
Honestly I have no fucking clue what I'm reading here, but there seems to be lots of cool shit.
>>
>>103653443
That's cool and I might test it later, but I've never used jailbreaks because I think that a good model doesn't/shouldn't need them
>>
>>103653468
Even Claude needs it to write good. By good model do you mean one finetuned for creative writing / RP? Because that is the only way you're gonna get one to write good out of the box, at the cost of other areas.
>>
Congratulations on your first good model localbros. Too bad it's 600B but it has good outputs.
- /aicg/chad
>>
>>103653444
Not as much nowadays, but they did keep the larger models for themselves back when they started up (Mistral Medium and Mistral Large). They've gotten better about it and I debated it, but between that and the ass licensing I decided to drop them to that tier for now
>>
>>103653487
>first good model
>it's 600B
you can't have a good model without it being a giant goliath; the scaling laws prevent us from having nice little things
>>
>>103653385
Tulu-3 70B instruct at q8 is the most capable dick sucking llm ever created so I don't know what you're talking about.
>>
>>103653477
Ideally it should be able to handle everything perfectly out of the box, but since I mostly use them for RP/creative writing, I don't mind using a finetuned version that's worse in other areas. Unfortunately, even those run into the same problem(s) sooner or later
>>
>>103653504
Transformer scaling laws*
>>
>>103653205
Aider-polyglot has got to be a gamed benchmark. There's no way it's better at programming challenges but worse at general code editing compared to sonnet unless it is overfitted on exercism solutions.
>>
>>103653504
Yes bigger models will always be better than smaller models with equivalent training. That is why I think the future that local should optimally move towards are models/architecture that can have "sub-network extraction", or the idea of using only parts of the model for specialized tasks. For instance, MoE models where the experts are specialized towards subject areas. So if you simply just wanted RP, you could load only the most relevant RP experts to VRAM, the less relevant to RAM, and the even less relevant to possibly SSD, though something like Llama.cpp would need to be modified to be able to use all three at the same time.
>>
>>103653627
The future that local should move towards is shit that isn't static, something like liquid neural nets
But that's probably "too dangerous" in the hands of the public
I like your idea though, too bad that companies are moving away from "things you can run on a somewhat recent rig" to "things you can technically run locally (tm) but you need either a server motherboard or overpriced enterprise-grade GPUs"
>>
>>103650860
Continue can use ollama as well, or any OAI compatible API.
>>
>>103653332
China doesn't give a fuck about what the model generates, because they completely control speech in China anyway.
>>
>>103653978
>China doesn't give a fuck about what the model generates
as it should, can't believe I live in a world where China is the country of reason
>>
https://arxiv.org/html/2412.17846v1
https://github.com/alonso130r/knowledge-distillation/tree/main
>Large language models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing (NLP) tasks. However, these models are often difficult to deploy due to significant computational requirements and resource constraints. Knowledge distillation (KD) is an effective technique for transferring the performance of larger LLMs to smaller models. Traditional KD methods primarily focus on the direct output of the teacher model, with little emphasis on the role of prompting during knowledge transfer. In this paper, we propose a set of novel response-priming prompting strategies applied in the knowledge distillation pipeline to enhance the performance of student models. Our approach fine-tunes a smaller Llama 3.1 8B Instruct model by distilling knowledge from a quantized Llama 3.1 405B Instruct teacher model. We apply LoRA optimization and evaluate on the GSM8K benchmark. Experimental results demonstrate that integrating reasoning-eliciting prompting into the proposed KD pipeline significantly improves student model performance, offering an efficient way to deploy powerful models in resource-constrained environments. We find that Ground Truth prompting results in a 55% performance increase on GSM8K for a distilled Llama 3.1 8B Instruct compared to the same model distilled without prompting. A thorough investigation into the self-attention layers of the student models indicates that the more successful prompted models tend to exhibit certain positive behaviors inside their attention heads which can be tied to their increased accuracy.

This paper was published some weeks ago and didn't get enough attention; I think this sounds very cool. I will try to apply their code to Qwen2.5 7B / Qwen2.5 32B.
>>
>>103654006
Do you think "But think of the (fictional) children!" works as well in Chinas as in western countries?
>captcha 08YASS
Guess I got my answer
>>
>>103653833
>ollama
Wasn't that the one developed by that troon that's always 300 commits behind llama.cpp?
>>
>>103653467
>they seem to have only used it during training for fuck knows what, but it seems to have improved the model. They do mention that it should improve the speed of speculative decoding somehow

When it's built into the model like this, it's as much an early-exit scheme as speculative decoding. Sometimes it might only need to do one decode step to produce multiple tokens, but you need some kind of confidence metric that the extra tokens are good.
>>
>>103654006
It's not reasonable to call non self censored models with censored speech an improvement.

We get the best of both worlds this way, but the Chinks themselves would get tiger chaired if they talked like we do.
>>
>>103654026
Allons-y, Alonso!
>>
>>103653490
They don't release models so they can go fuck themselves
>>
>>103654248
Hehe
Great show, that one. Until they ruined it
>>
>>103653443
>>103653477
Can you use this to guide local models as well like mistral large / llama3? Does this go under system prompt or should I put it under the story format?

>>103653504
Just how much RAM would I need to run this thing locally, as a ballpark, and what speeds can I expect? I've got 48GB VRAM and 64GB regular but wouldn't mind getting more in the future.
>>
>>103654272
Yes? And where you put it just changes how much it affects things. If it's closer to the end of the context it will have a stronger effect.
>>
>>103654272
>how much RAM
https://huggingface.co/deepseek-ai/DeepSeek-V3-Base/tree/main
the fp8 is a 700gb model, if you have 1TB of ram maybe that'll do lol
>>
>>103653512
Prompt / settings? I remember trying it a while back when it first came out and I didn't find it anything special. Is this the updated Tulu?
>>
>>103654310
You will need something like 400GB total for 4bit and some context it looks like.
>>
File: file.png (31 KB, 338x230)
>>103654292
That makes sense, but I've tried putting things at the end with Llama 3 models and it tends to go schizo a lot more; I wasn't sure if it was still valid for them or not.

In the screenshot it should go before the <|eot_id|> EOS token, but Linux paint is ass
>>
>>103654340
And how fast do these models go when split? MOE models are supposed to be optimized for cpu but I'm pretty much a pure exl2 user so I don't want to put in all the effort of upgrading to get something like 2 t/s
>>
>>103654434
V3 is only 14B active parameters so running it purely on RAM is going to be decently fast if you have DDR5.
>>
>>103654434
Depends on the speed / number of channels of the ram.
>>
>>103654451
Sadly it looks like it's actually 37B active. Dual-channel DDR5 will prob only get like 3 t/s. With an 8-12 channel server board though, you might manage a usable 10+.
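(Back-of-envelope at ~4.5 bits per weight, assuming bandwidth-bound generation: 37B active ≈ ~21GB read per token, so ~90 GB/s of dual-channel DDR5 tops out around 4 t/s theoretical, ~3 real, while 300-460 GB/s of 8-12 channel server memory puts the ceiling in the 15-20 t/s range.)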
>>
>>103654451
>We present DeepSeek-V3, a strong Mixture-of Experts (MoE) language model with 671B total parameters with 37B activated for each token.
>>
>>103652511
>It seems to get even weirder at higher contexts, which isn't really unexpected, but the purported 163840 context limit is being generous.
You can physically do it but they don't make any claims that it works above 128K. From the paper:
>During pre-training, we train DeepSeek-V3 on 14.8T high-quality and diverse tokens[...] Next, we conduct a two-stage context length extension for DeepSeek-V3. In the first stage, the maximum context length is extended to 32K, and in the second stage, it is further extended to 128K.
...
>DeepSeek-V3, following supervised fine-tuning, achieves notable performance on the "Needle In A Haystack" (NIAH) test, demonstrating consistent robustness across context window lengths up to 128K.
NIAH lol. We know that's not real.
>>
>>103654525
Pretty sure from the responses and what happened after that he was using OR when it had 2.5 on it.
>>
Wasn't there a thing where people tried to turn MoE models into a collection of LoRAs that get applied at runtime to save on space requirements? It was the big talk of the town for a bit back when Mixtral first got released.
I guess that didn't go anywhere in the full year since?
>>
DeepSeek V3 is extremely based. It has actually useful suggestions rather than "play with the temperature to figure out what works for your use case :-)"
We recommend users set the temperature according to their use case, listed below.

|USE CASE                      |TEMPERATURE|
|------------------------------|-----------|
|Coding / Math                 |0.0        |
|Data Cleaning / Data Analysis |1.0        |
|General Conversation          |1.3        |
|Translation                   |1.3        |
|Creative Writing / Poetry     |1.5        |
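If you're hitting it over an OpenAI-compatible API, applying those temperatures is just a parameter on the request. Minimal sketch; the base URL, key, and model name are placeholders for whatever endpoint or local server you actually use.
```python
# Sketch of passing the recommended temperature through an OpenAI-compatible client.
from openai import OpenAI

TEMPERATURES = {
    "coding": 0.0,
    "data_analysis": 1.0,
    "conversation": 1.3,
    "translation": 1.3,
    "creative_writing": 1.5,
}

client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")  # placeholder endpoint/key

reply = client.chat.completions.create(
    model="deepseek-chat",
    temperature=TEMPERATURES["creative_writing"],
    messages=[{"role": "user", "content": "Write a short poem about a whale."}],
)
print(reply.choices[0].message.content)
```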
>>
>>103654583
Who would have done it? The retards complaining MoEs were too complicated to fine tune?
>>
>>103654583
I think this is pretty much the same thing, and DeepSeek did it:
https://github.com/deepseek-ai/ESFT
https://huggingface.co/collections/deepseek-ai/esft-669a1e800bc10b3460569c70
>>
>>103654592
>Translation: 1.3
What is the reasoning behind this...?
>>
>>103654583
Nah. I remember the idea: create a single baseline expert, then create adapters from the difference between this base and each of the experts, to be applied at runtime.
Something of the sort
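Roughly, the idea would look like this (purely illustrative, not code from any actual project): average the experts into one baseline, then approximate each expert's difference from that baseline with a truncated SVD, which is exactly the LoRA-style low-rank form.
```python
# Illustrative sketch: baseline expert + rank-r deltas per expert, LoRA-style.
import torch

def experts_to_lowrank(expert_weights: list[torch.Tensor], rank: int = 32):
    """expert_weights: one [out, in] matrix per expert for the same layer."""
    base = torch.stack(expert_weights).mean(dim=0)      # shared baseline expert
    adapters = []
    for w in expert_weights:
        delta = w - base
        u, s, vh = torch.linalg.svd(delta, full_matrices=False)
        # Keep only the top-`rank` singular directions: a @ b ~= delta
        a = u[:, :rank] * s[:rank]                       # [out, rank]
        b = vh[:rank, :]                                 # [rank, in]
        adapters.append((a, b))
    return base, adapters

# At runtime the i-th expert would be reconstructed as base + a_i @ b_i, trading
# exactness for a much smaller footprint when rank << min(out, in).
base, adapters = experts_to_lowrank([torch.randn(256, 512) for _ in range(8)], rank=16)
a, b = adapters[0]
print(base.shape, a.shape, b.shape)   # [256, 512], [256, 16], [16, 512]
```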
>>
>>103654583
I still feel like LoRAs have a bit of a placebo effect to them. Not sure if LoRAs would be powerful enough to substitute for full on experts.
>>
>>103654670
It's for translating japanese media to english
>>
>>103649764
I'm trying to find a proper setup for SillyTavern and FlowGPT. Any idea how? I downloaded a version that works with FlowGPT and POE, whatever that might be, but I have no idea how to connect with it.
>>
>>103655161
* localizing
>>
xitter fags deepthroating deepseek are getting to be a bit much
the model is not bad and what they did with limited resources is impressive, and I admire that they are one of the few labs willing to experiment with arch at scale. That said, their models are always bad at multiturn and generally fail to pass vibe checks compared to their benchmarks; I just don't find the quality is there.
le epic whale mandate of heaven posting is cringe
>>
>>103652084
what is this ktransformers magic? an alternative to llama.cpp?
>>
>>103655404
I think it's the closest we have come to Claude, and all it needs is some good RLHF to make it write as well as Claude does.
>>
>>103655055
they seem good for lowering the hallucinations created when you try the "as advertised" body of knowledge and find out it is missing. A LoRA can get a medically trained model to avoid talking about educational choices when asked about stem cells.

tl;dr LoRAs can improve, not add.
>>
>>103655425
>all it needs is some good RLHF
gonna be expensive as shit
>>
>>103655452
Yes but it has a fair and open license. Hopefully a company steps up.
>>
/aicg/ is loving DeepSeek and they have hated every local model before this.

This is the real deal. We are finally corpo level.
>>
>>103653460
you're dumb lol
>>
>>103654026
>I will try to apply their code to Qwen2.5 7B/Qwen2.5 32B.
Ok, forget about this. Each prompt is 600MB in logits.
>>
>>103655482
It feels good to finally have a good LLM
Also means that the others are going to need to step it up to stay competitive, so it's good for open models all around
>>
>>103655482
Deepseek will make a shit ton of money from coomers now.
>>
On llama.cpp one can override the expert count used during inference. So here's an idea that won't be implemented:
Use a single expert from a MoE model for speculative decoding.
Bye.
>>
4090 user, what's the current goto model for sloppy gooning? Need my holiday erp fix.
>>
Looks like there's now a DeepSeek proxy for people who want to try it: https://substitute-domains-pdas-specified.trycloudflare.com/
>>
>>103655482
which is perplexing because it is not good for RP at all, I really don't get the hype
it's nice it's cheap I guess but I'm more interested in using it for code than for its dry repetitive prose
>>
deeploop3
>>
>>103655834
Maybe you're not using it right? Chorbo is dry as fuck without a good system prompt, but with one it will drain your balls.
>>
>>103655808
Brief me on the aicg "proxy" meme.
Is it just stolen api keys and somebody set up a server that forwards requests?
>>
>deepseek proxy
>Chorbo
did i click on the wrong thread
>>
Total Jewish victory.
https://x.com/TheRabbitHole84/status/1872066576750616769
>>
>>103655873
Yep.
>>
>Deepseek is nigh free and first open model that isn’t a total joke compared to paypig ones
So what's the play here to bully OpenAI and the like? Build up good UI/UX for running things locally or connecting to cloud providers that run open models? All options rn are kinda mediocre and struggle with switching from local to cloud-provided (so you can't have your 32B local slop + 685B hosted in one place)
>>
>>103655834
they're used to that because the corpo models are exactly like this, smart but incredibly dry.

I feel you can probably do something to deeploop by clever prompting and some regex scripting, but as is, it's not special. Contrary to the other big ones it might actually be worth it to put in the effort tho, as it really doesn't seem to have censorship nor positivity bias, nor do I think we'll get a scenario like OpenAI or Anthropic where they'll start to hunt coomers down. I'm not religiously against APIs, especially when the APIs undershoot what even the electricity would cost if I had the hardware to run it
>>
>>103655873
When it's not a wholly different model on the other end, or they inject a hidden prompt, yes.
You can scrape, for example, public GitHub repos for people who committed their keys for Claude, OpenAI, DeepSeek, etc.
>>
>>103655856
What the fuck is a chorbo? It's ungoogleable.
>>
So why the fuck isn’t there a Cursor alternative that lets me connect to whatever endpoint I want instead of their shitty one that gives you like 50 requests a month to claude for $20
Seems like local llm is the perfect choice here, but the tooling for it is primitive
>>
>>103655900
it's barely worth it to steal deepseek. It basically costs nothing
>>
>>103655950
you underestimate aicg's poverty
>>
>>103655961
perceived poverty

I guarantee there are dual 4090 owners complaining they can't afford the electricity.
>>
>>103655942
>>
File: 1731650243193770.jpg (23 KB, 844x79)
23 KB
23 KB JPG
how could this happen non-local bros?
>>
>>103656069
>>103656082
go back
>>
>>103656069
aicg users deserve the rope
>>
>>103656087
*does cute 360 like miku!*
>>
>>103655943
True, there is literally nothing out there that even tries to address this. What's up with that?
>>
>>103656157
>>103655943
Because aider does the same thing and doesn't force you to use a shitty webpage as an editor
>>
>>103656069
...
>>
>>103654541
Man fuck off with your damage control. I was using the official API for my testing and that shit has massive repetition problems. I don't know how you all don't see it.
>>
>>103656190
Some genuinely don't notice repeating patterns, slop, and other things, I envy them a lot.
>>
>>103656190
Have you considered that it is far more likely that you fucked something up somewhere than that everyone else is failing to notice a massive repetition issue?
>>
File: Itseveryoneelsebutme.png (564 KB, 500x713)
>>103656190
>>
>>103656172
BTW you're retarded and there are several options available that even have feature parity with Cursor. Learn to use Google lmao.
>>
>>103656230
>options that have feature parity with cursor
You mean aider?
>>
DS3 is smart but it's dry AF, I don't know how you guys are finding this usable for RP or storywriting. For someone doing non-local there's no reason to use this over Claude except being broke.
>>
>>103656244
Aider? I don't even know 'er!
>>
>>103656258
>except being broke.
that's a pretty huge reason for a lot of folks.
>>
How many free tokens does deepseek give?
I have been trying to make it code something and I have spent like 20k tokens already.
>>
fellas who are using anubis
how is it for RP (fantasy and sci-fi)?
also, is it good for NSFW or is it only good for shit that doesn't include sex?
got any masters
thanks in advance
>>
>>103656299
>fellas who are using anubis
no one, every1 using deepsex now grandpa
>>
>>103656306
Using an open weight model through an api doesn't make it local
>>
>>103656299
>drummer models
ishygddt
>>
>>103656223
Post your logs so I can point out the very simple IQ test you're failing.
>>
>>103656324
what do you use then?
>>
>>103656284
>20k tokens already.
I had to use like a million before it charged me a cent
>>
>>103656360
/aicg has a bunch now
>>
>>103656360
check >>103656341
>>
>>103656375
It also enables prompt caching by default, which reduces the input price by an additional factor of ten
>>
Can this thread get any more local?
>>
>>103656417
It's not our fault you're too poor to run it
>>
>>103656417
We'll be right back to local talk once everyone finishes fomoing into building CPUMAXX rigs to run V3 locally before LLaMA4 drops and shits on it at 70B.
>>
>>103656428
Should be cheaper to run at home than large mistral
>>
>>103656400
and i thought this thread is bad
>>
>>103656400
That's the exact template I was using to test the model. It didn't work. Repetition city.

I am open to the model being good, but it is clearly prone to fall into repetition and I don't know how people can't see this.
>>
What do we do now?
>>
>>103656436
This is the big reason imo. Unless llama 4 ends up being a big moe as well people will prob just wait before buying a server.
>>
>>103656299
I didn't think it was anything special compared to the other 3.3 fine tunes.
>>
>>103656371
Wouldn't you like to know, weather boy?
>>
>>103656453
The only person saying they have rep issues across here, aicg, aids and reddit seems to be you
>>
the answer is always stacking more parameters. you cannot escape this truth.
>>
>>103656480
that's why I use BLOOM 175B
>>
>>103656460
I currently run Hanami. How do you think it compares?
>>
>>103656480
then why is sorc trash?
even MM70b mogs it
>>
>>103656475
>seems to be you
Just wait. You'll all see it soon.
I'm smarter than the average gooner. Eventually the patterns will reveal themselves to you too.

That said, post your settings (temp, freq and presence penalty) as well as what your cards include.
>>
>>103656436
>LLaMA4 drops and shits on it at 70B.
Holy fuck you're placing way too much faith in Llama
>>
What if we.... Merge two deepseeks together?
>>
>>103655881
Those all seem low
>>
>>103656495
Never tried that one desu.
>>
>>103656560
what is your favourite currently then?
I am tired of hanami
>>
>>103656546
This but 8.
>>
>>103656436
Anon, this is Meta. Going by previous releases, it will beat DeepSeek V3 with a 600B dense model and their 70B will be about on par with the latest Qwen 72B. There will be no MoE.
>>
>>103656436
The 70B of a Llama release is always the worst one though? Meta sucks at that size for some reason.
>>
>a 600B dense model
I sure hope not.
>>
File: mikustep.jpg (465 KB, 1150x1240)
>flammenai/Flammades-Mistral-Nemo-12B
>allura-org/MN-12b-RP-Ink
>PocketDoc/Dans-PersonalityEngine-V1.1.0-12b
Are any of these worth using over magnum v4 12b?
>>
>>103656608
Zucc already said that the L4 flagship will be smaller than 400B so the entire generation can be considered DOA.
>>
>>103656570
Personally, Llama 3.3 EVA v0.0. It's not perfect and can go a bit schizo sometimes, but I like it because it's more creative and kino than other models that output things that are more expected and not as interesting for me.
>>
>>103656574
This but prune each expert to 1b first
>>
>>103656633
What's your setup for running that?
>>
>>103656631
NTA, but sauce for this?
>>
>>103656436
I'm hoping for a 70B model specialized in coding.
>>
>>103656588
>70B model
bold of you to assume they will have one of those
L4 will be 3B and 800B
>>
>>103656692
If there is a god, L4 will be 3B, 30B and 300B
>>
>>103656258
>50x cheaper
>Uuuhhh... What are you? Poor?
retard
>>
I was trying to make an application with DeepSeek and it didn't work for a while. I was getting disappointed in it until I realized I had modified a JSON file the program needed and it was fucking everything up.
tfw dumber than an AI
>>
>>103656702
If there is a god, L4 will be
>draft model
>fits in 5090 with draft model
>fits in 2x5090 with draft model
>free space
>>
>>103656702
You WILL get 1B 3B and 400B and you WILL be sad
>>
>>103656764
fuckin better not be
i was told 3090s would never be obsolete
>>
>>103656780
Don't worry, your 3x3090 build will only be slower by a factor of 3.
>>
>>103656780
It's ok you can run the 1B 2d2raft model
>>
>>103649764
Hey OP, don't forget to mention in the news that DeepSeekV3 (the instruct one) got released too.
>>
>>103656642
Here.
files.catbox.moe/4kzr9w.json
I also have a setting for CoT which I think can be interesting in some situations, but it takes some time to adapt to certain cards and it also makes replies slower, so I stopped using it. If you'd like to try it out though, here:
files.catbox.moe/nyvz0p.json
If a model does not do the <Thinking> thing, just prefill it in and it'll learn to do it after every reply. Same for if it doesn't give a response after thinking and ends the reply, just prefill with an asterisk or something random.

Credits to anons who I stole most of this from and did some revisions on.
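For anyone confused about what "prefill it in" means mechanically, here's a minimal sketch against a llama.cpp-style /completion endpoint; the URL, chat template, and <Thinking> tag are assumptions, and in practice SillyTavern's prefill field does this for you.
```python
# Sketch: start the assistant turn with the <Thinking> tag so the model continues from it.
import requests

history = "### User:\nWhat should we do about the dragon?\n\n### Assistant:\n"
prefill = "<Thinking>"   # force the reply to start inside the thinking block

resp = requests.post("http://127.0.0.1:8080/completion",
                     json={"prompt": history + prefill, "n_predict": 512})
print(prefill + resp.json()["content"])   # model continues from the prefilled tag
```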
>>
>>103656871
I should have specified that I'm asking about hardware but I will try the thinking thing, thanks.
>>
>>103652243
>electro magnet snow board
sick, that's a perfect late xmas gift anon
8 extra RAM sticks to run deepseek works too
>>
looks like deepseek is popping off in aicg and aids. Guess I need to get me a damn server
>>
>>103656961
>and aids
Do you think nobody can go and read that thread?
>>
>>103656890
Oh I use a DDR5 rig with a 3090 and 3060 12G. IQ4_XS runs at like 3.7 t/s (0 context).
>>
Wait why are they both named aicg now
>>>/vg/507722579
>>103654413
>>
>>103656985
>now
>>
>>103656994
Sorry if I haven't frequented aids in...8 months?
>>
>>103656985
hi newfriend
>>
>>103656985
aicg split off into a /g/ and a /vg/ thread months ago; aids is mostly unrelated now
>>
wasn't aids the /vg one?
>>
>>103657009
why
>>
damn deepseek is bland as fuck. chinks are soulless bug people so we will get local models that solve the Riemann hypothesis but can't hold an interesting conversation. we're never getting Claude at home
>>
>>103657020
No, it's /vg/aicg.
>>
>>103657028
stupid drama, it's aicg we're talking about
>>103657020
again aids is another thing from the /vg/aicg thread
>>
>>103657034
skill issue of the highest order
>>
>>103657034
go to aicg and steal one of their JBs lol
>>
>go to /vg/aicg
>ctrl-f "deepseek"
>33 mentions
>literally every single one of them is shitting on it
Wow!!!
>>
>>103657020
/aids/ is the thinly veiled novelai general that's visited and kept alive by n.ai staff and n.ai imgen spam. It's not a place to actually talk about LLMs.
>>
File: not spam!.png (84 KB, 669x1083)
>>103657080
Damn thing kept saying it was spam; there are so many times you are wrong in just one thread.
>>
>>103657211
See first reply in this thread.
>>
>>103657227
I assume that was you?
>>
>>103657155
Oh right that's the thread the offline-nc schizo trolls 24/7.
>>
>>103657155
Yes, it becomes extremely obvious if you start mentioning anything besides n.ai. The general suddenly gets very active disparaging whatever you mentioned. NAI is a has-been service. Please nobody waste their money there. It'd be nicer if they actually had nice staff.

https://api-docs.deepseek.com/quick_start/pricing
lol, the chinks made it more expensive and pretend it's actually a "discounted price" right now. Never, ever trust API services.
>>
>>103657237
I don't know what "that" is but the answer is no. I meant this >>103649767 the 9 replies limit is a known thing.
>>
File: Untitled.png (68 KB, 1627x621)
>>103657227
you / he got instantly jumped on
>>
>>103657259
>9 replies limit
But I have logs I posted a good deal longer than 9 replies? And I'm not the only one either.
>>
>>103657276
On 4chan, dummy.
>>
>>103657252
>Never, ever trust API services.
ah yes, the old rug pull
>>
>>103657276
He means 9
>>postnumber
when posting on 4chan.
>>
>>103657211
You just have to divide them into batches of 9 quotes.
>>
>>103657252
lmao, doubled the input price and quadrupled the price for output tokens. Very nice.
>>
>>103657252
Still pretty cheap, but yeah.
>>
>do some RP with a card of an existing character from a moderately popular franchise using deepseek
>suddenly the character brings up another character from the same series that's not listed in the card and constructively uses details about them to complement the situation
How I missed this with local models. That's why trivia knowledge about popular franchises is such a nice thing to have in a model.
>>
>>103657155
>>103657252
The samefag attempt would work a bit better if you didn't use the same weird n.ai spelling.
>>
>>103657312
just rag dude
>>
>>103657252
>>103657284

That is the price for the 200B model, so yeah, it's a discount since their API is now using the 600B model.
>>
>>103657312
That is the biggest positive for me. Claude was legit the ONLY other model that knew enough random trivia about my fav fandoms to do spontaneous stuff like that. It brings stuff to life way better.

>>103657321
Even if we had 1M context you could not give it the same sort of understanding with RAG as training it on the entire fandom of something including ALL of its fanfiction.
>>
>>103657315
He's not wrong. Are you denying that it's NAI shill central?
>>
>>103657312
Yeah. This thing has amazing knowledge of Dungeons & Dragons material beyond the most recent stuff.
Really fucking dope.
>>
>>103657341
>He
kek
>>
>>103657312
I feel annoyed when this happens, because it's like the character isn't respecting MY canon.
>>
The level of difficulty to run things in this space: Imagegen (easiest) < LLM <<< TTS
>>
>>103657341
I think you need to get a fucking life dude
>>>/vg/507726019
>>
>>103657354
I need the feeling of living in the actual world of my fandom, not some shitty stage resembling it like all smaller models do.
>>
>>103656797
I also updated the param count.

>>103657359
>>103657359
>>103657359
>>
>>103657371
>no counter arguments
So are you denying it or not?
>>
>>103657371
Kek
>>
>>103657390
Why does NovelAI keep living rent free in your head? You localtards have been insecure about it since the start.
>>
>>103657412
Disappointment strikes deep, anon; you would know if your dad ever came back with cigs
>>
>talking to himself
>>
>>103657412
This is kind of the thing. NovelAI is not relevant and will likely never be relevant again. I don't understand the appeal of spending all day every day shitting up multiple generals to copy-paste the same post over and over.
Schizokun - nobody here gives a fuck about your mortal enemy. Either make an on-topic post or go somewhere else.
>>
>>103657412
Someone said "deepseek is popping off in aicg and aids" when he actually meant "aicg and vg/aicg" and he was rightfully informed that aids is a fake general meant to shill a singular product where no discussion takes place.
>>
>>103657473
that someone was 100% you
>>
>>103657494
There are, in fact, multiple people that can see /aids/ for what it really is.
>>
Any anons actually running Deepseek on a home setup?
>>
>>103657578
Not me, don't ask me anything.
>>
>>103657698
What is your shirt size?
>>
>>103657578
I'm trying, but adapting their torchrun generate.py script to CPU is a PITA. Lots of CUDA stuff baked in.
>>
>>103649866
Cpumaxx with dual 9334 CPUs, 384GB of DDR5-4800 RAM, and a 1x 4090 is snail's pace, requiring me to alt-tab while I wait for it to process my EVA(?) 3.3 6-bit model, which was 59GB. I mean it works and is powerful but, for the goon on the run, look towards just GPUs and cope.

I have 2 4090s but have yet to test both atm. I just say cpumaxx only if you are patient (you aren't), as it streams at a snail's pace.
>>
>>103658020
deepseekv3? What inference engine are you using for cpu?
>>
>>103658037
I had only tried the 3.3 model in the GGUF version (forgot the full name) that was shilled here in the last week and haven't had time to do much with any other tests.

I plan to test more and shitpost about it here later, like with DeepSeek etc, but I'm a wagecuck and I can't even get time to set up my server properly yet
>>
>>103657762
G
>>
>>103649866
It's by far the best local yet and is close to Claude. That said, a better model could come out next month from Meta, so who knows.
>>
>>103658381
>close
Is it really still just close?


