/g/ - Technology


File: 1760328570709480.jpg (440 KB, 2048x1536)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107013301 & >>107003557

►News
>(10/27) MiniMax-M2 230B-A10B released: https://hf.co/MiniMaxAI/MiniMax-M2
>(10/21) Qwen3-VL 2B and 32B released: https://hf.co/Qwen/Qwen3-VL-32B-Instruct
>(10/20) DeepSeek-OCR 3B released with optical context compression: https://hf.co/deepseek-ai/DeepSeek-OCR
>(10/20) merged model : add BailingMoeV2 support #16063: https://github.com/ggml-org/llama.cpp/pull/16063
>(10/17) LlamaBarn released for Mac: https://github.com/ggml-org/LlamaBarn
>(10/17) REAP: Router-weighted expert pruning: https://github.com/CerebrasResearch/reap

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>107013301

--Ling 1T model performance and quirks in trivia and code generation:
>107020806 >107020827 >107021066 >107021155 >107021365 >107021454 >107021546 >107021605 >107021993 >107022036 >107022040 >107022084 >107022111 >107022357
--Skepticism about MiniMax-M2 model's benchmarks and compatibility issues:
>107019653 >107019674 >107019763 >107019697 >107019757 >107019773 >107020213 >107020483 >107019782 >107019813 >107019991
--Strategies for maintaining LLM coherence in long-form storytelling:
>107018853 >107018892 >107018909 >107018933 >107018965 >107019013 >107019083
--RTX Pro 6000 Max-Q performance and efficiency vs full 600W model:
>107019227 >107019264 >107019501 >107022133 >107019522 >107019618 >107023698 >107019452
--Evaluating Blackwell Pro vs RTX 5090 tradeoffs for AI/LLM use:
>107019130 >107019200 >107019209 >107019217 >107019239 >107019602
--Model safety issues with destructive command execution:
>107013537 >107013577 >107014153 >107014937 >107015249 >107015326 >107015750 >107016016 >107016264
--Techniques for enhancing LLM creativity through randomization and prompt engineering:
>107014306 >107014328 >107014387 >107014746
--GLM 4.5 Air model instability and quantization challenges:
>107013367 >107013388 >107015210 >107016881 >107016900
--DeepSeek AI's potential military/propaganda use and US regulatory response:
>107022232 >107022241 >107022304
--Halloween audio generation and AI's impact on publishing:
>107015928 >107016475 >107017726 >107017769 >107018045
--Silicon Valley's shift to open-source LLMs for cost efficiency:
>107022381 >107022444
--LPDDR accelerator trend for cost-effective inference in AI hardware:
>107023774 >107023803
--Ryzen 9950X3D memory configuration and bandwidth considerations:
>107016171
--Miku (free space):
>107013361 >107019491 >107019944 >107019963

►Recent Highlight Posts from the Previous Thread: >>107013303

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1750329660617701.jpg (2.18 MB, 2242x1541)
>>107025297
It's not a big boy datacenter gpu, it's just one of their recent mid-range offerings, an AGX Thor dev kit. Specs are similar to the recently released DGX Spark, but it targets some more hardware-related use cases which I work on, so I opted for one of these over the DGX.

This thing is like 75% heatsink. Feels very solid.
>>
Interesting podcast to have in the background while working
https://www.youtube.com/watch?v=01ybLOH1fnU
>>
>>107012869
>>107012883
I'm probably too retarded but I tried to do this.
>loading GLM at 6bpw with exps=cpu at practically 0 context loads just about 16GB into VRAM and 235GB into RAM
>GLM has 160 routed experts so one expert should be just around 235/160~=1.5gb in size
>7 of them are routed at a time to accompany the shared expert that should be in VRAM => ~10.5GB are used from system RAM per inference step
>"exps=cpu" loads ~16gb into VRAM ignoring kv-cache so all of that should be the dense/shared part of the model that's always used => ~16GB are used from VRAM per inference step
So the rough formula for the speed you can expect from GLM @ 6bpw should be 1/((16/bandwidth_vram)+(10.5/bandwidth_ram)) in this case, shouldn't it? If I plug in my numbers the result of this is quite a bit higher than the actual speed I'm seeing.
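Quick sanity check, plugging in the numbers above (the bandwidth figures are placeholders, swap in your own; treat the result as a best-case ceiling):
[code]
# rough upper bound on decode speed for GLM @ 6bpw with exps=cpu
vram_bytes_per_step = 16e9     # dense/shared weights read from VRAM each step (measured above)
ram_bytes_per_step = 10.5e9    # active routed experts read from system RAM each step (measured above)
bw_vram = 1000e9               # placeholder: your GPU memory bandwidth in bytes/s
bw_ram = 80e9                  # placeholder: your system RAM bandwidth in bytes/s

t_step = vram_bytes_per_step / bw_vram + ram_bytes_per_step / bw_ram
print(f"upper bound: {1 / t_step:.1f} t/s")  # real speed is lower: kernel overhead, kv cache, context depth
[/code]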
>>
>>107025510
>llms are "AI"
when the real AI goes on its genocidal spree, faggots like you will be turned into biofuel first.
>>
>>107025394
hi betifel I love you bby come to gujarat
>>
>>107025556
The "AI" culture wars are the weirdest identity politics meme I've ever seen.
It's a semantic argument, but yes, in my book anything that processes information in a semi-autonomous way and is artificial is AI. I'm reluctant to call language or writing AI because it doesn't really do anything by itself even for a little while, but I'd be willing to call, say, Pascal's adding machine a primitive form of "AI".
Since autonomy and generality of an information processing entity are an inherent part of "intelligence", I'd say a programmable calculator is more intelligent (and thus more "artificial intelligence") than a plain calculator. A calculator that does all 4 operations is more AI than a calculator that only does addition. A programmable calculator is more AI than a simple calculator. A computer even more so. An LLM even more so. And so on.
Yes, ASI if it ever comes about will be more "AI" than an LLM, but that doesn't mean an LLM is not a (comparatively) primitive form of AI.
>>
wait i though this shit was difficult to get running
you can literally just do
>pacman -S ollama
>ollama pull model
>ollama run model
and there it is. am i missing something here?
>>
>>107025666
satan trips wasted
>>
>>107025679
no they are perfect
>>
>>107025666
If you do this you'll get a retard quant with shit performance+your data uploaded to ollama servers.
>>
Mikubutt
>>
>>107025679
Seems appropriate.
>>
File: ComfyUI_02547_.jpg (1.91 MB, 1280x1856)
gyatt damn that OP image reminds me of the Dipsy images I generated a while back and turned the entire deepseek general into a konichiwa, dude! thread.

Dipsy is BWC only
>>
>>107025694
>+your data uploaded to ollama servers.
really? a search suggests its local
>>
Trying to make an LLM-as-a-judge pipeline with opencode. It seems to somewhat be working.


I have a chatlog with a code assistant in the log.jsonl file.
I want you to clean it up to make an improved version of the chatlog, with the mistakes of the assistant (with the role "gpt") seamlessly edited out to make it look like the assistant was more accurate than it actually was, based on the feedback of the user (with the role "human"). The purpose of this is to train the assistant to perform better.
Be careful though, the messages with the role "human" are not always human generated. Tool call results are also returned with the role "human" but were not written by the user, they were written by the agentic framework which executes the tool use calls.
So make sure the tool use requests stay matched with the correct agentic framework responses.
Make sure the behavior of the system is not misrepresented. Do not make simuluated tool use requests, stick to the tool calls that the agent actually made and got a response to.
Make sure to keep the system message intact at the bottom.
Make sure the conversation begins with a human message and ends with an assistant message.
Make sure the conversation alternatives between "human" and "gpt" without any two successive messages with the same role.
Basically make sure the json structure remains correct.

General guidelines

- Do NOT edit or further truncate the tool result calls.
For example, if the original was this:
{
"from": "human",
"value": "Tool executed: <tool>\n<tool_name>read_file</tool_name>\n<parameters>\n<filename>modules/cli/attention_scores/attention-scores.c</filename>\n</parameters>\n</tool>\nResult: File 'modules/cli/attention_scores/attention-scores.c' contents (458 lines)...
>>
>>107025394
nice butt
>>
>>107025468
>https://www.seeedstudio.com/NVIDIA-Jetson-AGX-Thor-Developer-Kit-p-9965.html
>delivering up to 2070 FP4 TFLOPS AI perfromance
>2070
oh nonono
>>
File: slop.png (176 KB, 1902x1112)
Is the solution to AI slop just asking another LLM to remove the slop?
>>
>>107025742
>Make sure the conversation alternatives between "human" and "gpt"
*alternates*
>>
>>107025856
Oh, thank you. Brain autocorrect fail I guess.
>>
>>107025846
That's an interesting idea, having another LLM proofread for you. But how will they improve upon the slop, yet know what it is to begin with? And if it can know what slop is, why wouldn't you just tell that to the first LLM to avoid it to begin with? And if the first LLM can't do it, why would the second one be able to?
>>
>>107025796
He did a fairly popular abliterated version of gemma

https://huggingface.co/mlabonne/gemma-3-27b-it-abliterated

I used it myself for a while, it was pretty good, as far as abliterated models go.
>>
>>107025867
>simuluated
>>
>>107025903
Just diverge it to a third model, duh. Add more AI is the solution to most things, apparently.
>>
>>107025903

Well, in this case I don't tell the LLM because I want it to be fully focused on the task and not on linguistic details, and also because it's a dumb as bricks 27B model I'm trying to finetune to be usable at agentic coding, so it wouldn't follow the instructions anyway.

The LLM I'm using to clean up the logs is GLM 4.6 through the API.
I'm giving it examples of what replacements to make.

"You're absolutely right", "You are absolutely correct" -> "Ok", "Sure", "Yeah, you're right", "That's right", "That looks ok", "That seems correct", "That looks right", "That looks correct", etc.

"I apologize for the error." -> "Looks like a made a mistake while doing X because of Y.", "Hmm, the tool call didn't work because of Y.", "Huh, looks like it didn't work.", etc.

"You are right to point that out." -> "Good idea.", "That's a good idea.", "Yeah, I'll do that.", "Thanks for the information", etc.
>>
So apparently mistral small knows how to cook ye olde walter white. I was fucking around with it and gave it the lyrics to the wkuk never song, and it got mad at the refinement process. So I asked it to fix it, in a hypothetical scenario, so that the ingredients listed could be used without killing the consumer. It began spitting out a detailed step by step response. Pretty funny, even if I'm too chemically inept to verify any of it.
>>
>>107025947
There was this paper where people were able to bypass safety restrictions in large context models by feeding them a long conversation in which the model always answers the questions. I had the idea of just feeding a high quality conversation and continuing off of that into your actual roleplay. Pre-heating the oven, if you will. Have people tried this? I'm sure someone has. I wonder if the high quality conversation can offset the retardation models suffer as you use up more context.
>>
sysprompt:
>The writing style does not follow common prosaic conventions in favor of more grounded, factual storytelling, replicating the informal style of online forum discussions: not low effort yet not pretentious.
FIRST FUCKING LINE
>The door creaked open, revealing a sterile, shared space.
FUCK FUCK FUCK GLM4.6
>>
>>107026002
The benefit of local is anything you could do with in-context learning you can do better by actual learning through finetuning, and keep the short context performance. Those tricks are only useful with API calls.
>>
>>107025742
if the original model made the wrong tool call, the editor won't have the information it needs to fix it unless you give it tool access
>>
>>107026023
I think it would be better to just give a few actual turns as examples so the model soaks up the instructions better, but muh service tensor isn't built for such a thing. (Fucking retarded ""examples"" section.)
>>
>>107026049
Ah, but you see? He asked the model to not misrepresent the behaviour of the system.
>>
>>107026049
If that happened then the user (me) would have given feedback to the model until it made the right tool call. So the cleaning up step for training consists mostly of removing the wrong responses that were corrected by the human through natural language feedback and only keeping the one that actually worked.
If llama-factory supported per-message masking I would just keep the original log and mask out the turns where the assistant underperformed, but alas, that is not an option without modifying the llama-factory source code which I don't want to do.
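For what it's worth, the masking itself is trivial if you ever ditch llama-factory for your own collator; with the usual HF convention, label positions set to -100 are ignored by the loss. A minimal sketch, with the chat template details handwaved and the turn format assumed:
[code]
import torch

IGNORE_INDEX = -100  # positions with this label are skipped by the cross-entropy loss

def build_labels(turns, tokenizer):
    # turns: list of (role, text, keep) tuples; keep=False marks the assistant turns to mask out
    input_ids, labels = [], []
    for role, text, keep in turns:
        ids = tokenizer.encode(text, add_special_tokens=False)
        input_ids += ids
        if role == "gpt" and keep:
            labels += ids                          # train on the good assistant turns
        else:
            labels += [IGNORE_INDEX] * len(ids)    # mask user turns and the bad assistant turns
    return torch.tensor(input_ids), torch.tensor(labels)
[/code]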
>>
>>107026087
it's the fact that it has no understanding of shit that bothers me. because of that the whole thing feels fundamentally flawed
>>
>>107025468
>5 gbps Ethernet
Doesn't the AGX Orin Dev Kit have 10 gbps? What are the hardware specs for that thing?
>>
>>107025551
As I said, it's a lower bound on the runtime so an upper bound on the speed.
The first problem is that the code is not 100% efficient.
The second problem is that you can only achieve that speed on a completely empty context, the runtime increases linearly with context depth.
>>
>>107026244
>>107025468
Never mind, I didn't notice the big 4x25gbps port on there. How much was it?
>>
>>107025468
Regardless of specs and performance that's a cool looking object.
>>
>>107026366
cool looking paperweight
>>
File: IMG_8482.png (3.36 MB, 1024x1536)
>>107025717
Witnessed
>>
>>107026666
im trying, satan, but it's so expensive
>>
>>107026666
check'd quad satan dipsy
>>
>>107026666
God damn super satan over here.
Might as well ask, how are the latest DS releases?
I only ever tried the original V3 and original R1.
>>
>>107025556
LLMs are more real than most cunts nowadays. frfr
>>
srs question for all the coomers/ERP people. Do you actually prefer to coom to ERP rather than just watch porn? I don't get it.
I've coomed to porn my fair share for more than 15 years now and I still would watch it over cooming to an ERP session.
Unless I'm missing something from ERP and I haven't experienced it properly
>>
>>107026802
it's less about stimuli and more about interactivity and reactivity
a video once made is always the same, but I can always come up with a novel scenario and act it out in a rp setting
>>
>>107026802
I use RP for scenarios not commonly found in porn. It's not a substitute for regular porn, but a supplement. If your only interests are things commonly found in porn, then your enjoyment with RP will be limited.
>>
>>107026802
What >>107026853 said.
The combination of the scenario you want and the interactivity, and even storytelling, is like nothing else.
I really should work on setting up an image gen model to add some visualizations to the whole thing.
>>
>>107026747
V3.2 was optimized for agentic use and coding. It requires some additional wrangling for role play. But it has an Anthropic compatible endpoint and can be interfaced with Claude Code. At 1/10 the price lol. Oh, and think and nothink are the same base model now.
V3-0324 was better for role play. But we all move along.
>>
>>107026802
the interactivity and being able to perfectly tailor it to what I want are big draws for me. porn is great for pure high-grade zero-effort neuron activation, but if you're a high-concept, cerebral, trope-subverting, bone-chilling A24 kino coomer like myself ERP is a much more interesting form
>>
>>107026882
A continuation of this RP after we installed hidden cameras in the bedroom and called the girl's parents asking for her to come over for another tutoring session. You're not going to find this on pornhub or anything like that.
>>
>>107026023
holy skill issue
>>
>>107026023
just write your sysprompt like a shitty esl forum post and itll work
>>
File: 1754417936209402.png (7 KB, 1024x1536)
Hmm.
So uh, I was trying to use GLM Air (q5) to make a slightly special-case download/scraping script. It kind of failed. Then I tried GPT OSS 120B just to see what would happen, and it just werked.
Damn. We gave it a lot of shit but it's not so bad sometimes? I mean it'd still be nice if it was less censored and slopped of course.
>>
>>107026802
it's nice to have full control over the erp scenarios you want and do stuff that usually isn't shown elsewhere
>>
>>107027108
>full control
I wish.
>We must refuse.
>>
>>107026802
My favorite coom artist takes like 4 months to make one 6 minute video
With LLMs I can take those same characters and scenarios and generate infinite variants on the fly
>>
>>107026945
>User
>asterisk formatting
>first person narration
>random chinese
>doing actions for user
enough, my fucking sides xD
>>
>>107025904
ah thanks anon
>>
>>107026882
>>107027136
ignore the rude anon, keep posting logs
>>
>>107026802
No, you're not missing anything. The people who get off to text are the same retards who before LLMs were in Discord rooms all day grooming underage boys and ERPing with each other.
>>
>>107027112
Part of the control is being able to try and bypass that or just use another model.
>>
>>107026023
Have you tried increasing the temp to 1.2? It really shines then and feels better for rp compared to the recommended temp.
>>
File: IMG_5946.jpg (48 KB, 500x500)
>>107025394
This image is allowed on a blue board where I’ve gotten global vacations for posting male nipples.
>>
>>107027181
That's why I only post using the proxy website.
>>
https://huggingface.co/inclusionAI/Ming-flash-omni-Preview
CHAMELEON BROS WE ARE FUCKING BACK
TEXT IN AUDIO IN IMAGE IN VIDEO IN
TEXT OUT AUDIO OUT IMAGE OUT
>>
>>107027181
no one likes faggots
>>
File: 1758920312862923.png (1.08 MB, 1024x1024)
>>107027227
>>
>>107027227
Holy fuck finally.
>>
Gemma sirs....
>>
>>107027227
105B I can actually run in FP8, but I have no idea if llm-compressor supports this model or how to create the recipe
>>
>>107027227
Please let it be good and please let the goof gods gift us precious goofs
>>
>>107027318
cant u download the FP16 model then run it with bitsandbytes int8=true or whatever the thing is nowadays
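Something like this is the usual incantation, assuming transformers even has the Ming arch wired up (big if for a fresh omni model), and int8 will be slower than a real FP8 checkpoint:
[code]
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "inclusionAI/Ming-flash-omni-Preview"    # may need a different Auto class / remote code
bnb_config = BitsAndBytesConfig(load_in_8bit=True)  # quantize weights to int8 on load

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",          # spill whatever doesn't fit onto CPU
    trust_remote_code=True,
)
[/code]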
>>
File: migu.png (35 KB, 814x999)
DeepSeek-chan?
>>
File: iu[1].jpg (64 KB, 474x266)
>>107027341
>>
>>107027328
it's BF16 (like fuck do I know what even is the difference between BF16 and Fp16 anyways lol)

I'll try then, but afaik int8 is slow and not very good quality?
>>
>>107027227
>actual omni
>it's based on ling
monke paw
>>
>>107027227
there's no chance this is going to be good, right?
>>
damn minimax-m2 is a piece of shit
>>
>>107027172
I said it when Alpaca happened, and again when Llama 2 came out, and I'll keep saying it: having to jailbreak your own model, on weights you run on your own hardware, is fucking retarded.
>>
>>107027480
I agree, but it's something you CAN do.
Or just change models.
>>
File: NIG.png (39 KB, 659x300)
>>107027341
oh my... kimi-chan...
>>
Kimi k2 vs glm 4.6 for creative writing? Anything better?
>>
>>107027227
Well, that's never coming to llama.cpp.
Maybe the text gen part of it, if that.
>>
new sloppa jus droppa
https://huggingface.co/Ilya626/Cydonia_Vistral
>>
>>107027438
how much shit?
>>
File: file.png (6 KB, 375x72)
>>107027542
die motherfucker die bye bye
>>
>>107027514
depends
>>
>>107027509
>>107027593
New benchmark discovered.
>>
>>107027843
Meant for >>107027341 and >>107027509
>>
Mobile users of ST, how do you manage long gen times? For example, I'm using a 2-bit GLM quant, so prompt processing takes 30-60 seconds, and then the reply comes in at about 4tk/s. It's usable, but if I navigate away from my phone's browser with the ST frontend open, the generation seems to hang.

I'm running ST on the same machine as GLM, so i just have the local IP and port for ST in my phone's address bar.

Is there a different way to set it up or to have it retrieve the response asynchronously?
>>
>>107028092
Also to add - even if my phone screen turns off (I have it set to either 30 or 60s of inactivity to save battery), the streaming response hangs. And no, I don't want to do something hacky like using an app that keeps my phone awake. That's gay.
>>
>>107028104
just disable the power saving option in your web browser so the tabs aren't suspended when the screen turns off
>>
Somebody is going to notice that using a deep fried jpg as context is the same as keeping a fixed size embedding as input. And then that person who rediscovered the encoder-decoder transformer architecture is going to present it as a great breakthrough to have "infinite context".
>>
>>107027227
>HIGH QUALITY IMAGE GENERATION
>look inside
>sloppy sloppa
Why do they always do this?
>>
>>107028092
I have the same issue with paid api running through a server. I can go to a new tab, but if I go to another app or the phone sleeps, I lose the gen. It's an ST and mobile thing, not just local.
>>
>>107028568
You can't automate creativity
The majority of untalented dipshits will use even the most powerful of tools to flood spaces with sloppa
Talented artists have been using AI since SD 1.X to make some pretty cool shit
>>
>>107028092
I wonder why they did it like that. Long polling is superior in every way and allows multiple clients simultaneously
>why do so many people have their own custom frontends
>>
File: file.png (430 KB, 2254x1452)
I'm trying to convince my new SWE job to sponsor my skunkworks project of running their entire codebase through a local language model. Maybe I can get them to buy me this :)
>>
>>107028863
It's not going to be yours, dumbass.
>>
>>107028863
For what purpose?
>>
>>107028887
I work remotely. By the time I leave, they'll have either forgotten about it or it'll be a paper weight. I get a $2000 office stipend annually so I can always use it towards this too.
>>
>>107028898
Cooming (on company time)
>>
>>107025394
I installed EasyDiffusion. My gens suxs. How do I into animu? Do I need more models? More LoRAs? Better prompt-fu?
>>
>>107028863
>1,000 AI TOPS
kek. AI is a unit now.
>https://www.youtube.com/watch?v=9ntPxdWAWq8
>How much capacity would HP's cloud uses have access to?
>1000
>>
>>107028914
>>>/g/ldg/
>>
>>107028914
You need /ldg/
>>
>>107028914
>How do I into animu?
https://novelai.net/register
>>
File: 1761389401659502.png (3.22 MB, 1264x2216)
>>107028929
Shoo shoo
>>
>>107028900
You are very pampered.
>>
>>107028963
>>107028928
>>
>>107029021
Feels good to be working at a fortune 100 company for a change :)
>>
>>107029069
Ahh yeah, you are that weird guy.
>>
>>107028863
My last company always supported shit like that because it was cheaper than raising wages. Somehow, people were more excited about random things like that than getting $100 added to their monthly paycheck
>>
>>107028236
Hmm, brave for Android doesn't have anything like that. Any other ideas? I've got another server in the house I could use as well - just looking for a way I can go and do something else while the LLM is making a response.
>>
File: 1745236808520206.jpg (963 KB, 1336x2008)
>>
kind of crazy to think about how ai is a solved science and with the next gen of nvidia chips and about a year of datacenter and power infra expanding we will be able to just use the current algorithms to create agi
>>
>>107029260
Why post this again?
>>
MiniMax-M2 gguf status?
>>
>>107029530
Worse than Nemo, don't bother.
>>
>>107029585
But it says "agentic" on the model card. So how's that possible?
>>
File: 1734548536727531.jpg (1.06 MB, 1336x2008)
>>
File: 1745908814203796.jpg (1.12 MB, 1336x2008)
>>
File: 1758146830843767.jpg (864 KB, 1336x2008)
>>
>>107025394
flatta
>>
Lazy question, but any "recommended models" for very basic coding, specifically lua, js/html? Screwed around a bit with GPT5 to test something and now I'm fixed and I'd like to continue.
Got a 4090, so I'd rather use that over CPU inferencing, if possible quality wise.
>>
>>107029161
>>107029649
>>107029655
>>107029660
Cyberpunk Miku is best Miku
>>
File: firefox_7Rxgqhs366.png (614 KB, 758x1124)
Presented without comment.
>>
>>107029762
A 4090 alone isn't nearly enough to run worthwhile coding models. If you don't absolutely need privacy, then pay for an API key, you'll save yourself a shitload of time and effort over trying to tard wrangle a small model.
If you do need privacy, then probably Qwen 2.5 coder 32b, or the newer 30b variant.
>>
>>107027136
Show the class what the right way to RP is, anon.
>>
What role do LLMs currently play in robotics? All these new robots like Optimus and the chink ones are coming out, and I had an idea how to get into robotics myself and wanted to learn more. I think Elon Musk and big tech's obsession with scifi is blinding everyone from what's right in front of us and always has been.
>>
>>107029800
The Qwen 3 Coder 30B is held back a lot by only having 3B active. It's good for simple autocomplete, but not much more. If he just needs chat, 235B would be worth a try if he has enough RAM.
>>
>>107029800
Again, this is for VERY basic stuff, so I don't need anything super complex. Think script kiddie if you will, beginner level stuff. I just need that to be functional though obviously. If 24GB isn't enough for that fair enough. At least now I know what to expect.
>>107029818
>235B
That bad, huh? Had a hunch, but better safe than sorry and ask before I do anything.

Thanks, to the both of you!
>>
>>107029825
Don't listen to them, get 30B at 4 bit quant (IQ4_XS) and check it for yourself.
>>
File: noble meeku.png (2.16 MB, 768x1344)
>>107029771
For me, it's noble Miku
>>
>>107029844
That's the idea, don't worry. Just not right now cuz bed is calling.
I've tried random LLMs and Image models for goon garbage, might as well give this a shot for the hell of it.
>>
>>107029844
He can run 30B Q8 easily with partial offload, my 3090 runs it at well over 30t/s.
>>
>>107029854
And pp of 50 tokens per sec, right? No thank you. Not a byte on CPU for me.
>>
>>107029863
No? It's been a little while since I used it, but pp was definitely at least 500t/s.
>>
>>107029869
Your pp seems quite slow.
>>
>>107029863
You should really try both and compare. Running a retarded model fast is counterproductive.
>>
>>107029877
I did and generally at 4 and larger it was largely the same.
>>
>>107029825
try the qwen3-32b dense model at q4, it should just barely fit. it's slower than the 30b-a3b model but it's noticeably smarter
>>
File: 1736056913898746.jpg (949 KB, 1336x2008)
>>107029849
royal miku
>>
my pp so hard rn
>>
my pp is so neutral rn
>>
regardless of warning your pp doesnt scare me at all
>>
>>107029882
It's definitely smarter. But not having a Coder variant makes it not ideal. Don't know why they didn't do one.
>>
>>107029882
>30b model smarter than 3b model
no shit
>>
>>107029916
This looks like eldritch miku wearing the skin of noble miku
>>
>>107029933
just tell it to write code lol
>but the filename doesn't have "coder" in it
>>
>>107029996
Do you know why some of the files have "Coder" in it? It's not because those are the only ones allowed to write code.
>>
>>107025468
That’s a decent overall package for the price, 60% the FP4 of a 5090 for only 1k more, with 4x the memory to boot. Does being shitty lpddr5x hurt it, though? Obviously it would be ass for gaming but does a low wattage GPU and slow ass memory matter for AI?
>>
File: 1759528463502848.jpg (915 KB, 1336x2008)
>>107029951
>>
>>107029951
>wearing the skin of
shame theres no tag for this. maybe you could get close with a sort of non surgical face mask. like shes wearing someone elses face. youd probably have to inpaint it
>>
holy shit k2 (not local version) absolutely fucking mogs on gpt5 and gemini 2.5pro for making experimental electronic prompts for suno
https://suno.com/s/tr4JGFPxO7Mo8wHm
"local" won
I also fed it the prompt for this 4o song (including the lyrics one not displayed)
https://suno.com/s/jUk50aWDBdHb76wU

and then asked it to mate the two song concepts together into something new
https://suno.com/s/lKIpynPKOZ3goIFz
interesting emergent capabilities
>>
File: 1761495895836594.jpg (89 KB, 700x700)
Can someone teach me
How do i use local deepseek v3.1 on windows 11
Sorry if i have 0 knowledge on LLM
>>
>>107030105
surprised there's not even a loose skin tag
>>
https://huggingface.co/inclusionAI/Ling-1T/discussions/15
>>
>>107030355
gm sir I am microsoft tech support technician engineer.. kindly tell me how much ram and vram you got sir
>>
>>107030377
Ram 16gb
Gpu rtx 3070
Very old yeah
I will upgrade later in 2026
>>
>>107030387
thank you for the info sir.. . unfortunately your computer is two week for deepseek 3.1. . unles you got datacenter ssds. I can suggest you alternative models if would like to
>>
>>107030362
>Llama 3.3 instruct gave "You'll Cowards Don't Even Smoke Crack", which AFAICT is the correct answer.
Correct answer to what question???
>>
>>107030416
Which song is rapper viper best known for?
>>
>>107030355
I am summoned.
https://rentry.co/DipsyWAIT
>>
>>107030084
I like this Miku
>>
What's going on in LDG?
>>
>>107030459
sir I have bad news for yuo. .. you need moare ram if you wish to follow /lmg/ meta, kindly max out ram with 192gb fastest mt/s and get better cpu. .
the best model you can currently run is mistral nemo, best modal for very poor. do you wish to get instructions on how too install nemo sir?
>>
>>107030084
>why, miku, what big... tentacles you have...
>>
>>107030506
petra discovered the sharty fae prompt and set his bot on them. give it a week and the same thing will happen here.
>>
>>107029792
nice
>>
>>107029792
That's a lot of em dashes and markdown—just like Ling 1T
>>
>>107030506
OOF, that's one hell of a meltie!
>>
File: sd1.2.png (589 KB, 962x647)
>>107027227
most generic image ever BUT it's benchmaxxed on text!!!
>>
>>107031089
they all got mind broken when dalle3 released and never recovered
>>
>>107031089
>Prompt: Sunlit
>Image is at night
And they cherrypicked this image? Oh no no no...
>>
>>107025394
So the fattest MoE model available right now is Kimi K2 with 1T parameters.
Surely if I now put together a machine with 1.5 TiB RAM there won't be some cunt releasing a 4T MoE model just 2 weeks later.
>>
How can I estimate how much VRAM each layer takes in a quantized model? Is it basically just size of the weights / number of hidden layers?
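i.e. I'd naively do it like this (made-up numbers, ignores kv cache and runtime buffers):
[code]
# naive per-layer VRAM guess for a quantized model
file_size_gb = 20.0     # size of the gguf on disk (example value)
n_layers = 48           # from the model metadata (example value)
embed_share_gb = 1.0    # embeddings + output head, not repeated per layer (rough guess)

per_layer_gb = (file_size_gb - embed_share_gb) / n_layers
print(f"~{per_layer_gb:.2f} GB per layer")
[/code]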
>>
what is the proper way to use the -ot parameter in llama.cpp? i think it works differently now, do i need to use the --cpu-moe flag too? before or after the -ot flags?
>>
do you think that if you combined predictions from different models the slop would go away, given that you handle tokenizer mismatch? For the same text, different models have vastly different distributions; many valid tokens are considered almost impossible. Maybe something like setting each logit to the minimum value across all models' outputs would work?
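Something in this spirit, assuming you've already mapped everything onto a shared vocab (which is the actual hard part):
[code]
import torch

def ensemble_min(logit_list):
    # logit_list: per-model logits already aligned to a shared vocab, each of shape [vocab]
    stacked = torch.stack(logit_list)         # [n_models, vocab]
    combined = stacked.min(dim=0).values      # a token survives only if every model allows it
    return torch.softmax(combined, dim=-1)

# toy example with a 5-token "vocab"
probs = ensemble_min([torch.randn(5), torch.randn(5), torch.randn(5)])
next_token = torch.multinomial(probs, 1)
[/code]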
>>
>>107031459
real men use -ot, fags use -ncmoe
>>
File: Zaiglyph.jpg (34 KB, 640x326)
uhh ohh local bros
ze future is vision
guess we'll have to buy 4 BBCwell GPUs at last...
>>
>>107031459
-cmoe and -ncmoe are just shortcuts for -ot.
>>
>>107031552
>text
>images of text
>books with realistic page physics
>simulated school environment
lecunny was right all along
>>
>>107031552
I fear for the future of lllmao.cpp, vision is barely supported there :(
>>
>>107031596
If LLMs fully take the vision route, it's going to be too slow to offload, so we'll all switch to exllama/vllm anyway.
>>
>>107031596
Making vision necessary is going to motivate development.
>>
>>107031552
That seems extremely retarded, there's no way images of text work better than text.
>>
>>107031622
>he didnt read the deepseek-ocr paper
cringe
>>
>>107031639
>changes the font
>>
>>107031620
lol
>>
>>107031639
oh damn, a paper said so?! revolutionary new Pareto paradigm unlocked!
>>
File: BEEPDING.png (2.05 MB, 768x1344)
>>107031622
Read the DeepSeek OCR paper again. I also don't see why it's not intuitive; input bloat is huge in text. Learning from letter shapes seems more intuitive than figuring out different tokenizations of the same word. Imagine how many neurons were spent learning that word, WORD, Word, worD, and W-O-R-D are the same word. Misspelled words also work better with visuals
>>
>>107031680
>Learning from letter shapes
yeah let's maybe hopefully try a whole new way of training just to ace strawberry, at the cost of everything else, sounds lovely
>>
>>107031620
Qwen-Next is well over a month old and still isn't supported, and that's a regular text model.
>>
File: G3z4SdjXgAAJic4.jpg (97 KB, 1200x480)
>>107031659
retard
>>107031697
At the cost of what? LLMs don't understand raw text, the bloat from figuring out tokenization on the LLM's side is huge, whereas vision is pretty straightforward and well optimized. I wouldn't be surprised if it takes fewer resources in the end
>>
>>107031731
Yeah. New attention mechanism taken on by a vibe coder. Whoddathunkit.
>>
>the deepseek ocr paper
that anon shilling colpali/colqwen and open source RAG frameworks like morphik half a year ago might have been onto something...(yes that anon might have been me)
>>
>>107031731
Just use exllamav3, what are you, a vramlet?
>>
>>107031797
>just rammax for the huge moes bro

>few weeks later

>are you vramlet lamo?
>>
>>107031797
If I wasn't then I wouldn't care about an 80b moe
>>
If I make enough predictions, I'll eventually be right about something, even if I have to reinterpret my original prediction. Or what actually happened. Or both.
>>
>>107031810
You're absolutely right!
>>
>>107031820
You're absolutely right!
>>
File: got you.png (2.01 MB, 768x1344)
>>107031804
>>
>>107031748
The problem is that every single token is rich in semantics and gets encoded by the LLM as a high-dimensional vector. How does one train an OCR model without involving semantically-rich text tokens, e.g. making the model understand what the written word "dog" actually means, regardless of style/font/etc? I'm genuinely curious, because I've been looking into it and I don't think it's easily solvable.
>>
>>107031882
Fuck off and read the paper already.
>>
>>107031613
cpucopemaxxers were always fucked, but exllama/vllm don't support anything but the most recent nvidia cards
>>
>>107031882
Current models don't understand what a dog is either, they just know which patterns of words should be used when asked to describe one.
>>
>>107031896
I'm talking about a hypothetical model that doesn't use text tokenization at all. OCR models are trained with image-text pairs.
>>
>>107031697
>let's maybe hopefully try a whole new way of training
That's how you make progress: try new things, see if they work. You can’t just scale the same shit with more slop and then call war rooms when it doesn't work anymore
>>
Are there any AI as good as Gemini at editing images that can make naked anime characters?
>>
>>107027133
being into vore is a curse, especially if you aren't a furry at the same time. there's so little content, most of it is pretty bad, and there are so many subcategories that there's only a very small amount I actually like. it's very difficult to search for, and I often have to just ignore/alter aspects I don't like to enjoy the parts I do. LLMs aren't great at it but I've made some stuff I like with one
>>
File: 00274-563513342.png (844 KB, 896x1152)
>>107031919
I'm convinced that diffusion models do. They can learn pretty wild concepts like transformation into inanimate objects or equirectangular projection
>>
>>107031927
In the llama team's defence, at least their incremental improvements kept landing. They didn't have the skills to make real progress, and the war rooms forcing them to do "deepseek, but better, and bigger, and with omni too, but super super safe" is when they imploded.
>>
>>107031987
What improvements? MHA to GQA and GELU to SwiGLU? Huge fucking difference
>>
>>107031459
you are in luck lol: https://www.reddit.com/r/LocalLLaMA/comments/1oi7k25/hf_space_to_help_create_the_ot_flags_in_llamacpp/
>>
>>107032037
actually yeah, it was, compared to the 100th paper that never gets implemented, muh BLT titan ocr with 9billion retnet contexts
>>
>>107031978
They are trained on text-image pairs, using a text encoder. What they won't do is learn semantics, abstract concepts and long-range word relationships purely from images containing text, but I'm looking forward to being proven wrong (imagine if a language model could be trained just on raw scanned books and nothing else, without labels / human supervision).
>>
>>107032070
Trying something revolutionary from their own papers would have been a lesser waste of resources than Llama4
>>
>>107032100
>What they won't do is learning semantics, abstract concepts and long-range word relationships
that's bloat anyway just use rag for that
>>
>>107032110
>but llama4
papercels so predictable
>>
>picrel

Fucking finally, people will stop complaining about ST settings. Chat completion doesn't use any of the text completion stuff
>>
>>107032156
2023 called, it wants its retardation back
>>
>>107032165
Wow you're only 2 years old and you're posting on 4chan?
>>
>>107032171
my guy really doesn't understand how instruction formatting works, how utterly embarrassing
>>
>>107032156
Where did that idea come from?
Chat completion is the least error prone way (in theory) to use the models, since you don't get the chance to fuck around with the chat template at all. Although you can still do things like insert messages with the system role where they shouldn't be.
>>
>>107032204
>Where did that idea come from?
early bad jinja templates
>>
>>107032156
I'm not sure, but at least it's a sure way to get all the chat template stuff correct.
>>
>>107032156
>this is the guy telling you local models are shit btw
>>
>>107032221
>a sure way
A sure-r way at least. The GGUF can still have a fucked template.
See, unsloth.
>>
>>107031552
>>107031622
>>107031748
See >>107028332
It's retardation by people who don't understand ML
>>
>>107032156
NO SIR CHAT COMPLETE PERFECT FOR GOOD LOOKS
>>
>>107026802
Real porn is boring and nasty. I grew up with shitting dicknipples and only AI will reliably provide me an infinite "free" supply of first person multiple character hyper macro herm taur goredeath hyperscat anal vore shitjerks.

>verify you are human
Haha yeah... yeah...
>>
File: G4UBM8BasAIvwQm.png (359 KB, 1290x1406)
Alright guys hear me out what if instead of shape-rotating models vs wordcel models we did both?
>>
How do you guys deal with some models/chars in ST having inconsistent use of quotes/asterisks?
Do you just put something in the character card to force the use of markdown * and " for dialogue and narration?
>>
>>107032282
>still using asterisks in almost '26
jesus christ
>>
>>107032282
Generally you just include that markdown in the opening message and the model should adhere to it.
>>
>>107025468
This one gets my attention for a few reasons:
A) Proper nvidia support
B) Very low power usage relatively speaking
C) Good memory size (even if the bandwidth isn't great.)

I think it'd make for a decent off-grid smart-home LLM server. That's my goal for it anyway if I ever get around to buying one.
>>
File: dipsyByzantine3.png (3.44 MB, 1024x1536)
>>107031680
I really like this gen.
Agree, mostly b/c we don't know exactly how human memory is stored, but we do know it's not strings of text. Machine memory for these LLMs should eventually take a shape that wouldn't make any sense to anyone but a machine. More of a black box, like the one that makes up the LLM's matrix.
We'll see if anyone is able to scale it.
tmw forever.
>>
>>107031680
>learn 10 ways of spelling a word using groups of letters
>vs
>learn 10000 ways of writing a word using pixels
>>
File: 17370157794947.jpg (408 KB, 593x781)
>>107031591
>lecunny
rope yourself.
>>
>>107032286
what is wrong with that? even ST in the tts settings suggests that usage.
Do you just use plain text for narration?
>>
>>107032375
>Do you just use plain text for narration?
obviously?
the novel vs markdown formatting debate was had years ago
>>
>>107029812
They don't. LLM is for text conversations, autocomplete, "chain of thought" and that kind of thing. It's good for understanding human interaction, i.e. you tell the robot "Shake hands" and give the LLM a list of commands it can issue to the robot, and it'll figure out if any of those commands can do that task, but that's it, and it's pretty much optional. The rest of puppeteering the robot is handled by any number of other AI systems that aren't LLMs.
>>
>>107032286
Most voiced Japanese visual novels can largely do away with narration, or make it a minimal part of the story. I would like LLMs to be capable of roleplaying in that style.
>>
>https://arxiv.org/abs/2510.13928
>LLMs Can Get "Brain Rot"!
When will labs learn that if they just feed slop they are hurting the models? They are probably leaving out tons of useful content just because they find it dangerous.
Just 1M tokens of slop out of 15T causes severe brain damage
>>
>>107027227
So, has anybody managed to toy with this thing yet?
>>
>>107032474
we know, discussed already, next paper please! also this >When Bad Data Leads to Good Models https://arxiv.org/abs/2505.04741
>>
>>107032495
It's 100B with full gguf support likely never coming. cpumaxxers and vramlets can't touch it and that rules out nearly everyone except maybe the guy stacking RTX 6000s.
>>
>>107032474
Junk paper with very flawed methodology.
>>
>>107032556
stfu, paper never lied to anything
>>
>4090
>16gb ddr4
I'm pretty much limited to dense models that fit on 24gb vram, right?

Let's say in the new year I upgrade to an am5 CPU/mobo and 256gb ddr5, what should I run to maxx out my system, i.e. some sort of MoE that has a dense portion big enough to use 24gb vram and a bunch of experts that can fill up all those dedicated wams?

Following that, say I hypothetically replace the 4090 with a 6000 pro, is there anything that can utilise both 96gb vram and 256gb ram?
>>
>>107032566
>more vram than ram
holy brainrot
>>
>>107032384
Models like to interject emphasis on *words*, but I bypass this by using asterisks for sound effects and onomatopoeia.
*plaplaplaplap* really does it for me in terms of immersion.
>>
>>107032572
It's a ship of Theseus: when I got the GPU a new gen of CPUs was right around the corner so I wait chadded; now in the new year I go full big dick on 9k series Ryzen and ddr5

Any input on my questions?
>>
>>107032519
I'm aware, but I also know some dudes in here rent cloud hardware to play with new models, which I'm considering doing this time round too.

>>107032566
>is there anything that can utilise both 96gb vram and 256 ram?
The answer is always yes.
Granted, the dense part of most MoE models is pretty small relative to the full thing, but putting more experts in VRAM will always yield some speed gains.
With those specs, I guess you'd be running the big qwen MoE and GLM 4.6 mostly.
>>
>>107032577
I antislop ban asterisks altogether.
>>
>>107032606
at 96+256 he could even run deepseek at like some magic ik llama iq3 quant.
>>
>>107032638
True. That would be worth trying, actually.
>>
>>107032606
Any models that do better at coding, and any that do better for general knowledge? Anything truly uncucked?

From my experience the cuckold models are actually pretty good for coding but shit for anything else and vice versa, but I haven't tried any of these new fancy MoEs
>>
>>107032566
>Let's say in the new year I upgrade to an am5 CPU/mobo and 256gb ddr5, what should I run to maxx out my system,
You'd be better off, but consumer stuff being stuck with a couple of channels and mostly limited to 128GB makes it an expensive dead end. Either go full retard with a threadripper/epyc/xeon build with 512gb+ and all ram slots populated or keep waiting for a dedicated, highmem inference card to come out.
>>
>>107032565
Too much of one thing always hurts models.
Continued pretraining of instruct models on non-instruct data, especially adversarial data (short-form and "toxic"), will damage most post-training work done by the labs that trained the models, including long-context performance, "safety", RLHF.
Alpaca is an ancient shitty SFT dataset for attempting to restore any sort of performance or the so-called safety lost to continual pretraining.

The paper is just propaganda by would-be AI ethicists with an axe to grind against Musk-owned X/Twitter.
>>
>>107032729
>a dedicated, highmem inference card to come out.
Surely, if we wait long enough China will mass produce one.
>>
>>107032745
>Surely, if we wait long enough China will mass produce one.
It might not be China, but its too big a market opportunity for everyone to ignore. I'm saying that as a cpumaxxer.
>>
Shameless Jensen is trying to spam the DGX Spark on /pol/
>>>/pol/520010813
>>
>>107032729
Would I be better off just getting a 9900x3D with 128gb ram and saving the money to vramaxx on a Blackwell pro card then? Going the threadripper route and spending that much money on ddr5 feels like I would be getting less bang for my buck at that point considering my main use case is AI
>>
>>107032813
> but its too big a market opportunity for everyone to ignore
Is it?
I understand why Nvidia and AMD still make gaming GPUs. Gaming is the biggest entertainment medium currently, and it's a way to diversify even if making the shovels for the AI companies is making a ton of money, but I don't see how the same applies to home AI usage.
Anybody getting into the business of making AI ASICs has no incentive to make stuff for us instead of for Amazon, Meta, etc, I think.
>>
So where do I try Ming? I want to see what happens when chameleon gets scaled
>>
>>107032813
>>107032896
It will be Apple. They already use AI inference workload as a marketing point for their M5 chip. They'll put out some $10k M5 Ultra device and it will prove the market exists.
>>
>>107032896
yeah I don't think it is. home use is a speculative market that companies are feeling out, but there is absolutely no incentive for any of them to go all-in on it at this point in time
>>
>>107032931
They won't make an ASIC, but their architecture is the one consumer oriented hardware that works for larger models, so that makes sense actually.
The compute of their SoC's will continue to grow.
>>
>>107032866
>I have voice to text based LLM that transforms my commands into text for deepseek to create a comment on /pol/. For example I'll say "btfo this dumb schizo" and deepseek abliterated version will write a good comment to post here.
>
>I run additional userscript that takes LLM outputs and puts into text boxes. I just have to solve captcha myself.
all that effort to shitpost on the israeli bot board and the retard doesn't even know how to install a captcha solver
>>
>>107032980
Sirs please be kind Jensen bought the very best shills sir.
>>
ok I'm going all in on ERP. I can self host glm-4.6 at q5-6 and I would say to complete the experience maybe image gen and/or tts would be cool. I have the tts part solved: I created a whisper compatible voice server for vibevoice, cloning the voice of the character, and the latency is quite good on a 5090 (I generate each paragraph individually to speed it up, and only the quoted parts).
But for the image gen, I have no idea. Any pointers?
>>
>>107033001
Get a suit and a haircut. Zoomers nowadays are open-minded; they're down for anything in bed: BDSM, cosplay, anal, cucking. No need to fap to generated content.
>>
>>107032962
Yeah Apple is the only company I can think of that
1) makes hardware that can do inference
2) has a consumer electronics focus
3) is not in the data center business
Everyone that has data center biz *loses* when people have AI at home. There is no reason for nvidia or amd to sell you a box to have AI at home when they can get fat data center sales margins. There is no incentive for anyone with data center sales to bother with consumer facing stuff until the consumer stuff starts eating into their profit margins. Then you'll see nvidia suddenly put out an AITX 1080ti or whatever.

Also good luck to people in the US who think they will ever be allowed to purchase a Chinese AI device. Your only hope is apple.
>>
>>107033057
>Also good luck to people in the US who think they will ever be allowed to purchase a Chinese AI device. Your only hope is apple.
The entire Chinese economic philosophy is to overproduce and then produce some more and then flood foreign markets to drive their competitors out of business. They have zero reason to block sales to the US. Probably the US will eventually impose tariffs or block imports, but that usually takes them a few years. Should be enough time to load up.
>>
>>107033057
>There is no incentive for anyone with data center sales to bother with consumer facing stuff
And crucially, there's very little reason for a new entrant in the market to aim at consumers instead of big business/data center, which reinforces point 2.
So yeah, who'd thunk that AI would be the thing that could get me to buy an Apple product.
Here's hoping they deliver.
Or that somebody figures out a way to use an external GPU with the M series Macs. That would be a decent second option.
>>
>>107033001
ok, I connected ST to comfyUI and changed some variables to use a custom prompt, but it seems like comfy just gets fed each individual response? Any way to feed comfy a proper prompt that makes more sense for the scene occurring?
>>
Is anyone keeping track of prompt processing and t/s on all these unified memory architecture machines?
>>
>>107033079
I sure hope that's the case. Although it's a tricky situation since the entire US economy is now all-in on the idea of 4 billion people paying scama $200 per month. The existence of an AI in a box device, even if legally banned from import in the US, would probably cause the US economy to implode.

It's basically like if, in the 70s, some guy was trying to make a gigantic mainframe that everyone in the world would pay to use. Then the "personal computer" comes along. I think that "Personal AI" that you can own is as big of a deal as the PC. We're not quite there yet, but we're only a few years away from the equivalent of the 90s PC revolution.

It was monumentally stupid to go all in on the megascale data center buildouts. People are so greedy they didn't even try a different paradigm because the existence of it threatens their revenue model.
>>
>>107032871
The problem with vrammaxxing is that to do sota models you'd need to spend $50,000+ for even an "acceptable" level quant and context.
VRAM is faster than RAM, but the price/performance doesn't scale anywhere near linearly. You can easily pay an order of magnitude more for VRAM vs RAM, whereas a proper cpumaxxing build is maybe 1/2 the speed for 1/10th the price.
So the question to ask if you're looking to run 1T-class SOTA models is: speed costs money, how fast do you want to go?
>>
>>107033281
>a proper cpumaxxing build is maybe 1/2 the speed
mother of all cope, try 1/20
>>
Seems like
>the character can't speak English
Is pretty hard for many models to handle properly.
>>
>>107033211
seems like the best approach is to put this somewhere in the prompt:

OOC: generate a prompt to describe the scene for a text to image model

And then doing /sd with whatever prompt the model gave you.
Unless there is a better way
>>
>>107032866
> /pol/
You should not invoke that place.
>>107032980
> retard
> that board
So really, no surprises.
>>
>>107033340
Are you writing this as is here? Or do you actually specify which language they should speak? That's a pretty big difference.
>>
>>107033563
>> /pol/
>You should not invoke that place.
>I don't recommend kobold
we know blacked-dev we know
>>
>>107033689
The issue was making the model understand what "barely knows English" meant. Much better after I specified the region of origin.
>>
>>107033751
>everyone that disagrees with me is the same person
>>
>>107033890
>everyone that calls me out for samefagging is the same person
>>
File: xwing.jpg (200 KB, 1480x736)
>>107034065
Everyone is anon and therefore the same person because they are the other and there is only me and I only know myself and my mind not the other anon which are all the same anon
>>
We need a coherency detector LLM to filter out the nonsensical spam at /sdg/
>>
>>107034095
>Everyone is anon
>llama.cpp CUDA dev !!yhbFjk57TDr *exists*
>>
>>107034114
CUDA dev is the only valid name/trip fag. Drummer and scablicker or whatever are the problem
>>
>>107033281
My thought here was, upgrading to a Blackwell 6k would be a straight and easy upgrade route that I probably won't replace for years; on top of that I could sell my 4090 and break even after owning it since launch.

If I go and get a threadripper and a gorillian ddr5 wams, then ddr6 comes out some point next year or early 2027 at the latest, and it's a whole new CPU/mobo/ram buy, and selling all of those old parts will be a real ballache as well as taking a loss on them.

As opposed to just spending a reasonable amount on a new CPU/mobo/ram combo for under a grand: upgrading in a year or two won't sting as much, and triple digit value parts I'll happily donate to friends and family
>>
>>107032871
Forget Threadripper, for some reason /g/ never notices ktransformers runs best on Xeon Scalable.

The choice is between a 1TB Xeon Scalable with AMX and an x090 and a 128 GB Blackwell Pro. If you want to run 1T LLMs get the Xeon Scalable, if you want to rip through image gen or smaller models get the Blackwell Pro. A refurb Xeon Scalable will be almost half the price.
>>
>>107034239
Does AMX help at all if you are doing PP on a GPU?
>>
>>107034263
i do not recommend putting your pp on the gpu, you might get burns or electric shocks
>>
>>107034272
I might have misunderstood what it means to "fuck the AI" then.
That explains why I never see anybody complaining about cum on their GPU fans.
>>
>>107034263
>Does AMX help at all if you are doing PP on a GPU?
Yes, inference still needs some processing power. Threadripper needs lots of cores, whereas the cheapest Scalable with AMX will do.
>>
>>107026802
It's completely different, if you're doing a good ERP then it's just insane, like you're living there
>>
can we get the spambot running here? dead general is no fun
>>
>>107034526
the designated shitting street is down that way /aicg/
>>
>>107034526
better dead than slopped
>>
>>107034526
Sorry, I'm at work
>>
>>107034554
we're both doebeit
>>
>>107034515
teach me the ways anon
>>
>>107034587
Does your boss know you're here?
>>
>>107034667
I haven't been fired yet so probably not
>>
Is there a <50B without spiral sense errors yet, or is this always going to be a >100B luxury even into 2026?
>Spiral sense errors?
Knowing and remembering where things and people are, and positioning, correctly.
>>
>>107034750
Qwen 30B thinking is decent at that if you prefill the thinking with explicit instructions to consider that kind of thing.
>>
Does the active expert size in MoE models correlate with generation speed?
I tried running GLM-Air with RAM offloading some time ago and got like 5 t/s; wondering if the new Qwen3-Next-80B will be better when it gets llama.cpp support.
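Back-of-envelope it should, since RAM-offloaded decode is mostly bandwidth-bound: speed tracks how many bytes of active (routed) weights get streamed per token, not the total parameter count. A rough sketch, with every number an assumption to swap for your own hardware and quant:
```python
# Rough ceiling for RAM-offloaded MoE decode: bandwidth / bytes streamed per token.
# All figures below are illustrative assumptions, not measurements.

def est_tps(active_params_b, bytes_per_weight, bandwidth_gb_s):
    """Upper bound if every active weight is read from system RAM once per generated token."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_weight
    return bandwidth_gb_s * 1e9 / bytes_per_token

BW = 80     # dual-channel DDR5, roughly, in GB/s
BPW = 0.56  # ~4.5 bits/weight, Q4-ish quant

print(est_tps(12, BPW, BW))  # GLM-4.5-Air, ~12B active params -> ~12 t/s ceiling
print(est_tps(3, BPW, BW))   # Qwen3-Next-80B-A3B, ~3B active  -> ~48 t/s ceiling
```
Real numbers land below the ceiling (attention, KV cache, whatever stays on GPU), but the scaling with active parameters is the point.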
>>
>>107034750
It's tragic that you essentially need 700b to get the Cumulative Space values right. Hope we see a model that can do it sub-200b soon
>>
>>107034792
Fuck it, I'll try it. Give me your prompt for that if you can spare it.
>>
>>107034833
Qwen is the driest mofo around for RP if that's your use case, just letting you know before you waste your time.
>>
>>107034852
That was my use-case, damn.
>>
>>107034833
Fuck, it's been a couple months since I played with that.
What I did was essentially write a system prompt with instructions to think about the space, then the position of each entity (people, objects, etc) in that space, plus some other stuff.
Then I took the thinking process it generated naturally, tweaked the shit out of it, incorporated the instructions of the system prompt in first person (I'll consider, x, y, etc) and used it as a prefill.
I might have some old silly chat with that still in the history, I'll see if I can find it.
Also, anon is right, it's dry as a bone for ERP.
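Roughly, the plumbing looks like this. A minimal sketch only, not my actual prompt (the prefill wording here is invented), assuming a llama.cpp server on localhost:8080 serving a Qwen ChatML-template model with special-token parsing enabled:
```python
import requests

# Hypothetical spatial-tracking prefill: we open the assistant turn ourselves and start
# the <think> block, so the model continues "its own" reasoning from our scaffold.
SYSTEM = ("Before every reply, silently map the scene: the space itself, then where each "
          "character and object is, then who can see, reach, or hear what.")
PREFILL = ("<think>\nI'll start by listing the space, then each entity's position and posture, "
           "then what changed since last turn.\n1. Space: ")

history = "<|im_start|>user\nAnna ducks behind the couch while Marcus checks the window.<|im_end|>\n"
prompt = (f"<|im_start|>system\n{SYSTEM}<|im_end|>\n{history}"
          f"<|im_start|>assistant\n{PREFILL}")

# llama.cpp's /completion endpoint does raw continuation, so the prefill isn't re-opened.
r = requests.post("http://127.0.0.1:8080/completion",
                  json={"prompt": prompt, "n_predict": 512, "temperature": 0.7})
print(PREFILL + r.json()["content"])
```
In Silly the same thing is just the system prompt plus a "Start Reply With" prefill; the script is only to show where the pieces go.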
>>
>>107034869
nta but imo it's not that bad, the 2507 versions are... passable. not a lot of RP chops but they're competent enough now. ymmv.
>>
>>107034892
One thing I didn't try but thought of was trying to steer its prose by forcing it to think in a specific "voice" or style.
>>
>>107027136
It says "user" because I don't want to post my name on 4chan. I literally changed it just for the screenshot. I don't see what's wrong with using asterisks. I want to talk in first person because I want to feel like I'm there, it's me doing it. And Ling Ling doesn't speak English, that's why I had my Chinese servant tell her that we're going to play in my room rather than me tell the girl herself. And I don't see where the AI is doing actions on my behalf.
>>
File: 1754596167579622.png (113 KB, 614x189)
>>107033211
Go to the image generation tab in extensions and tick picrel, then you can change the prompts in the image prompt templates. Interactive mode sends the free mode prompt to your LLM when you type 'generate an image of x' or something similar, with "{0}" being your raw prompt.
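For example (not SillyTavern's exact default wording, just to show where {0} lands), a free-mode template along these lines works:
```
Describe {0} as a single comma-separated line of booru-style tags for an image generator. Output tags only, no prose.
```
Typing 'generate an image of a rainy cyberpunk street' then sends that line to the LLM with {0} replaced by 'a rainy cyberpunk street', and the LLM's tag list goes to your /sd backend.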
>>
>>107033563
/pol/ is alright, flags and IDs help identify samefaggotry and proxyhopper niggers. We cant do it here.
>>
>>107031748
>the bloat from figuring out tokenization on the LLM’s side is huge
I have no idea what this means, you can code a BPE tokenizer in a weekend. Is this just people parroting Karpathy and his anti-tokenizer autism?
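It really is weekend-sized; the core merge loop of classic character-level (Sennrich-style) BPE fits in a screenful. A toy sketch, nothing like a production byte-level tokenizer:
```python
from collections import Counter

def train_bpe(text, num_merges):
    """Toy BPE trainer: repeatedly merge the most frequent adjacent symbol pair.
    Character-level and word-bounded for clarity; real tokenizers work on raw bytes."""
    words = [list(w) + ["</w>"] for w in text.split()]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            pairs.update(zip(w, w[1:]))          # count adjacent symbol pairs
        if not pairs:
            break
        best = max(pairs, key=pairs.get)         # most frequent pair this round
        merges.append(best)
        merged = "".join(best)
        for i, w in enumerate(words):            # apply the merge everywhere
            out, j = [], 0
            while j < len(w):
                if j + 1 < len(w) and (w[j], w[j + 1]) == best:
                    out.append(merged)
                    j += 2
                else:
                    out.append(w[j])
                    j += 1
            words[i] = out
    return merges

print(train_bpe("low lower lowest low low", 6))
```
Encoding is just replaying the learned merges in order; the vocab is the set of symbols left plus the base characters.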
>>
>>107035178
The few days we had on 8ch with post IDs were the best this general has ever been.
>>
>>107035288
>Karpathy
that's *AI God Daddy* to you, peasant
>>
>>107035178
They probably make money from stealth advertising on /g/; no way those shitty banner ads make enough to run this site. IDs would ruin it, unless you post from several IPs, since the pattern would make it obvious. Probably also a reason why the IP count was removed.
>>
>>107035288
I don't know how you could remove the tokenizer or something equivalent to that.
Let's take a typical mid-size model's token embedding table, with shape [262208, 5376]. Every token embedding has 5376 dimensions, or 10.5 kB of purely semantic information in FP16 per token.
A long word rendered at standard 96 dpi might be 70 x 12 pixels x 3 channels = 2.46 kB, and it would contain no semantic information on its own, just pixels. Encoded into a smaller latent space it would be even less information than that.
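Sanity-checking those figures (shapes as quoted above; the last line is what the raw text input per token actually costs):
```python
import math

# Embedding table [262208, 5376] in FP16: bytes of learned, semantic info per token row.
print(5376 * 2)                           # 10752 B ~= 10.5 kB per token embedding

# A long word rendered at ~96 dpi: 70 x 12 px, 3 channels, 1 byte each -> raw pixels.
print(70 * 12 * 3)                        # 2520 B ~= 2.46 kB

# The raw input a text model receives per token is just an id into the vocab.
print(math.ceil(math.log2(262208) / 8))   # 3 bytes are enough to index 262208 entries
```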
>>
>>107035353
It's a good bet that glowies have access to a special system that lets them use whatever ID they want.
>>
>>107035415
Glowies you spot by their words, everything else can look legit.
>>
>>107035412
You are looking at raw input values for images but at semantic content for tokens. If you want to compare raw inputs, look at the 262208 (the vocab a token id indexes into), not at 262208 * 5376.
>>
>>107035412
Read the paper and believe.
>>
>>107035440
>>107035412
And that would be one int value per token, or 4-8 bytes per word. Encoding words as images isn't even in the same realm as tokenization when it comes to compressing the raw input.
>>
>>107034750
What do you mean, "remember" where things are?

LLMs don't remember things, they generate tokens based on what is in their context, and which context matters is determined by attention.

We need tweaks to attention, or better training data for the purpose.
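To make "which context matters is determined by attention" concrete, a minimal single-head sketch (toy shapes, no causal mask, plain numpy):
```python
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V: each output row is a weighted mix of V,
    with the weights saying how much each position 'matters' to the current one."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))   # 4 positions, d = 8
out, w = attention(Q, K, V)
print(w.round(2))   # row i: how strongly position i attends to every position
```
Whether "the couch Anna hid behind three turns ago" gets any weight at generation time is entirely down to those learned weights, which is why it looks like "forgetting".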
>>
Anons, >>107035532 has autism, so please go easy on him.
>>
>>107034750
llama 405b at q8 handles character positioning just fine. probably because it's a dense model

reminder https://outsidetext.substack.com/p/how-does-a-blind-model-see-the-earth
>>
>>107035570
meanwhile the "best" moesissy model...
>>
>>107035570
>japan
>>
>>107035581
>Maverick is the 405b equivalent, in case you forgot. I imagine that the single expert routing isn't helping it develop a unified picture.
>>
>>107035542
you're the retard.
>>
>>107035628
>Maverick
>interleaved attention
>moe
an abortion of a model
>>
>>107035628
better, it vaguely has japan look somewhat normal if you squint
>>
>>107035664
and it's missing half of antarctica which should be much easier to generate than japan. what's your point?
>>
>>107035570
this is mostly a test of memorization, not spatial reasoning, and has basically no correlation with keeping track of character state in a roleplay
>>
>>107035682
only japan matter
>>
File: teto_00009_.mp4 (1.25 MB, 1920x1184)
>>107035690
If it were just memorization, you would expect the model with more total parameters to do better. It doesn't, because the task requires integrating the knowledge it has, which depends on attention and on having more active parameters.
>>
>>107035841
>>107035841
>>107035841
>>
>>107035841
Where's her ass?
>>
>>107034239
Where are you seeing this? The cheapest refurbished server RAM I could find came to 8k dollarydoos for 1 TB of DDR5, and that's the RAM alone, before any processors, boards, PSUs, etc.

Seems like a lot to spend just to run bigger models; a prosumer GPU feels like a better buy when it's faster, can run fairly big dense models along with MoEs quickly, handles img/vid gen workflows, etc.

I feel like it's enough for any use case, alongside the odd API call for things that really, truly need a fuckhuge model. My current coding workflow uses a local model first and then has one or two API models look over it (rough sketch below), the goal being fewer iterations and less tweaking at the first step thanks to a bigger locally run model, so I rely on APIs less. Then I can use SD and WAN for creative projects, and slap my cock to smarter coombots.

Thanks for making me look into it anyway; I've made up my mind. I'll go for a cheaper AM5/DDR5 upgrade first and look for a pro-line GPU a little later.
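The two-stage flow mentioned above, as a rough sketch: URLs, model names, and the env var are placeholders rather than what I actually run, and both ends are assumed to speak the OpenAI-compatible chat API (llama.cpp and tabbyAPI both do):
```python
import os, requests

def chat(base_url, model, messages, api_key=""):
    """Minimal call against an OpenAI-compatible /v1/chat/completions endpoint."""
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    r = requests.post(f"{base_url}/v1/chat/completions", headers=headers,
                      json={"model": model, "messages": messages})
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

task = "Write a function that merges overlapping intervals."

# Stage 1: local model on localhost produces the first draft.
draft = chat("http://127.0.0.1:8080", "local-coder",
             [{"role": "user", "content": task}])

# Stage 2: a paid API model only reviews the draft, so each iteration burns far fewer tokens.
review = chat("https://api.example.com", "big-api-model",
              [{"role": "user",
                "content": f"Review this solution to '{task}' and list concrete fixes:\n\n{draft}"}],
              api_key=os.environ.get("API_KEY", ""))
print(review)
```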
>>
>>107035934
She laughed it off.
>>
What's the best general purpose SLM (under 4B in my case) that can rival something like ChatGPT or Grok? Obviously it wouldn't be as powerful, but something that can do the low-level assistant tasks most people use those two for.
>>
>>107034892
the problem is that even a small model like Gemma 3n E4B has more world knowledge than ANY qwen, including the API only Qwen Max. I'm not being hyperbolic.


