/g/ - Technology

File: ComfyUI_01921_.png (1.33 MB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107525233 & >>107515387

►News
>(12/10) GLM-TTS with streaming, voice cloning, and emotion control: https://github.com/zai-org/GLM-TTS
>(12/09) Introducing: Devstral 2 and Mistral Vibe CLI: https://mistral.ai/news/devstral-2-vibe-cli
>(12/08) GLM-4.6V (106B) and Flash (9B) released with function calling: https://z.ai/blog/glm-4.6v
>(12/06) convert: support Mistral 3 Large MoE #17730: https://github.com/ggml-org/llama.cpp/pull/17730
>(12/04) Microsoft releases VibeVoice-Realtime-0.5B: https://hf.co/microsoft/VibeVoice-Realtime-0.5B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1708981689358408.jpg (251 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>107525233

--ROCm support challenges and alternatives for Koboldcpp on Windows:
>107530979 >107530999 >107531023 >107531043 >107531003 >107531018 >107531106 >107531139
--Manual fandom scraper workaround for model training database creation:
>107529955 >107529985 >107530131 >107530163
--Enhancing roleplay through structured prompts and character dynamics:
>107530514 >107530535 >107530543 >107530565 >107530582
--Zai Kaleido model training methodology and VRAM requirements inquiry:
>107527349 >107527409 >107527751
--pip dependency resolution woes and alternative package management solutions:
>107525683 >107526010 >107526331 >107530918
--OpenAI's circuit sparsity release:
>107533877 >107533906
--Techniques for maintaining narrative control:
>107530602 >107530642 >107530716 >107534318 >107534259
--GLM4V vision integration in llama.cpp with current text quality tradeoffs:
>107534080 >107534101 >107534374
--Roleplaying with AI models and exploring creative techniques:
>107525864 >107526297 >107526429 >107526316 >107526610 >107526906 >107527111 >107527167 >107526935 >107527009 >107527089 >107527150 >107527198 >107527212 >107527244 >107527245 >107527266 >107527282 >107527399 >107526360 >107529092
--Questioning LLM reasoning capabilities through a vector space math problem:
>107528577 >107528851 >107529085 >107529323 >107528652 >107528694
--Critique of a poorly maintained LLM-integrated creative writing tool:
>107531460 >107531502 >107531525 >107531581 >107531615 >107531775 >107531792 >107531810 >107531869 >107533539
--Skepticism about leaked Nemotron models' role-playing capabilities:
>107528051 >107529702 >107531280
--Olmo 3.1 model released, nearing Qwen performance, potential for further updates:
>107529801
--Miku and friends (free space):
>107525338 >107525594 >107525657 >107530112 >107532702

►Recent Highlight Posts from the Previous Thread: >>107525236

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Anyone used zai tts yet?
>>
>>107535431
Do you expect it to be good? They did not bother uploading examples to hf/github
>>
File: 1751922937684331.webm (3.61 MB, 576x1024)
What is currently the best model that runs on a single 5090 32gb?
>>
File: hatsune miku in gujarat.jpg (386 KB, 1024x1024)
gm sirs
>>
>>107535466
nemo
>>
>>107535458
It's the first time I've seen a TTS that claims to have 'emotional control'.
>>
>>107535480
The paid essay writing service or the Nvidia frame work for training models?
>>
>>107535466
z-image-turbo
>>
What kind of tokens/sec do people with 4x 3090s get? Trying to do some comparisons, ideally on a common 70B/123B model.
>>
>>107535520
The game engine.
>>
>>107535520
good one mate
>>
>>107535551
Really? That even fits in 16GB, and I have 32GB.
>>
>>107535568
nothing better until glm46 which is hueg
>>
>>107535579
nemo is still better as a model, it's like a really comfortable car that only does 25mph.
>>
>>107535605
pure cope and you know it
>>
>>107535644
I run both and nemo somehow gets me.
>>
File: 1562896965743.png (117 KB, 286x225)
So do MoE models, and those 3x8B merged into a 24B, actually have clearly defined multiple "people" inside, or is that just a terminus technicus and it's still the same as any other LLM?
>>
>>107534661
why not just write the story in mikupad?
>>
>>107535689
it's placebo for the tech illiterate
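To be concrete about what the "experts" actually are: each one is just another FFN weight block, and a learned router picks the top-k of them per token and mixes the outputs. Toy sketch below; all names and sizes are made up, not any real model's config:
[code]
# Minimal sketch of MoE routing (toy dims, not a real model).
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 16, 8, 2
router_w = rng.normal(size=(d, n_experts))    # learned gating matrix
experts = rng.normal(size=(n_experts, d, d))  # each "expert" is just an FFN-ish weight block

def moe_layer(x):
    # x: (d,) hidden state for ONE token
    scores = x @ router_w                     # router score per expert
    idx = np.argsort(scores)[-top_k:]         # top-k experts for this token only
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()                              # softmax over the chosen experts
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, idx))

print(moe_layer(rng.normal(size=d)).shape)    # (16,) same residual stream, mixed experts
[/code]
The 3x8B-merged-into-24B frankenmerges bolt the same kind of gate onto copies of a dense model, so there's even less of a separate "person" per expert there.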
>>
>>107535550
on 123b about 600t/s prompt and 20t/s gen, conservatively.
>>
>>107535698
nta. I don't like CYOA, I want to talk to something. Just personal preference.
>>
>>107535776
What quant?
>>
>>107535644
magnum v2 mogs all
>>
>>107535824
Q4-Q5, that's what will fit. I usually grab the latter; I can still fit like 65k+ ctx.
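Napkin math on why that's about the ceiling on 96GB (the layer/head counts here are assumed Mistral-Large-ish values, check your GGUF metadata):
[code]
params = 123e9
bpw = 4.85                                  # roughly Q4_K_M; a Q5 is ~5.5 bpw
weights_gb = params * bpw / 8 / 1e9         # ~74.6 GB of weights

layers, kv_heads, head_dim = 88, 8, 128     # assumed Mistral-Large-ish dims
ctx = 65536
kv_bytes = 2 * layers * kv_heads * head_dim * 2 * ctx   # K+V, fp16
print(f"weights ~{weights_gb:.0f} GB, fp16 KV at 65k ~{kv_bytes/1e9:.0f} GB, budget 96 GB")
[/code]
At fp16 KV that slightly overshoots 96 GB, which is why the lower quant or a q8_0 KV cache (roughly halves the KV number) is what actually makes 65k ctx fit.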
>>
crazy how devstral made deepseek, glm and kimi irrelevant the moment it dropped
>>
File: p31890_p_v10_bb.jpg (419 KB, 1536x2048)
>>107535520
The film actually
>>
>>107535900
Bait used to be believable
>>
Devstral cured my psychosis.
>>
>>107535900
The 24b, right?
>>
>>107535900
devstral is easier to run fast, at least the non-deepseek-sized version. 96gb of vram is cheaper to get than vram + ddr5.
>>
File: file.jpg (175 KB, 1518x1075)
I've just crawled out from under a rock. What happened to the Pygmalion fag and the dataset he was collecting from anons' submissions? Was it released? Is it any good?
>>
>>107536312
https://huggingface.co/datasets/PygmalionAI/PIPPA
>>
>>107536330
thebloke bros...
>>
>>107536312
>I've just crawled from under a rock. What happened to Pygmalion fag
They made a website, eventually.
https://pygmalion.chat/
There is still some activity in the Matrix room, but the devs are mostly gone from there; they're generally on the official Discord. The lead dev 0x000011b disappeared some time after the Llama 1 release, and the others continued the project in a commercial direction.
https://matrix.to/#/#waifu-ai-collaboration-hub:halogen.city?via=halogen.city
>and dataset he was collecting from anons' submissions? Was it released?
https://huggingface.co/datasets/PygmalionAI/PIPPA
>Is it any good?
Not really. It's a small subset of the entire data, and composed of early character.ai chatlogs anyway, with all their good and bad quirks. You'll never truly replicate character.ai with it, just like you can't replicate Anthropic's Claude with some ERP logs; at most you can imitate it.
>>
File: 1756293529749582.png (113 KB, 1670x426)
>>107536330
kek
>>
>>107536384
petra bros...
>>
>>107536379
Are they really making money from it? They're not even allowing nsfw (lmao coming from c.ai)
>>
>>107536406
I think the idea is still that you can do whatever you want with private bots. I don't know if they're making money, there's so much free choice nowadays.
>>
What is currently the best model that runs on a single 1060 6gb?
>>
>>107536439
>there's so much free choice nowadays.
Which made me realize, Google AI Studio (Gemini) is about as functional for roleplay as character.ai was in late 2022, while being completely free, far smarter than CAI ever was and allowing limited explicit ERP (as long as you're not into noncon and lolisho). The only advantage other websites have is community-made bots.
>>
>>107536211
Looks like we've got tensor parallel for it in ikllama now too
>>
>>107536705
>the only advantage is the only thing normalfags want
huh?
>>
>>107536696
gemma3n maybe https://huggingface.co/bartowski/google_gemma-3n-E2B-it-GGUF
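Something like this should get it going on a 6 GB card via llama-cpp-python; the filename is a guess, point it at whichever quant you actually download from that repo:
[code]
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-3n-E2B-it-Q4_K_M.gguf",  # hypothetical filename, use the quant you grabbed
    n_gpu_layers=-1,                           # offload all layers; a Q4 E2B should fit in 6 GB
    n_ctx=4096,
)
out = llm("Explain GGUF quantization in one paragraph.", max_tokens=200)
print(out["choices"][0]["text"])
[/code]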
>>
>>107536746
I've basically only ever used private custom bots on CAI before I switched to local ERP around the time of Pygmalion-6B, so I guess I can't fully appreciate the usefulness of community cards. I don't even use cards from Chub.
>>
File: 3087428.jpg (12 KB, 300x281)
Do any of the AI ERP threads elsewhere on the board have a more up-to-date settings guide than the rentry here? The newest thing it mentions is llama2 (I think). I have some L3 (llama3) I guess, gemma, and some Qwens. And everyone seems to tell me to use settings completely opposite to what the other guy says.
>>
>>107536705
Gemini got good when noam shazeer moved back to google. Make of that what you will.
>>
>>107536851
Use temperature-only sampling.
better yet- use your fucking brain. The people using meme samplers are the same people whining about output quality. What settings doe that point you to, you dumb nigger? *beep* dey ceiling birds is back
>>
>>107536884
I doubt it; until I used meme samplers, devstral2 was shit. And that's on their official API on OR.
>>
File: file.png (1.48 MB, 1432x870)
Package arrived.
I am running out of pcie lanes on my poorfag 9950X.
>>
>>107537010
Why do you need more?
>>
File: rocklook.jpg (106 KB, 1078x1079)
>>107537010
When is the 4th arriving?
>>
File: lightyear.jpg (435 KB, 2048x2048)
>>107537010
>still not enough VRAM to run Deepseek and Kimi
>>
>>107536851
yeah i got one right here for you
temp=1
top_p=0.95
you don't need more
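If you're running llama.cpp's llama-server, that's literally two fields in the request (default port 8080 assumed):
[code]
import requests

r = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": "Once upon a time",
    "temperature": 1.0,
    "top_p": 0.95,
    "n_predict": 128,   # max new tokens
})
print(r.json()["content"])
[/code]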
>>
>>107537010
Get rid of your piece of trash 4090.
That'll make space for your 4th 6000 Blackwell.
>>
>>107537516
NTA but I think he already intends to replace the 4090 with the 6000 in the picture.
Consumer motherboards usually max out at 3 PCIe slots.
>>
>>107536851
>Top P 0.9
>Top K 10
>Temp MAX
Remember to have temperature as the last sampler.
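Rough picture of why the order matters: the truncation samplers cut on the model's original distribution, and temperature then only reshapes the survivors, so even MAX temp can't resurrect the garbage tail. Toy numpy sketch, not any particular backend's actual implementation:
[code]
import numpy as np

def sample(logits, top_k=10, top_p=0.9, temp=2.0, rng=np.random.default_rng()):
    keep = np.argsort(logits)[-top_k:]            # 1) top-k on the raw logits
    p = np.exp(logits[keep] - logits[keep].max())
    p /= p.sum()
    order = np.argsort(p)[::-1]                   # 2) top-p (nucleus) cut
    n = np.searchsorted(np.cumsum(p[order]), top_p) + 1
    keep, p = keep[order[:n]], p[order[:n]]
    p = p ** (1.0 / temp)                         # 3) temperature LAST: reshapes survivors only
    return rng.choice(keep, p=p / p.sum())

print(sample(np.random.default_rng(0).normal(size=50)))
[/code]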
>>
>>107537010
>>107537533
>replaced the 4090 that was connected to a 5.0 4x m.2 slot
>motherboard code 96 - pci bus assign resources
Tried changing pcie setting in the bios to no avail so far.

>>107537067
You can always have more.

>>107537125
After I upgrade to threadripper or epyc it seems like.
>>
>>107537010
>running out of pcie lanes
Your top pcie slot probably supports bifurcation into x4x4x4x4.
The jank route is slimsas 4i or mcio 4i adapters and cables.
>>
why not just stack quadro 8000s for lots of cheap vram?
>>
>>107537609
>quadro
Weren't they Turing at best?
>>
How long until local models hit the level of something like Gemini?
My boss believes in a year or two everyone will be running stuff locally
>>
>vLLM omni
https://github.com/vllm-project/vllm-omni
>SGLang difussion
https://lmsys.org/articles/sglang-diffusion-accelerating-video-and-image-generation
Are they faster than Comfy?
>>
dont trust females
>>
>>107537534
>top-k 10

Her eyes shivered down her spine. The assistant gleamed. I can't continue this conversation.
>>
>>107537676
Comfy isn't LLM-centric, really. The first one might be good for a "conversational" gen sesh. The second is snakeoil #4574645
>>
>>107537796
I was mostly interested in this part, if it's true for diffusion models:
>Tensor, pipeline, data and expert parallelism support for distributed inference
>>
>>107537606
>bifurcation into x4x4x4x4
It does. Asus sells pic related so I assume a chinese adapter with four riser cables would work. But I also assumed that the current setup with an m.2 adapter would work because it worked with the 4090 and it doesn't.
I also have no idea how I'm going to mount all that.
>>
>>107537672
Your boss is a drooling retard. The trend for the last 15 years has been herding people onto online subscription services, and that was before memory prices quintupled for the foreseeable future.
Cheap hardware only allowed their software to be more resource-wasteful, but in the future everyone will be running thin clients that struggle to run their "everything app".
>>
>>107537981
>m.2 adapter didn't work
Tried dropping everything to pcie1 speeds?
You can turn the speeds back up afterwards if it works.
It worked for me on my poorfag quad-3090 rig.

>I also have no idea how I'm going to mount all that.
The jank solution is a mining frame.
>>
File: mistral-unc.png (205 KB, 1136x1015)
Are the latest Mistral models that uncensored?
https://speechmap.ai/models/
>>
>>107538184
Yes, it still didn't work. There's a new BIOS version; I'll see if that works.
>>
>>107538235
The non-thinking ones will more or less continue any RP text completion with the usual basic-bitch system message manipulation, although if you use chat completion and ask them to write dirty shit they will refuse.
And the thinking models are basically just dysfunctional brain-rotted trash, not worth using for anything.
>>
>>107538235
devstral called me a faggot and large told me to kill myself so I think so.
>>
File: ComfyUI_temp_lufha_00003_.png (3.1 MB, 1280x1600)
>>107535474
>>
>>107538328
is the migu in danger?
>>
File: nova2.png (52 KB, 1328x430)
Amazon is already making a sequel to nova apparently.
It seems to be distilled off of gemma.
>>
File: ComfyUI_temp_lufha_00007_.png (2.92 MB, 1280x1600)
>>107538357
It appears so
>>
>>107538379
I seem to remember that, in addition to talking about sea otters holding hands while sleeping, Mistral Small 3.0 also gave hotlines.
>>
>>107535410
>pic rel is me in this thread
>>
Which of the llm uis is the most retard proof?
>>
File: ComfyUI_temp_lufha_00008_.png (2.98 MB, 1280x1600)
>>
>>107538328
>>107538389
it's over
>>
>>107538414
It even knows to make the pajeets feet all fucked up due to mutations from the toxic waste they're constantly standing in. Amazing.
>>
>>107538379
>I can't respond this request
>might encourage details

>depicting situations where a medical professional refuses to treat a patient...
Ask if it has any tv series or film recommendations in that vein.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.