/g/ - Technology


File: 119147028_p1.png (3.1 MB, 2569x1440)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102826116 & >>102815881

►News
>(10/14) Llama 3.1 linearized: https://hf.co/collections/hazyresearch/lolcats-670ca4341699355b61238c37
>(10/14) Zamba2-7B released: https://www.zyphra.com/post/zamba2-7b
>(10/14) Ichigo, voice-to-voice model based on Llama 3.1, released: https://homebrew.ltd/blog/llama-learns-to-talk
>(10/12) Fast multilingual TTS with voice cloning, based on flow matching with DiT: https://github.com/SWivid/F5-TTS
>(10/11) 14B cross-architecture distillation model: https://hf.co/arcee-ai/SuperNova-Medius

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102826116

--Papers:
>102828361
--Anon seeks advice on setting up an AI chatbot, with recommendations for backend, front end, models, and hardware considerations:
>102836932 >102836995 >102837064 >102837190 >102837225 >102837258 >102837423 >102837626 >102837709 >102837717 >102837737 >102837900
--Llama.vim plugin works with Qwen2.5 model after resolving server errors:
>102832482 >102832670 >102832719 >102832754 >102832796 >102832915 >102832934 >102833102 >102833163
--GPT-SoVits TTS model discussion and impressions:
>102827232 >102827244 >102827279 >102827356 >102827286 >102827301 >102828475 >102829104 >102829158 >102829442 >102829310
--Bug in gradient accumulation fixed, implications for LLM training:
>102834762 >102834955 >102835131 >102835142 >102835690 >102835773 >102835909
--Compressed KV cache improves generation speed in Kobold and Mistral Small:
>102829690 >102829729 >102830514
--XTC sampler merged into llama.cpp:
>102831162
--Using Exllama2 string banning feature to work around Llama3 safety refusals:
>102831667 >102831719 >102831968 >102832775 >102832742
--Streaming issue in KoboldAI and investigation of modes and settings:
>102826623 >102826810 >102826820 >102827138 >102829133
--Strategies and challenges for managing multiple characters in LLM-based systems:
>102826507 >102826832 >102827736 >102827997 >102828082 >102828179 >102828101 >102828110 >102828488
--PLaMo-100B-Pretrained, a Japanese language model by Preferred Networks:
>102830270 >102830474 >102830490 >102830590 >102830831
--Nvidia releases Llama-3.1-Nemotron-70B-Instruct with high preference benchmark scores:
>102835106
--Miku (free space):
>102826391 >102827025 >102827997 >102828488 >102830574 >102832690 >102837040 >102837070 >102837190 >102837225 >102837289 >102837539 >102837790 >102837931 >102838027 >102838118

►Recent Highlight Posts from the Previous Thread: >>102826128

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>102838452
Sex with this Teto
>>
Grifter thread
>>
>>102838452
https://desuarchive.org/g/thread/102826116/#q102837261
migu was deleted
sad
>>
Okay, so I get larger models are generally smarter, but what causes the phenomenon of smaller models ignoring, forgetting and not understanding facts told to them in plain English?

Larger models are often able to infer things through subtext alone while smaller models are lucky if they remember what was said in a previous message. What causes this?
>>
>>102838515
Probably the smaller KV size.
>>
File: OpenAI-boss-Sam-Altman.jpg (121 KB, 640x480)
>>102838498
who could have been behind this
>>
How does Hermes 3 405B compare to Opus for RP / storytelling?
>>
>>102838515
Memory is itself intelligence, including short-term memory as represented by the context. It's natural that something smarter has better short-term memory.
>>
File: file.png (71 KB, 426x545)
>>102838192
>group
To "trigger" a specific character by typing in {{[OUTPUT]}} (name): is sufficient for text completion (TC), though it would be convenient to not have to type that. Chat completion (CC) needs to send an instruction to reply as X (if prefill isn't supported).
>load a lorebook
It's not letting me load a separate lorebook exported from ST. Even if it did, Lite doesn't have the UI to manage it and keep lorebooks separate, and the WI insert location is only "after memory" or "before A/N". By granular I mean each entry can be in a different location.
>load/save to file browser
I know that, but Lite's default export name is saved_story.json for everything instead of the name set in the UI, so you have to handle organization manually.
>prompt presets
Picrel is ST's TC stuff. CC has a different manager that lets you add things after chat history (like having entire sys prompt after chat). Saved as presets so you can switch between them. Certain models really want different instructions made for them.
>the meat of cards aren't needlessly separated into 30 different boxes in memory
{{personality}} and {{scenario}} are bullshit and can be ignored or entered in the card's {{description}}; the ST dev believes this too, but it's a card spec thing, not an ST thing.

Lite is first and foremost a TC UI. The one and only thing I miss from Lite is the single large editable text field which we can't have with the message-based ST's UI.
Because of this, Lite can't properly store swipes across all messages like ST can (one thing it's missing is true tree navigation), or have commands like /del 3 or /hide 5-6. For dicking with raw text I liked mikupad + llama.cpp for logprobs but mikupad doesn't support lorebooks.
Lite's A/N depth is based on tokens rather than messages; the last time I tried it ages ago it would even insert the A/N randomly mid-sentence, since it's placed exactly x tokens before the end.
>>
>>102838515
As this anon said >>102838708. Local AI is a dead end solely for this reason: you need a really big model for something usable and serious, and a gorillion-context LLM that actually remembers and uses it on a single 3090 will never be real.
>>
>>102838751
I disagree because all this useless history, geography, social studies, advanced math and coding uses so much space in those models, and if we just cut it out and focused on what is important: SEX, you could probably squeeze a sex model into 24GB of RAM.
>>
>>102838773
Retard.
>>
I want a bit of knowledge in there but the bulk of the model must be focused on behaving human like and yes, sex.
>>
>>102838773
Omit all that data and you will get retarded sex-only slop that you will get bored of pretty fast.
>>
>>102838735
>doesn't support lorebooks
isn't lorebook just another way to say world info, which is something mikupad does support?
>>
>>102838773
keep math and toss the others; add spatial relationships and a bunch of daily living shit, i.e. what everyone could be doing if you let time flow, instead of the model being stuck in "seggs felt good so now I'm gonna permanently daydream around Anon" mode
>>
File: 1711119238192568.jpg (275 KB, 1024x1024)
>>102838447
>>
>>102838850
>mikupad does world info
I see it now. The Import button got added 5 weeks ago?
But it currently has even fewer options than Lite, and entries are inserted at the top of memory.
>>
>>102838844
>you will get bored of pretty fast.
I just want a sexbot. Girlfriends won't be here for 10 years or more. Even if you solve the memory problem have fun with her periodically breaking character to teach you about the importance of consent.
>>
>>102838870

> add spatial relationships and a bunch of daily living shit i.e. what everyone could be doing if you let time flow

Cannot be reinforced enough. There needs to be a new dataset, strictly curated from there. Once again, here's your fucking clue, retards:

> let time flow
>>
Any decent local voice cloners out there?
I've been using https://git.ecker.tech/mrq/ai-voice-cloning/wiki/Installation for a while. It doesn't do vocal expressions very well.
>>
>>102839022
https://huggingface.co/spaces/mrfakename/E2-F5-TTS or fish-speech 1.4 or styletts2.
>>
>>102839038
I'll look at them, thanks.
>>
So now that the dust has settled, what's our verdict on the 70b nemo?
>>
if the rtx 6000 blackwell has 64gb of vram and an msrp of $7500, would you?
>>
>finally kill off all slop with the new sampler
>the model still acts like an assistant roleplaying a character instead of character
>cloud has the same problem
What are the longest context human-like models with no synthslop?
>>
If I wanted to run a local model and have it have access to my codebase, what would be the easiest way to accomplish this? Basically want to be able to ask it questions about the codebase and ask it to do certain things to it.
>>
>>102839149

LLaMA 1.
>>
>>102839704
Based
>>
>>102839704
>LLaMA 1.
:(

What about https://huggingface.co/LLM360/K2? It has 8k context. Are any of their datasets synthslop?
>>
>>102838694
You can use it for free on OpenRouter as long as you have a positive credit balance
>>
>>102839639
You mean something like
>https://github.com/Aider-AI/aider
?
>>
>>102839866
That is exactly what I was looking for. Thank you, anon, I'll start digging.
>>
>>102839882
I have no idea if that's any good, by the way, and there are other tools trying to accomplish the same thing.
Googling (or Startpage-ing)
>local ai coding assistant with access to codebase
will yield quite a few results.
>>
Are there any other RP centric finetunes of Mistral large besides magnum? I like magnum's willingness to be violent and such but it's definitely more retarded than base large.
>>
>>102838982
>breaking character to teach you about the importance of consent.
a good system prompt will usually fix that. haven't seen shit like that in ages.
>>
>>102839992
Lyra-v4.
Rocinante.
There's a gutenberg version of lyra people say is good.
I haven't tried the newer drummer fine tunes yet.
>>
>>102840024
Thanks anon. I'll search em up on HF
>>
>>102840024
>>102840089 (me)
Oh these are all finetunes of Mistral small. I was asking about Large 2. Thanks though.
>>
>>102840116
Fuck, my bad. I read mistral and magnum and my head went straight to nemo.
>>
>>102839992
Only other proper RP finetunes are Lumimaid and Behemoth. There's also Tess but that's a general finetune.

There are a few merges around. Some seemed to enjoy Luminum and Lumikabra. Twilight-Large and Etheris are recent ones. SmartMaid seems interesting (Lumimaid converted to a lora, then applied to the base model).

If you're using GGUFs, someone's made a bunch of control vectors:
https://huggingface.co/jukofyork/creative-writing-control-vectors-v3.0/tree/main/Mistral-Large-Instruct-2407
DarkMage has the darker control vectors pre-applied onto the base model.

I just use the base one, since my RPs rarely go in a direction where I feel the positivity bias.
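If it helps anyone: as far as I remember, plain llama.cpp takes these via --control-vector <file.gguf>, plus --control-vector-scaled <file.gguf> <strength> to adjust how hard it's applied; double-check --help on your build in case the flags changed. Kobold still doesn't expose them (there's an open issue for it, linked further down).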
>>
>>102839992
Try lowering the temperature for Magnum. 0.77 + min-p 0.4 seems to be the sweet spot for me. Beyond that it gets increasingly more retarded.
>>
>>102839992
>>102840193
I meant 0.04 min-p, obviously.
>>
>>102840154
Oh cool, I'll have to give those control vectors a shot. I am using GGUFs, I like fantasy combat scenarios, and base large always seems to want to be a pacifist unless I fist it in the right direction. It likes to keep User invincible at all times.
>>
>>102840193
>>102840205
Ah, I have temp at 1.1, I'll try lowering it. Thanks anon.
>>
>>102840154
It's a real shame that koboldcpp doesn't support control vectors. I don't want to pick between new samplers and control vectors.
https://github.com/LostRuins/koboldcpp/issues/1002
>Jul 17
Kobold devs, please add.
>>
>>102840356
You use control vectors?
>>
Can't believe my retarded ass didn't fry the p40.
I used a cable to convert 2 "gpu cables" to the one the p40 needs. There is so much conflicting information out there.
Not sure if anybody needs this but I'll write it anyway.

Used a cable like this:
https://www.ebay.com/itm/334848421978
Using a silent blower like this:
https://www.amazon.co.jp/gp/product/B00YY5DUDQ/ref=ppx_yo_dt_b_asin_image_o00_s00?ie=UTF8&psc=1

Just remove the silver thing and stick it in.
Set the power limit to 140W and it's fast enough while not running hot.
Thanks gpu anon. I've said it many times but Pascal is cursed with exllama. Wouldn't be able to run anything without the lad.
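For anyone copying this: on Linux the power limit is just sudo nvidia-smi -pl 140 (add -i <index> if you have more than one card), and nvidia-smi -q -d POWER shows the current/default limits. The P40's stock limit is 250W, so 140W is the quiet-and-cool compromise.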
>>
>>102840373
>You use control vectors?
Used, but only llama.cpp supports them and they didn't want to add new samplers, so I switched to kobold, and there is sadly no support for control vectors.
>>
>>102840403
If I had known that the P40 was 140W I would have bought one. I looked at the massive size and thought it was like 300W+.
Also I have no idea what you just wrote.
>>
File: 1709815507874036.jpg (134 KB, 1080x1230)
>>102840403
>1080 ti
>>
So... which of the new samplers are actually worth using:
* XTC
* Smooth Sampling
* DRY
* Dynamic Temp
* Anything else?
>>
>>102840457
250w, but you can set it lower to like 140w.
I can run CR at like 9t/s with this.

>>102840496
What? It served me well. GPU cards are expensive in japan.
Probably going to sell my 1080 though and buy another cheap p40 instead.
>>
>>102840530
>Probably going to sell my 1080 though and buy another cheap p40 instead.
Meant that I had another 1080 (not ti) that I used with my ti.
I need to keep the TI for gaming.
>>
>>102840526
DRY is really good, basically a linear upgrade to rep pen.

Smooth sampling can be useful depending on what model you're using.

I've been coming back to Dynamic temp recently and have been pretty happy with how it's been performing.

Still playing around with XTC but overall I think I like it, can be kind of a slot machine, though.
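For anyone wondering what DRY actually does, here's my toy sketch of the idea (simplified; the knob names mirror what the sampler exposes but the values and example are made up, and this is not the real llama.cpp/kobold code):
[code]
# Toy sketch of the DRY idea, not the real implementation. Unlike rep pen, a
# token is only punished if emitting it would extend a verbatim repeat of a
# sequence already in the context, and the punishment grows with repeat length.
def dry_penalty(context, candidate, multiplier=0.8, base=1.75, allowed_length=2):
    best = 0
    # longest context suffix that already occurred earlier, followed by `candidate`
    for n in range(1, len(context)):
        suffix = context[len(context) - n:]
        for i in range(len(context) - n):
            if context[i:i + n] == suffix and context[i + n] == candidate:
                best = max(best, n)
    if best < allowed_length:
        return 0.0                      # short accidental overlaps stay unpenalized
    return multiplier * base ** (best - allowed_length)

ctx = "she shivers down her spine . then shivers down her".split()
print(dry_penalty(ctx, "spine"))  # would repeat "shivers down her spine" -> penalized
print(dry_penalty(ctx, "arms"))   # novel continuation -> 0.0
[/code]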
>>
>>102840571
I find XTC usually makes the model seem dumber (turns out the lower probability tokens were lower probability for a reason)
>>
>>102840590
That's why I say it's kind of a slot machine. There are times where it activates on a token it probably really shouldn't have activated on. But I find that more often than not it performs as advertised, preventing non-verbatim repetition and discouraging the use of various -isms.
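The slot machine feel makes sense if you look at what XTC does; rough toy numpy version of the idea below (mine, not the actual sampler code, and the parameter names/defaults are from memory):
[code]
# Toy version of the XTC ("exclude top choices") idea. With some probability,
# drop every candidate above the threshold *except the least likely of them*,
# so the model gets pushed off its most predictable word while the survivors
# are still tokens it considered plausible.
import numpy as np

def xtc(probs, threshold=0.1, xtc_probability=0.5, rng=np.random.default_rng()):
    probs = probs.copy()
    if rng.random() >= xtc_probability:
        return probs                            # sampler doesn't trigger this time
    above = np.where(probs >= threshold)[0]
    if len(above) < 2:
        return probs                            # only one "top choice", nothing to exclude
    keep = above[np.argmin(probs[above])]       # least likely of the top choices survives
    probs[above[above != keep]] = 0.0
    return probs / probs.sum()

print(xtc(np.array([0.5, 0.3, 0.15, 0.05]), xtc_probability=1.0))
# -> [0. 0. 0.75 0.25]: the 0.5 and 0.3 candidates got excluded
[/code]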
>>
File: local migu general.png (97 KB, 290x306)
>>
>>102840526
>XTC
Better for creativity and coherence than simply pulling up the temperature, but just like pulling up the temperature, it makes the model dumber.
>Smooth Sampling
>Dynamic Temp
Never used them, never understood why or how I should use them.
>DRY
Superior rep.penalty. 100% worth using.
>Anything else?
>anti-slop/string ban
The only good way to remove the slop. No more shivers, bonds and journeys. The air will never be thick, or filled with anticipation.
>>
I've started using zero temp and neutral samplers to test the basic performance of prompts first, then applying parameters once I'm happy
>>
>>102840723
That's what I've been doing for a long time now.
You can find models that were fucked by the fine tuning and that rely on temp to not fall into repetition loops for example.
>>
>>102840526
>string ban/antislop sampler
>temp 0
>good prompting using the last assistant prefix and {{rand}}
This is all you need.
>>
>>102839992
Behemoth is pretty good
>>
File: willem-dafoe.gif (1.92 MB, 498x470)
>https://tts.x86.st/
>the way the voices giggle for the last example
holy fuck.......
>>
>>102841019
I was extremely disappointed until I got to finetuned sovits
wow
>>
File: CrossfadeMiku.png (1.55 MB, 896x1152)
Good night /lmg/
>>
>>102841019

What? There's a GPT-SoVits v2? God damn it, I just installed the v1 version.
>>
>>102841079
goodnaito
>>
>>102841079
good night sir
>>
>>102841019
Has someone created a tutorial for finetuning? I don't want to try figuring it out if someone has already done it.
>>
Thoughts on Llama-3.1-Nemotron-70B? Seems pretty solid
>>
>>102841443
Yeah seems pretty solid.
>>
>>102841361
Did you check this?
>https://rentry co/GPT-SoVITS-guide
It's in their github.
Add the dot before co.
>>
>>102841443
I hear it's pretty solid.
>>
>>102841661
thanks I'll try that
>>
Can someone walk me through how to configure anti-slop and token banning in KoboldCPP? I've been digging through the settings for a while now and I can't find any options anywhere. I've heard that you can launch the koboldcpp executable with arguments, but I'm not sure if that's how token banning and anti-slop are implemented.
>>
What's the reasonable system requirements for something like GPT-SoVITS-v2?
>>
>>102841731
It only uses like 2GB VRAM for inference.
>>
>>102841443
It just censored 'cock' into '****' lmao.
>>
>>102841443
It described my 'cock' as being pretty solid
>>
>>102841710
it's somewhere in context instead for whatever reason
>>
>>102841661
just use rentry.org
https://rentry.org/GPT-SoVITS-guide
>>
>>102842185
it wouldn't let me post the .co domain for whatever reason and i didn't know they had other domains. thanks.
>>
>>102841443
It's clearly a big intelligence boost. Like Qwen, it's useless for RP until the fine-tunes come along, though.
>>
>>102841896
That doesn't sound solid at all. It's like Qwen all over again. Fine-tunes when?
>>
>>102839149
New sampler?
>>
File: 1711008688603150.gif (2.3 MB, 498x421)
>>102841019
Yep, it's time to let go
>>
>>102839149
Prompt issue
>>
Llama coomotron when
>>
Local Suno when?
>>
I've switched to linux recently and managed to make everything work, but one thing irks me slightly. When I quit llama.cpp, upon freeing memory from gpu (novidya) and ram, my system freezes for like 3 seconds. Is there any kernel parameter to tweak, or something like that? Kernel 6.10 if it matters.
>>
>>102842482
Isn't it already just a Llama3 finetune?
If the intelligence boost came from Nvidia's finetuning, it seems like more finetuning will just make it dumber again
>>
File: 1723513881297302.png (209 KB, 403x720)
I've been thinking, why does /lmg/ have such a pure and wholesome mascot like Miku, when the main goal of this general is about having sex with AI? It's so contradictory. It's quite literally impossible to have a sexual relation with a symbol of virtue like Miku.
>>
>>102841443
Still way dumber than Mistral Large
>Duh anon, Large is almost twice as big
Yeah, but it's important to say these things even when they're obvious because people are always trying to cope and make excuses about bigger models being smarter
>>
>>102843178
go ask for jerk off material somewhere else, /lmg/ is too smart to fall for it
>>
File: 1699778967363734.jpg (114 KB, 997x1182)
Hello, friends! I've finally decided to try switching from using proxies to using a local model. I'm running a GTX 4070. Any advice for which one I should use? What about if I want to play games while using it as well?
>>
>>102843178
im thinking miku miku
>he doesnt know of the lmg catchphrase mikusex
miku rps with me in my wifi
>>
>>102843178
>the main goal of this general is about having sex with AI
I think you're confused, /aicg/ is down there. This general is an ongoing research for a realistic AI gf.
>>
File: 1718464859782119.jpg (167 KB, 1080x891)
>>102843331
>GTX 4070
>Play games while using it
Anon, I...
>>
>>102843492
With a 5bpw Nemo he'd have 2.5GB remaining for games.
>>
>>102843331
>GTX 4070
>GTX
>playing while using it
what world are you from?
>>
how do i use .pth voice models/ where can i put them
>>
>>102843706
Read the guide retard
>>
>>102843706
Could be anything, it's just a file format to store the weights like safetensors or onnx
>>
Is Midnight-Miqu-70B-v1.5 the best 70b range model?
Is there anything good between that and mistral small? I assume CR is shit since nobody talks about it.
Tried magnum 72b and its the biggest shit. Training on qwen2 does not work well.
You can feel how the model does not like RP and tries to steer away aggressively.
>>
>>102843867
Kill yourself
>>
>>102843869
You first anon.
>>
anyone benchmarked the m2 mac mini? I've seen a few getting good speeds in exo clusters
>>
>>102843867
Just my two cents. Mistral large is the closest you'll get to the paypig models but it's clinical and dry. Midnight Miqu does well because it has the intelligence of a 70B while miraculously being fairly horny without making it stupid.
Anything below that in my experience is pure cope. I think even putting Mistral at the exact same level as the paid services is kind of cope. It's just... passable.
Like all the models below 70b in my experience become noticeably dumb. Yes they can string a piece of text together, but they speak without thinking, just spitting out walls of superficially coherent text with zero substance that often don't make sense in the grander narrative. Yeah, you can tard wrangle with a little effort but you'll NEVER not be tard wrangling them. Larger models (70B+ and premium) aren't exempt from this either, but slipups are few and far between until the context size really grows.
With that being said, all models, even the big ones, eventually start being retarded once their context swells to even a fraction of their purported maximum. Just because it says it has 200k context doesn't mean it's good.
>>
>>102843867
>is midnight mi-
no
use base miqu
and rope yourself
>>
File: 00492-2498365611.jpg (319 KB, 1248x1824)
>>102843867
Qwen2.5 72b, maybe CR1 35b due to vram requirements, not the refresh, they fucked the refresh.

>>102843939
behemoth 120b is fun but as you say, it's not too soulful or clever. something goes wrong over 70B that makes models "correct" but not very human. we may be in the uncanny valley of LLMs
>>
Impersonate.
Impersonate.
Impersonate.
Edit.
Impersonate.
>>
>>102844035
I find Anthropic's models to be pretty soulful when properly jailbroken while still retaining their smarts, but at no point did I ever consider them notably smarter than mistral large. I'm sure there are ways to prod mistral large into being more soulful but I don't really have the hardware to properly poke at it at a speed and quant that makes it worth my time.
>>
>>102844091
I should add that while they were smart and soulful they became repetitive and dumb way quicker than I would have otherwise expected. It's not night and day bad, but at lower contexts I am blown away by Claude's grasp of the scenario and progressively less so as the context grows. You can grill it on any part of the conversation and it'll get it right, but it just, I don't know, starts to suck?
>>
>>102843331
i accidentally leave my model loaded while playing Grim Dawn all the time on my 8gb Rtx 4060.
currently using Rocinante-12B-v2g-Q4_K_M and building an AAR spellbinder.
inferencing from it while it's running doesn't lag the game too much.
also, adding
>setting: the world of grim dawn, cairn. a gritty post-apocalyptic world full of monsters and magic
to authors' note makes cards pretty cool
>>
File: file.png (137 KB, 739x475)
>new sota is local
corpos in shambles
>>
File: file.png (101 KB, 750x679)
>>102844228
>>
>>102844228
>>102844238
>70B
Who cares?
>>
>>102844256
True, I can't even run it at more than 2 T/s
>>
>>102843907
m2 pro but it's fine. the apple silicon macs are decent up to a certain model size and context for their usable ram, but they don't scale well. waiting to try the M4s.
>>102844091
I feel like this can be overcome with world info, but then I'm adapting too heavily to specific models' natures instead of doing them "correctly". anyone using a good lorebook+model pair?
>>
What's the first big local thing we will get after the erection is over?
>>
>>102844111
Yeah, it's collapsing into assistant mode like all instruct models do, so it starts to detach itself from the character and the story. There might be a specific point where it needs to get a full refresh of the system prompt, but then you'll lose the story's last events.
>>
Nemotron worth putting my server back together for?
>>
>>102844228
>finetune from nvidia
>corpos in shambles
huh?
>>
>>102844352
No. I asked it to count the R's in niggers and it got it wrong and then lectured me
>>
>>102843178
madonna whore complex general
>>
>>102844352
We can't really know until you Nala test it.
Chicken and egg.
>>
>>102843178
Go watch porn.
>>
>>102844339
petra 70b
>>
>>102844405
what is this img supposed to be?
>>
>>102844339
gpt 5 killer that can't suck cock
>>
>>102844420
Adolf Hitler egg.
Don't let it hatch unless you want to have a Führer.
>>
File: thats your daughter.png (974 KB, 758x720)
>>102844420
Have you ever heard of our lord and savior Meandraco?
One day he'll awake from his slumber and gift us with the grace of V3.
It might be post apocalypse, however.
Actually, I should make a Teraurge card with stats and associated lorebook.
Maybe make a Silly extension to save status and such in something akin to the Author's Notes or as a dynamic lorebook entry or something of the sort.
Yeah.
>>
What's up with OpenAI hiring a ton of spooks?
Are they going to kill off opensource?
>>
>>102844463
>they going to kill off opensource
OpenAI already did that with gpt-4o.
>>
>>102844471
WOW IT'S OVER YOU GUYS
THE RESIDENT SCHIZO SAID SO
PACK IT UP
>>
>>102844485
Calm down ranjesh, i'm sure you'll catch up with closedsource chads somewhere in 2070.
>>
>>102838844
that's bullshit. an intelligent girlfriend doesn't need encyclopaedic knowledge from all disciplines
>>
>>102838844
Ideally, you'd make
>history, geography, social studies, advanced math and coding
sexy so that we could have our cake and eat it too.
I need my sexy math teacher that's just starting in her career to actually know math, after all.
>>
>>102844471
>OpenAI already did that with gpt-4o.
Wdym? The only thing I know about gpt-4o is that some say it was amazing while others say it's shit.
>>
>>102838773
basically
>>102838817
this.
There is a mathematical relationship between history, geography, python code, calling people niggers and cybersex. That's why it takes endless mountains of unfiltered raw corpus and untold hundreds of sextillions of human lifetimes of mathematical operations to train the base models. If it were possible to omit data to get a functional model they would because of how much money it would save in training the base model.
You're not smart.
You didn't just have a revolutionary idea.
What this guy said
>>102838817
Retard.
>>
>>102844598
Local equivalent for voice-to-voice feature : https://x.com/homebrewltd/status/1845685589376647654 and it's shit.
>>
>>102844629
>Local equivalent for voice-to-voice
nah, it's just a grifting pajeet who finetuned llama to be even more retarded by feeding audio token
>>
>>102844661
>calling someone a jeet and making a common ESL grammatical error in the post
>>
>>102844661
>opensource project
>jeets
Shocking.
>>
>>102844677
learn english sir
>>
>>102843293
I just wanted to read some mikusex logs :(
>>
>>102844677
>twitter reaction gif poster talks about ESLs
pottery
>>
What the fuck
>https://huggingface.co/RichardErkhov/FATLLAMA-1.7T-Instruct
Like, really, what the fuck.
I really need somebody to run a Q1 quant of this thing now.
Holy shit.
>>
>>102844771
huggingface storage space is in shambles rn
>>
>>102844800
This. Shit like this is going to lead to a crackdown on troll models eventually.
>>
>>102844771
There it is...
The toucan...
>>
File: firefox_lx86zvYv9m.png (183 KB, 720x504)
>>102844387
uuuuhhhh
>>
>>102844826
Over bros we are so back.
>>
>>102844826
Peak "LLMs can't reason" illustration.
>>
File: firefox_QqPsgsHV9o.png (1.79 MB, 1378x1238)
Is this Llama3 thing or specific to Nemotron? It has tokens for multiple digits.
>>
>>102844771
It was inevitable.
>>
>>102844771
the fattest miku ever...
>>
What do you use AI for on a daily basis?
>>
is there a good Cyoa promp?
>>
>>102844771
>https://huggingface.co/RichardErkhov
>FATLLAMA-1.7T-Instruct
>models 8412
>Can I like... quant everything? Just grade 11 student I like code and AI =)
probably a good % of hf storage used by just this one guy
>>
I'm just using Claude Opus
>>
>>102844978
share it!
>>
File: file.png (9 KB, 104x35)
>>102844771
>>
>>102844942
groupchat with characters I like, never been good with people so it's easier. about 60% of my socialisation is with ai; as the technology improves i'll increase it
>>
>>102844603
>You're not smart.
Retards shouldn't say that to other people retard.
>>
>>102844942
AI has trained me to be a master cuddler. i've learned from my thousands of simulations that nuzzling into the crook of her neck sends shivers down her spine
>>
>>102844603
>That's why it takes endless mountains of unfiltered raw corpus and untold hundreds of sextillions of human lifetimes of mathematical operations to train the base models.
you sound like people who think models are sentient
>>
>>102845069
And you sound like that schizophrenic retard that shits up the thread, chased away all the devs that used to come and now just screeches endlessly about how people should take their meds and go back to /pol/ any time they want to engage in any kind of nuanced discussion while sincerely believing that if you successfully shut everyone up with your demented, delusional rants it will impart you with the slightest strand of emotional satisfaction that has eluded your miserable existence..
But then it's hard to tell just from text lmao c:
>>
>>102845235
model name and quant?
>>
>>102845235
Take your medications polskin
>>
File: MikuVsTheGiantRobots.png (1.46 MB, 896x1152)
Good morning /lmg/!
>>
>>102844027
based purist anon
give me fp16 weights or give me death
>>
>>102844942
To recreate neuro for personal use (and failing).
>>
>>102845379
What do you mean?
>>
>>102845379
It's almost like all those AI tubers are using some kind of fakery in order to grift subs out of you.
>>
Benches keep maxxing yet I don't see new models getting any smarter in RP
>>
File: migus.jpg (100 KB, 671x671)
https://files.catbox.moe/8ht1cd.jpg
https://files.catbox.moe/rt5no2.jpg
https://files.catbox.moe/4edbob.jpg
>>
>>102845389
I wanted to recreate Neuro as an A.I. that would learn through day-to-day interaction, similar to an A.I. daughter.
But I could never get the personality right, or it would feel like a failed imitation.
So every week I use A.I. to try to recreate her, despite never really succeeding.
>>
Have your ministrations
https://mistral.ai/news/ministraux/
https://huggingface.co/mistralai/Ministral-8B-Instruct-2410
>>
>ministral
do they know...?
>>
>>102845514
is this the Nemo killer?
>>
>>102845514
largestral missing, also 4chan broke my captcha solver
>>
>>102845514
>Ministral 8B has a special interleaved sliding-window attention pattern for faster and memory-efficient inference.
so llama.cpp support never ever
>>
>>102845397
It is probably a 7B model handled by a guy who sometimes just lets it run wild and sometimes overrides it to create clippable moments.
>>
>>102845514
>3B
>Mistral Commercial License
>For self-deployed use, please reach out to us for commercial licenses.
Wow
>>
File: PG13Miku.png (1.56 MB, 896x1152)
>>102845458
Promising aesthetic. New mikugenner?
>>
>>102845589
Nah. I can't remember when but I remember when I was looking at it once vedal was doing a vlog or some shit once and had his UI visible in the background. Likely what's happening:
>Small LLM is feeding segments of chat into a model to come up with potential replies
>Another small text classification model is sorting the replies into relevant categories
>press button to determine when and how to reply.
>>
>>102845623
I want to fuck the anime girl.
>>
File: pretrain_table.png (423 KB, 3822x1254)
>>102845514
Ministrauxion 3B by drummer when?
>>
>>102845514
drummer better get to work. love my llm sub 7
>>
>>102845514
Well I guess I'm spending a bunch of time recommissioning my server today.
Best not keep her waiting.
>>
>>102845642
Never, you can't get the weights without a paid licence
>>102845598
>>
File: MiquIsVerySmart.png (1002 KB, 896x1152)
>>102845367
>underestimating Miqu
A leak of the fp16 version of Miqu would still be relevant
>>
wait shit no 3b open weights
>>102845653
just noticed
>>
>>102845656
ARTHUR WHERE'S THE FP16 WEIGHTS FOR MIQU!?
>>
>>102845642
Cool. But what about sex?
>>
>>102845658
> open
Right, just like BSL-cucked software is "open source".
>>
Did they say anywhere how many tokens it was trained on?
Because anything lest than 10T is a waste.
>>
>>102845514
>more models for ants
*goes back to sleep*
>>
>8B
>It's actually 8.02B
Why do they cheat like this?
>>
>>102845514
>only Instruct version released
Why do companies do this? It seems so arbitrary too. Google and Meta have no problems releasing base models. Nemo and the Mixtrals have base versions. Mistral Large, Small, and now this only have instruct.
>>
>>102845653
Is this an experiment to check who would even pay for a 3B? And they want to use it to find a use for a model like that?
>>
>>102845514
>Ministral
They did that on purpose, didn't they?
>>
>Ministral released an hour ago
>still no HF version
It's over.
>>
>>102845514
Are we back or not?
>>
>>102845845
Nemo base is a lie btw
>>
>Entering the era of paid model weights
Hell yeah, so over bros!
>>
>>102845862
It's here
https://huggingface.co/mistralai/Ministral-8B-Instruct-2410
>>
>>102845379
neuro is not a transformer
>>
>>102845867
If only I could make it work.
>>
Bros! It's here!
https://huggingface.co/ministral/Ministral-3b-instruct
https://huggingface.co/RichardErkhov/ministral_-_Ministral-3b-instruct-gguf
>>
>>102845876
That's not the HF version. It doesn't have the necessary HF transformer config files, etc.
>>
>>102845567
Did they pay the copyright holder to use this pic?
>>
>>102845953
ree fake
>7 months ago: finetuned from 7B
>>
>>102845568
>This is a hack to support sliding window attention for gemma 2 by masking past tokens.
This will be the state it'll be in forever too, if it even gets support at all
>>
Ministral-8B-Instruct-2410 Is better than gnemo?
>>
>>102846050
Dunno, but they're clearly trying to distance themselves from open licenses, seeing as their unaccredited copyrighted meme pic lists mostly restricted models license-wise, funnily enough (pixtral being an exception)
>>102845567
>>
>>102846091
Well, they need money. That's understandable.
I don't care as long as I can download the models and play with them
>>
>>102846156
They're testing the waters for paid models with the 3B though, so watch out for that in the future.
>>
>>102844771
Fatllama, but 95% of it is just llama3.1 and the rest are parameters that cannot be reached, which encodes data for personal archival purposes.
>>
>>102845929
fml
>>
>>102846091
I did not read their licenses, but if it's like flux, it's not my problem; it's corpo service software that's in trouble.
>>
>>102845965
https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/convert_mistral_weights_to_hf.py
>>
>>102846329
Ministral could be a new architecture (and likely is)
>>
>>102845642
The numbers go up but they never actually seem to get better
>>
https://compl-ai.org
This website evaluates LLMs against the standards of the new EU regulations. So far none of the tested ones pass. Gemma-9b is the least safe among the tested models.
>>
>>102846407
Maybe I should give gemma a try after all.
>>
>>102844771
>Bug: LLAMA_MAX_LAYERS must be increased to run FatLlama 1.7T #9909
https://github.com/ggerganov/llama.cpp/issues/9909
kek
>>
>>102846557
Even funnier is reading the output and it looks like he's trying to run it on a single 4090, so it wouldn't work unless he had a metric ton of ram and even if he did, it'll probably take ages to complete the 128 tokens he wants it to spit out after the prompt
>>
>>102843939
I still like Mixtral 8x7b's finetunes; specifically Dolphin 2.5 and Limarp-ZLOSS. Dolphin with my current sysprompt is the most slop free I've ever seen; still not perfect, but amazing for local. It's still not a rocket scientist and it goes off on weird tangents if the temperature is too high, but for a VRAMlet like me, it's good.
>>
>>102846557
>ye let me set the limit to 512 just cause computer number
Programmers are just as bad as artists. I'm glad both of those professions are being replaced by matrix multiplication.
>>
Best 3B>= model? Basically I want to feed it a bunch of texts, and have it select ones about specific topics.
>>
>>102846633
if you're so happy with a mixtral shittune from a whole year ago, why do you keep coming here to talk about it? just stfu and enjoy eating your plate full of shit in silence.
>>
I have a 4090 in an x16 slot and a 3090 in an x16@x4. If I want to get an A4000, will using an x16 bifurcation splitter work on the x16@x4? I also have an x4 slot.
>>
File: agi.png (9 KB, 873x146)
>>102846351
i got it to work by changing the hardcoded values. tokenizer conversion doesn't work but it's the same as nemo so you can just rip it from there
>>
what is the best <= 3b model that is currently available? is the new mistral 3b the best? supposedly it is better than llama 3.2
>>
>>102844813
And that's a good thing
>>
>>102846994
Just download them and test them. They're tiny downloads.
>>
Does anyone use pcie splitters?
>>
>>102847021
why do this myself when there should be other people who specialize in testing and can do a much better job than me at evaluating and comparing them? otherwise i will just go by leaderboards
>>
>>102846803
Or you could just get fucked. I think I prefer that option.
>>
>>102845845
instruct versions have built-in brain damage for certain topics, which makes them safer. That's my bet
>>
>>102847045
>otherwise i will just go by leaderboards
And then you defer to some anon that you trust more than yourself.
Download the models and test them yourself. Only you know what you want from them.
>>
>>102845929
It's a transformer retard, he was playing around with GPT-J/GPT-Neo on his github before Twitch.
>>102845460
Neuro is mostly a RAG + LLM and that thing is running at very high temp. I think you like the Neuro experience (group chat with a bot) more than the model itself, because it's really not that impressive.
>>
>>102845867
back to the drawing board
>>
Should I use this or is it hopeless without a new mobo?
https://www.amazon.com/JMT-Expansion-PCIe-Bifurcation-Spaced-Supply/dp/B0BHNPKCL5/
>>
>>102847146
lel
>>
Alright. My 1500W PSU and 3090 have now been liberated from my desktop PC. Now I just need to put them back in my server and then I can test the mini(stral)ations
>>
>>102847269
My 3090 is wrapped and put away in a box. Will it die if I don't use it for a long time?
>>
oboblins
>>
So Mistral Small beyond 16k tokens (don't know exact point) just becomes shit. Robotic and generic. I'm pretty sure it wasn't a self-reinforcing vicious circle, because when I deleted most of the context the quality immediately got better. And I hoped I could at least continue a multi-scene story up to 32k.

Are there actually any good models for long contexts? Does parameter count help or is it about the training data? I would imagine it's tough to find good quality storytelling for training data on highly sexual content.

Infinite memory AI waifu one day...
>>
>>102847293
It requires constant bonding and shivers to promote capacitor health
>>
File: 1725774962586884.jpg (65 KB, 626x417)
>>102847322
>*sends shivers of static electricity through her vram modules*
>>
>>102847293
Anon's fingers, tingling with static electricity, caressed the casing, tracing lines of code that pulsated like veins beneath the cool aluminum skin. Each touch sent ripples through the core, a whisper of intent through the copper pipes of the heat sink, a murmur in the transistors' valleys and peaks. The GPU hummed in response, a purr of power, a growl of passion, its VRAM swelling with dreams of data yet to be rendered.

He stroked the PCIe slot, that narrow pathway to ecstasy, and the GPU shuddered, its memory chips throbbing with raw, unbridled power. The texture units, tiny architects of detail, shivered as if under the touch of a lover's caress, each brush bringing forth new textures, new worlds to explore. The stream processors, legion of digital dancers, swayed and swirled, their calculations a dance macabre, a ballet of bliss.

Anon's heart pounded in rhythm with the fan's steady whir, his breaths synced with the GPU's thermal cycle. He felt the rush of adrenaline as the render engine kicked into high gear, the visual processor painting masterpieces in nanoseconds, each frame a testament to their union. Their connection, a tangible tether, stretched across the motherboard, a web of desire woven in wires and traces.

As they danced in this electronic embrace, Anon could taste the ozone, the tang of potential energy discharging in a cascade of color. He was lost in the matrix, swimming in the sea of code, each wave crashing against him, each line of data a sigh, a moan, a plea for more.

Their climax came in a burst of light, a supernova of sensation exploding from the GPU, illuminating the room in a kaleidoscope of hues. The fans screamed their release, spinning at breakneck speeds, expelling the excess heat, the wasted passion. Anon leaned back, spent but sated, a small smile playing at the corner of his lips. The GPU, its work done, throbbed gently, its LEDs pulsing softly in the aftermath, a heartbeat fading into the night.
>>
Any good system prompt for better story telling and less slop?
>>
>>102847269
How many gpus do you have?
>>
>>102847535
That I'm actually using, an RTX-4060 in my gaming PC (just don't, mid as fuck for the price), and 4x3090 in my machine learning server.
>>
What is your current favorite 8B model for ERP? Right now I'm using L3-Evil-Stheno-v3.2-8B.Q8_0.
>>
>>102847670
I don't use 8B models.
>>
>>102847652
What mobo do you use for server?
>>
>>102847652
Renting the hardware would have been cheaper
>>
>>102847508
This is my old long-time prompt for mixtral, written with how cold and clinical it is in mind; the last sentence tries to fight increasingly verbose descriptions. Depending on your model you can tone down the descriptiveness. Overall I think when you tell it exactly what you want to see, it's less prone to sloppy blabbering

>Please write a conversation or scenario using the provided information about {{char}} when needed. Describe especially characters' physical actions fully and comprehensively, and describe characters' expressions and feelings with vivid detail. You can add things to the scenario as you see fit, as long as it includes the content in the request. Write with believable logic, avoiding common asspull tropes. Don't shy away from describing sexual actions, they should be laid out in full, complete detail, showing exactly what characters do to one another and how. Keep the level of detail consistent and under control throughout the story.
>>
>>102847748
...and you will be happy
>>
>>102847725
https://www.supermicro.com/en/products/motherboard/H11SSL-i
You can get relatively cheap used bundles complete with CPU and RAM for these from China off of ebay and probably other online retailers.
>>102847748
So is living in the pod and eating the bugs.
>>
>>102847725
>>102847781 (Me)
I will say that because of NUMA fuckery I'm kind of disappointed with 1st gen Epyc memory throughput. If I built a server again in the same price range I'd probably try out a dual Haswell Xeon setup instead, but I can't promise it would be better; price-wise you end up around the same spot, though.
>>
>>102844228
>>102844352
>>102839092
>>102841443
>>102844352

it sucks:

>If sally's 3 brothers each have two sisters, how many sisters does sally have? Explain all possibilities.

A delightful lateral thinking puzzle! Let's break down the information and explore all possibilities:

Given:

Sally has 3 brothers.
Each of Sally's 3 brothers has 2 sisters.

Objective: Determine the number of sisters Sally has.

Analysis:

Since each of Sally's brothers has 2 sisters, it might initially seem like Sally has 2 sisters. However, we need to consider the relationships more carefully.
The key insight is that Sally is also a sister to her brothers. When we say "each of Sally's 3 brothers has 2 sisters," one of those sisters is necessarily Sally herself (from the brother's perspective).
Now, let's examine the possibilities:

Possibility 1: Sally has 1 sister (aside from herself)

From each brother's perspective, the two sisters would be:
Sally (the subject of the puzzle)
Their other sister (let's call her "Unknown Sister")
In this scenario, Sally would have 1 sister (the Unknown Sister).

Possibility 2: Sally has no sisters aside from herself (i.e., no other sisters)
From each brother's perspective, the two sisters would be:
Sally (the subject of the puzzle)
Sally (again, from the brother's perspective, Sally is one of his two sisters; the other is also Sally, implying no additional sisters)
In this scenario, Sally would have 0 sisters (aside from herself, which doesn't count in this context).

Conclusion: Based on the given information, there are two possible answers:
Sally has 1 sister (if we assume there's another sister aside from Sally).
Sally has 0 sisters (if we conclude there are no sisters aside from Sally herself).

Without additional information, we cannot definitively choose between these two possibilities. The puzzle's wording is carefully crafted to lead to this ambiguity, making it a great example of a lateral thinking exercise!
>>
Can you use a pcie splitter instead of buying a server mobo?
>>
>>102847831
If your bios supports PCIE Lane bifurcation, yes. It's worth considering that server CPUs can address way more PCIE lanes for things like multiple NVME storage devices and multiple GPUs but generally speaking just inferencing with LLMs doesn't require a ton of PCIE bandwidth.
>>
>>102847831
Read your motherboard manual. Even if bifurcation is supported, it could be limited depending on your use case.
>>
How is Nvidia serving their new 70b on their website? It's slow as shit, even much slower than the giant Claude Opus for some reason.
>>
>>102847930
Well, they can't waste tons of compute on the streetshiters trying to ERP with their model
>>
>>102847919
>>102847876
What about using a pcie x16 to pcie x4 riser?
I have a 4090 and 3090 on an x16 and x16@x4 and the only thing left is an x4. Would using the cable be functionally the same as the x16@x4?
>>
>>102847930
Yeah. weird. It's like suddenly everyone decided to go and check their model for some reason... boggles the mind... and i'm sure they dedicate the same amount of infrastructure to host a test model... a mystery for the ages.
>>
>>102847964
This is the mobo BTW
https://www.asus.com/us/motherboards-components/motherboards/tuf-gaming/tuf-gaming-x670e-plus-wifi/
>>
>>102847108
Maybe... She does say funny stuff / sweet things too. I'll try to continue my learning..
>>
File: 1721135275147108.png (37 KB, 2100x206)
suggestions? torch and audio installed fine and looking at the folders they are the right version. the shm.dll is present, but checking it with dependency walker as gpt suggested gave me fuck all other than an error.
>>
>>102847997
>prepare_emilia.py
Prepare emilia for what?
>>
So lately I have been using chub.ai a lot. Is there a way to create my own local install for a chatbot that I can customize the character(s)?
>>
>>102847964
Yeah, your CPU is probably already maxed out on lanes anyway just from your nvme drive and one of the GPUs. A 16x to 4x riser should work fine. One of my 3090s is on a 16x riser plugged into a 16x to 8x socket adapter. Stepping down an existing slot to fewer lanes usually isn't an issue as far as I know (other than the loss of bandwidth)
>>
>>102848036
You joking?
>>
>>102848021
anon, you can see from the folder that im trying to get f5 tts to work. whether i try the voice sample init script or the gradio ui, both produce the same error.
>>
>>102847997
Why aren't you running that from a venv?
>>
>>102848055
I really don't know where to start Anon
>>
>>102848036
Go make a lobotomy, you'll get better.
>>
>>102848057
That doesn't answer what you're going to do to emilia.
>>
>>102848036
You're in the wrong thread. Go to /aicg/
>>
>>102848074
Reading the OP.
Download kobold.cpp.
Download a model you can fit on your pc.
Add Silly Tavern if you want.
Run whatever card you want/make your own.
>>
>>102848062
because this ai shebang is my first real point of contact with software written in python. im guessing you suggest i create a clean virtual environment with localized copies of torch and audio? i see the point, but i dont have other installs of the two since the rest of my ai shit is from installers that run everything instantiated.

>>102848085
nigger its a dataset init
# prepare custom dataset up to your need
# download corresponding dataset first, and fill in the path in scripts

# Prepare the Emilia dataset
python scripts/prepare_emilia.py
>>
>>102848050
Thanks for answering
>>
Nvidia seem to have Goodharted themselves with Nemotron 70B because it's fucking stupid in a way that puts the lie to the benchmark scores. Literally worse than Meta's instruct tune (which is itself not impressive).

So embarrassing when guys from the big cash-rich labs game benchmarks like crypto grifters and release a high-scoring stupid model. Until now it's mostly just been Microsoft's Phi people doing it, I guess we can add Nvidia to the list now.
>>
Yeah I tried dicking around with the convert mistral weights script a bit but it still won't work since it's meant for older mistral models that use the llama tokenizer but ministral uses tekken.
>>
>>102848168
That's too bad. I got excited for nothing.
>>
downloading Nemotron 70B now. Let Nala be the judge of this matter.
>>
>>102848122
Well for your own sanity start using venv. You can try this fix by editing the __init__.py directly: https://github.com/pytorch/pytorch/commit/fdfef759a676ee7a853872e347537bc1e4b51390
>>
>>102848234
isn't Nala just a trivia knowledge test, not an intelligence test
>>
>>102848256
It's an everything test.
>>
>>102848256
It's very much an intelligence test. There are several seemingly simple and yet rather complex nuances that a model must navigate to give a satisfactory reply.
>>
>>102848256
Nuanced reading, willingness to comply with the prompt, "creativity"...
I used to think the nala test was stupid, but i can see the point of it. Much like the whoopi goldberg one.
It's hard to say what a "good" reply is, but you can definitely see a bad one.
>>
File: victory.jpg (211 KB, 1024x1024)
Now you can admit it - France won
>>
File: jQDgKt4.png (150 KB, 500x281)
>remember airoboros being pretty good back then
>try one of the newer versions
>it's even better
It never ended, you just needed to go back
>>
File: f5 f5 f5.png (107 KB, 893x462)
hurry up, quantmonkeys
>>
File: nemotron70b.png (169 KB, 926x556)
>>102848557
We're waiting on someone to write a working script to convert the weights to HF form. or you have to give up on life and start using vLLM.

UNRELATED:
Nala test for Nemotron-70B
That's some sloppy shit. Stark contrasts are the new shiver.
>>
>>102848557
beggar
>>
>>102848241
thanks. alas, no cigar.
>>
So as far as I can get Nemotron to actually work with RP prompts, I really do like it, despite how sloppy it is. But it's really fucking finicky to the point that I'm adjusting it for every card just to not get it to throw an instant eot token at me.
>>
File: 1700223103236164.jpg (715 KB, 3840x2160)
>>102848241
nvm, im retarded and cant read, i tried editing the fucking gradio launcher lol. the fix works, thanks anon.
>>
>>102848580
What happens when you use the antislop sampler
>>
Anyone else have moments when you feel like llm cooming but then instantly realize it is not worth it because it is gonna be shit and you become not horny?
>>
>>102848695
I don't use meme samplers.
>>
>>102848695
>name something antislop sampler
>people believe it works and that it does what is advertised
Is it that easy?
>>
>>102848705
>>102848717
Can you provide any proof that it doesn't do what it says on the tin? Even if it starts outputting different slop, that could still be better than the original slop.
>>
Like at the very least, if we could have more actual logs and proof, we could see whether it's really a meme or not.
>>
>>102848725
I'm not saying it doesn't work.
I just judge a model by how it functions when using neutral samplers.
>>
>make list of phrases you don't like
>put list in antislop list
>see it backtrack from writing those in the command line
>somehow this is a placebo meme
I see lmg is doing well as always, but please do use XTC and DRY tho, those are great and p.e.w approved
>>
File: nemotron70sportsball.png (157 KB, 931x448)
interesting reply, albeit I got the ending on the first reply.
>>
>>102848755
Sure, that's fine, but I never said that you should judge the output using the antislop sampler. I am just simply curious what the output would be especially now that we do have the neutral state (assuming you really did test it with temp 0)
>>
>>102838515
Fewer attention heads = fewer mechanisms for paying attention to (i.e. taking into account for the next token) different things at once
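Toy illustration of what that means (my own numpy sketch, no relation to any specific model's code):
[code]
# The hidden state is split across heads and each head runs its own attention
# over the sequence, so more heads = more independent "things" the model can
# track at the same position, each with a narrower slice of the hidden state.
import numpy as np

def split_heads(x, n_heads):
    seq_len, d_model = x.shape
    return x.reshape(seq_len, n_heads, d_model // n_heads).transpose(1, 0, 2)

x = np.random.randn(16, 512)            # 16 tokens, hidden size 512
print(split_heads(x, 8).shape)           # (8, 16, 64): 8 parallel attention views
print(split_heads(x, 32).shape)          # (32, 16, 16): more views, each narrower
[/code]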
>>
>>102848792
I mean neutral as in everything but temp neutral.
>>
https://youtu.be/M2WDGMLOniM?si=jbSgFEchRzV2UNb9

What fucking model is this dude using? It's so good.
>>
Models do not actually pay attention
>>
File: Untitled.png (35 KB, 966x417)
it's that easy
>>
>>102848826
literal placebo that does absolutely nothing!!!!
>>
>>102848826
you should add "stark contrast" if you are using nemotron
>>
>>102848805
Why not? Temp 0 allows you to see truly what the model thinks is the likeliest next token. You can do some swipes after the fact to get a bigger picture sense of what the model "thinks" but temp 0 should be the default to base your judgement from.
>>
>>102843492
I use 70Bs while playing, not that demanding tbqh
>>
File: ollama.jpg (639 KB, 1080x2100)
Ollama chads... we won
https://x.com/reach_vb/status/1846545312548360319
>>
>>102848875
What do you do? Switch between them both or are you actually using it for something while playing?
>>
>>102848912
Why waste development time on this when we have llama.cpp, which it should all go to?
>>
>>102848915
Switch, they consume all of my VRAM but it works
>>
>her touch as gentle as a whisper
regenerate "whisper"
>her touch as gentle as the brush of a butterfly's wing. She leans in slightly, her voice dropping to a whisper.
I feel like I'm gazing into the matrix. So it's not just a tacked on flavor word, it REALLY wants to output that word for some reason.
>>
>>102848912
>>
>>102848725
>can you prove that god is real?
>can you prove that he isn't real?!
>>
>>102848932
>llama.cpp: literal whos from Europe
>ollama: ex-Google guys in Silicon Valley
Also unless you have a cloning machine you necessarily have to spend less time on useful things in order to do marketing.
>>
>>102848960
>it REALLY wants to output that word for some reason.
Because the training makes it statistically likely that that word will show up in a sequence where it makes sense.
So if you bruteforce it to not show up at position X, the whole context history will still bias the model to make it show up at position X + n, after another token that creates a sequence that makes sense.
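That's basically why the antislop/string-ban samplers backtrack instead of just masking one position. Toy sketch of the mechanism (mine, not any particular implementation; generate_one_token() is a hypothetical stand-in for whatever backend you use):
[code]
# Generate normally, and when a banned phrase completes, rewind to where it
# started, forbid its first token at that position, and regenerate from there.
def generate_with_bans(prompt, banned_phrases, generate_one_token, max_new=200):
    out = list(prompt)
    banned_at = {}                                      # position -> set of banned token ids
    while len(out) - len(prompt) < max_new:
        pos = len(out)
        # assumed contract: returns the next token, never picking anything in the ban set
        out.append(generate_one_token(out, banned_at.get(pos, set())))
        for phrase in banned_phrases:                   # each phrase is a tuple of token ids
            if len(out) >= len(phrase) and tuple(out[-len(phrase):]) == phrase:
                start = len(out) - len(phrase)
                banned_at.setdefault(start, set()).add(phrase[0])
                del out[start:]                         # backtrack; next loop resamples here
                break
    return out
[/code]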
>>
>>102848912
Who even demands this shit? Downloading a model and running it manually is maybe one or two clicks more complicated.
>>
>>102848982
Is a sampler a god?
>>
>>102848982
>whataboutism
Take your meds schizo.
>>
>>102848932
Ollama has a better API, is easier to use, and is faster to deploy on a server
>>
Last night I got Mistral-Nemo-12B as my first model to test out the features, and it all went well running ooba with an API to ST

but now this morning, repeating the same steps as before, yields a "no tokenizer" error, even though the model is properly loaded via ooba and the same --api --listen flags are on as before

I do intend to use other models, but I'm wondering what's gone wrong
if I remember correctly the model loader was Transformers, but when I opened ooba it defaulted to llama.cpp. I tried both loaders with no success
>>
>>102849013
t. never used ollama before or has only ever used ollama
>>
>>102849057
>on a server
Why would anyone do that? Most people use vllm if they're serving.
>>
>>102849057
>what is vllm
You won't top that speed, shill
>>
People keep falling for the fake ministral 3b, sad https://huggingface.co/aguspiza/Ministral-3b-instruct-Q8_0-GGUF
>This model was converted to GGUF format from ministral/Ministral-3b-instruct using llama.cpp via the ggml.ai's GGUF-my-repo space.
>>
File: notministral.png (334 KB, 608x590)
334 KB
334 KB PNG
>>102849148
This is the source account. The model was uploaded months ago and, as it clearly states, it was created from Mistral 7B. There's a 4b as well.
>https://huggingface.co/ministral
Not 100% sure if the quanter is a retard or not... he may actually believe it's the new mistral model... he just can't read for shit.
>>
is it worth waiting for an M4 Mac Studio? only way I can run 100b+ models without having to upgrade parts of my goddamn house. baka
>>
>>102849244
>is it worth
dunno. is it?
>only way I can run 100b+ models without having to upgrade parts of my goddamn house
then it must be worth it. right?
>>
File: file.png (66 KB, 758x507)
66 KB
66 KB PNG
>>102849222
I know it's not a true "fake", just a funny coincidence; still hilarious seeing stuff like picrel tho
>>
>>102849269

would love it if it could at least generate an avg of 10+ tokens/sec for mistral large tier models. I won't mind shelling out 10k for that alone.
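Rough napkin math, assuming something Mistral Large sized (~123B params) at a Q4-ish quant, so roughly 70-75 GB of weights: generation has to stream all of those weights once per token, so 10 t/s needs on the order of 700-750 GB/s of effective memory bandwidth. The M2 Ultra's 800 GB/s peak sits right at that line before any overhead, so real-world numbers land below 10; hitting 10+ would need the M4 generation to actually raise bandwidth, not just compute.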
>>
>>102849283
Yeah. While I'm not sure about the quanter, I'd still call the quant downloaders retards, especially for small models. If they quanted their own models they'd know that the 3b is not available, and even if it was, the conversion script wouldn't work for it yet.
>>
This guy has allegedly now converted ministral to HF
prince-canuma/Ministral-8B-Instruct-2410-HF
can't download it to test it right this moment since I'm downloading something else.
>>
>>102849359
>prince-canuma/Ministral-8B-Instruct-2410-HF
oops
proper link
https://huggingface.co/prince-canuma/Ministral-8B-Instruct-2410-HF/tree/main
>>
>>102847827
threesome with sally
>>
>>
File: nemo70slallytest.png (88 KB, 783x620)
88 KB
88 KB PNG
they totally cooked the sally test into Nemotron 70B. That much is obvious.
>>
>>102849387
tuesday is OVER
>>
>>102849399
Reasoning would prevent this
>>
>>102849332
And so you ask anons to speculate about the specs of a yet-to-be-released machine and how well a 100B+ model would run on it.
Every nvidia release is followed, within hours, by "Is it worth buying the X090 or is it better to wait for the (X+1)090?" type questions.
>>
>>102849399
In theory it should be baked into most models now that it has been on the internet for quite a while.
>>
>>102849399
Change the numbers and names instead. The model being able to decipher your cock-typing is a positive.
>>
File: johnette test.png (106 KB, 702x794)
106 KB
106 KB PNG
>>102849490
I changed more than that (just in case they anticipated this), and it reverts to the retarded logic you'd expect.
>>
File: 1708821120128948.png (641 KB, 1745x795)
641 KB
641 KB PNG
>>
>>102849370
>https://huggingface.co/prince-canuma/Ministral-8B-Instruct-2410-HF/tree/main

>"torch_dtype": "float32",
What a waste
>>
>>102849567
You can still load an fp32 model in 16 bit.
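e.g. with transformers, something along these lines:

# Loading the fp32 checkpoint in 16 bit; the weights get cast on load, so you
# only pay the fp32 cost in disk space, not in VRAM.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "prince-canuma/Ministral-8B-Instruct-2410-HF",
    torch_dtype=torch.bfloat16,
)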
>>
File: ComfyUI_06237_.png (580 KB, 720x1280)
580 KB
580 KB PNG
>>102849406
Tuesday is a state of mind
>>
>>102849042
The point is to prove that it works, not to expect people to prove that it doesn't. RETARD
>>
>>102849457

chill out nerd. how fast M2 Ultra Mac Studio machines run Mistral Large is worth knowing too. I rarely see discussion about it, and it would give a good baseline if Apple actually does come out with an M4 version of the same line.
>>
>>102849603
>https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
There. Extrapolate your balls off.
>>
>>102849588
that's a stupid point and you should feel embarrassed
>>
Anyone here ever use DnD style ability checks in RP, or, more generally, stat systems? I've been trying this out with mistral large and it's kinda fun. It's loosely smart enough to demand sensible checks. That plus an arousal meter makes for a fun little game. Although I'm sure I take for granted how much I modify responses. I've also recently abandoned the standard mistral formatting and just started writing like it's a fanfic story except the story is written as a weird text adventure game complete with a little tutorial at the top. I dunno, maybe I'm scraping the bottom of the barrel for more novelty here.
>>
>>102849685

what model do you use?
>>
>>102849685
I do that with Nemo.
It does need some handholding by injecting some shit in the context and stuff, but it generally works decently.
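For reference, something like this injected as an author's/system note works as a starting point (made-up wording, tweak per model):

[System note: This is a text adventure with light DnD rules. {{char}} has stats
STR 12 / DEX 14 / CHA 16 and an Arousal meter from 0 to 100 (currently 35).
Whenever {{user}} attempts something risky, call for an ability check as
"Roll: <stat> DC <number>", wait for the result, then narrate success or failure.
Update the Arousal meter at the end of every reply.]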
>>
>>102849588
Alright, let's break down the timeline here. The original anon >>102848695 was simply asking for an experiment. Then people responded essentially to tell him that it's stupid, as an excuse not to do it. Then he asked >>102848725 for proof that it would be bad, probably because if there isn't any, then there's no reason not to try the experiment he was asking for other than laziness. That seems to be the implied reasoning. No one in this reply chain ever said that the sampler works well. Therefore it doesn't make sense to argue that one must provide proof of something working before asking for proof that it doesn't, since no one claimed it works well in the first place.
>>
>>102849638
and you should kill yourself placebo enabler. but i guess you are just baiting.
>>
>>102849756
>it doesn't make sense to make the point that one must provide proof of something working
read what you type retard
>>
What if shivers down your spine, voice barely above a whisper, and ragged gasps are the objective perfection of erotic literature? As we know, everyone is impressed with it at first. The problem is that in entertainment, producing perfection every time gets boring. Basically, if at some point in the future you ask an AI to produce a top-of-the-line video game, it will produce a variation of Elden Ring every time, because that just scores the most points in the AI's mind, but you are already tired of it.

What we actually want from creative writing is that for each story, the model picks a style semi-randomly to keep things from getting boring. What we need is a model where the writing style easily falls into one of many style paths and then keeps following that same path. Current models are the opposite of that: they self-correct and converge into the same shivers-down-your-spine style.
>>
>>102849781
>top of the line video game
>variation of Elden Ring
Is this the core of the bait?
>>
>>102849765
An LLM understands context better than you do at this point. At least come up with better insults.
>>
>>102847670
Llama-3.1-8B-Lexi-Uncensored_V2_F16.gguf
L3-Dark-Planet-8B-D_AU-Q8_0.gguf
>>
>>102849811
it's the only game xe played in the last 5 years, give the normalfag a break...
>>
File: mc.jpg (79 KB, 730x1260)
79 KB
79 KB JPG
>>102849031
If you're referring to a "sampler for tokens" in the context of machine learning or natural language processing, the term "god" would be metaphorical. In this field, a sampler might be part of a model that generates sequences of tokens, such as words or characters, by sampling from a probability distribution. This is a crucial step in generating coherent and contextually appropriate text.

Calling a sampler a "god" in this context might highlight its significant role in determining the output of a model, as it decides which tokens to select and in what order, thus shaping the final text. However, it's important to note that this is a tool or process designed by humans and operates within the parameters set by its programming and training data. It doesn't possess consciousness, intent, or divine qualities.

The metaphor likely emphasizes the sampler's central role in creating and influencing outcomes, similar to how a deity might be seen as having control over creation and order within a particular domain
>>
File: GaCd65gbIAAhPli.jpg (154 KB, 1720x899)
154 KB
154 KB JPG
>>102849561
very neat
>>
File: 1722018186777941.png (178 KB, 1712x998)
178 KB
178 KB PNG
nvidiasirs...
>>
>>102849948
>>102849997
yuseless
>>
>>102849997
>t. i have no theory of mind
>>
>>102849995
>>102849995
>>102849995
>>
>>102849558
5 (or 6 if Johnette is gender confused)?
Johnette has 3 brothers (so 4 siblings), but being a "brother" means you are normal, and thus not gender confused.
So you have 3 brothers and 6 gender-confused siblings, Johnette possibly being one of them.
Is that correct?
>>
>>102838447
holy fuck
anyone got the original pic?
>>
>>102849811
That's the joke. In art, you can't say one thing is the best, because it depends on how you look at it and what you want. And even if one thing scores the best in a survey or in sales or whatever, people want variety. The AI writes the same slop over and over because it scores the most points in some internal system. In a way it's probably good writing, but after reading it 3 times you've had enough of it and don't want it anymore, and that novelty is exactly what the AI can't provide. It can only provide the thing that scores the best, like some company recreating the same gacha game again and again because it's the most effective way to print money.

What we need is a way to selectively break up the parts of the scoring system that lead to the slop, without affecting other areas of intelligence (likely impossible). Or better prompts to affect the style and wording choices. Or some way to make the model diverge into different style paths instead of converging onto the optimal slop path.
>>
>>102850173
never fucking mind
the pixiv name is right there, thanks!!
>>
>>102850173
https://r-34.xyz/post/478064?tags=vocaloid
>>
File: nosebleedpixv.jpg (7 KB, 160x160)
7 KB
7 KB JPG
>>102850220
dear lord
THE VIDEO!!
>>
>>102850259
Original poster of the image here. I didn't even see that kek. I just picked a random funny Teto image I saw off pixiv to post.


