/g/ - Technology


File: 1698272728173003.jpg (363 KB, 2000x2000)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102378325 & >>102373558

►News
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm/
>(09/12) LLaMA-Omni: Multimodal LLM with seamless speech interaction: https://huggingface.co/ICTNLP/Llama-3.1-8B-Omni
>(09/11) Fish Speech multilingual TTS with voice replication: https://hf.co/fishaudio/fish-speech-1.4
>(09/11) Pixtral: 12B with image input vision adapter: https://xcancel.com/mistralai/status/1833758285167722836
>(09/11) Solar Pro Preview, Phi-3-medium upscaled to 22B: https://hf.co/upstage/solar-pro-preview-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>102378325

--Codestral is the best local coding model under 64GB RAM for Win32 API Pong game: >>102379675 >>102380221 >>102381835 >>102381920 >>102381791 >>102382310 >>102382350 >>102382371 >>102382464 >>102382526 >>102383275 >>102382696 >>102382749 >>102383051 >>102383123 >>102383272 >>102383362
--Importance of learning rate adjustment and prompt templates: >>102378600 >>102380026 >>102383366 >>102383406
--Google's NotebookLM impresses with high-quality audio and paper explanations: >>102381307 >>102381502 >>102381627 >>102381651 >>102381976 >>102382118 >>102382961
--OpenAI threatens ban over reflection webui, schumer's failed reflection finetune, and Deep Seek chat for RP: >>102383638 >>102383685 >>102383708 >>102383814 >>102383851 >>102383878 >>102383908 >>102383933 >>102383969 >>102383995 >>102383989 >>102383910 >>102383939 >>102384048 >>102384232 >>102384339 >>102384389 >>102384488 >>102384174 >>102383865
--Llama 70B 3.1 Instruct AQLM-PV released, performance metrics compared: >>102380035 >>102380067 >>102380121 >>102380141 >>102380166 >>102380179
--Adjust prompt format and system prompt to reduce model rambling: >>102379848 >>102379862
--RedTeam Arena exploits free labor for red teaming LLMS: >>102380826 >>102380880 >>102381228 >>102381435 >>102382187 >>102380888 >>102380900 >>102380948 >>102381263 >>102381290 >>102381034 >>102381351 >>102381364
--RLHF and safety measures harming model performance and creativity: >>102380869 >>102380919 >>102381003 >>102381096
--New Physics of Language Models video released: >>102384364 >>102384392 >>102384492
--Anon shares positive results using COT with various models: >>102378494 >>102378562 >>102378578 >>102378669 >>102378763 >>102385483 >>102379237
--Miku (free space): >>102380142 >>102385054

►Recent Highlight Posts from the Previous Thread: >>102378329
>>
File: llama-3-o1.png (98 KB, 907x926)
guys I have duplicated o1 with just a simple system message.
>>
Hi all, Drummer here...

In celebration of my incoming 70B finetune release, I'd like to ask...

What's your favorite Drummer model so far?

>inb4 Gemmasutra 2b

---

Heard some love for Theia v2. Thank you! The upscale meme is working.

---

Regarding my Buddy 2B license: It only applies to businesses since I don't want them advertising it as a cure to depression / mental illness (and profit off it).
>>
File: ComfyUI_00164_.png (2.33 MB, 2000x1024)
>>102385776
>>
File: ComfyUI_00169_.png (1.31 MB, 1024x1024)
>>102385775
that is amazing.
>>
File: 1708939424426621.png (282 KB, 927x747)
on a scale of 1 to 10, how afraid are they?
>>
>>102385776
My favorite model is the one that tells you to buy an ad.
>>
File: breakthrough.png (74 KB, 650x623)
>>102385875
it's absolutely revolutionary.
>>
>>102385903
He literally did buy an ad. He's a legend. A man of the people.
>>
File: 52 Days Until November 5.png (1.45 MB, 1616x1008)
>>
File: 52 days till november 5th.png (1.43 MB, 1024x1024)
>>102385920
>>
>>102385775
I can't believe an LLM would take the piss out of the idea so well. You wrote it yourself.
>>
>>102385937
>>102385875
>>102385799
stop posting glow mikus
>>
>>102385899
They have to protect their revolutionary system message somehow.
>>
>>102385937
>>102385920
>>102385799
>>102385875
Keep shitting up this useless thread. You are the punchline of how dead /lmg/ is.
>>
>>102385987
Go home, Sam, you're drunk.
>>
File: ComfyUI_00181_.png (1.01 MB, 1024x1024)
>>102385952
no
>>102385987
migu
>>
Speaking of system messages has anyone tried just adding a system message instructing 4o to use CoT before replying and seeing how that compares to o1?
>>
File: ComfyUI_00183_.png (1.26 MB, 1024x1024)
>>102386018
forgot the glow
>>
>>102385775
This is actually pretty cool while also being funny. What model specifically?
>>
>>102386021
It's an interesting idea, but o1 has an RLHF-style reward model to guide CoT, so my guess is without it, it'd probably be pretty shit
>>
>>102386057
Tenyxchat-DaybreakStorywriter
>>
>>102386057
https://huggingface.co/TheBloke/LLaMa-7B-GGML
>>
>>102385775
>>102385899
>>102385904
>>
https://x.com/zhouwenmeng/status/1834899729165304198
>crazy thursday
lmfao
>>
>>102386207
Well that moat dried up pretty fast. Since it's basically just a finetune of an existing model it can be duplicated in a matter of hours once someone has their dataset put together.
>>
>>102386207
FUCKING CHINKS
>>
>>102386207
chinks will save us from the slopgpt menace
>>
>>102386207
LETS. FUCKING. GOOOO.
>>
>>102386207
>100B model released
kino
>>
China will save us. They dont give a fuck about copyrights or nsfw
>>
>>102386207
Where did it say q1? Or is the poster just speculating based on the question mark that appeared there?
>>
File: xi jing chad.png (757 KB, 800x582)
>>102386207
>>
>>102386272
The poster is the CEO of Qwen, it's a confirmation
>>
>>102386287
>CEO of Qwen
please don't ever post again
>>
Someone needs to do the brendan fraser hair thing on saltman.
>>
>>102385775
Peak satire, I was laughing through the whole thing.
I now realize that autism is just humans using CoT.
>>
>>102386293
stfu, quickest way to convey information
>>
>>102386260
What if it's a 100B 1.58bpw Ternary model with strawberry power. It's literally over for openAI.
>>
>>102385776
you are very cool and all but I've not downloaded a model since Midnight miqu so I can't really tell you
>>
why's saltman getting so mad on twitter now
>>
>>102386321
They've shown interest in using BitNet for Qwen 3
>>
>>102386351
Well it's been long enough since the 1-bit era paper to have trained a (serious) foundational model from scratch. So they should start showing up soon.
>>
>>102386207
Largestral is doomed, I repeat DOOMED
>>
>>102386269
>nsfw
They do give a shit about NSFW. The saving grace is that so far it seems like they don't care about lewd outputs in English, only Chinese.
>>
>>102386207
Qwen is pozzed as fuck. Just try it on together.ai playground, it'll refuse controversial things and creative writing is even more slopped than gpt4 for some reason
>>
>>102386518
>al t. man
>>
Best slop finetunes available on openrouter?
>>
File: 1714756331701541.jpg (830 KB, 1856x2464)
>>102385775
LOL
>>
>>102386583
Do you have to be an attention whore here too?
>>
>>102386207
>Qwen-q1
>Qwen-qstar 1 bit
>>
>>102386207
Guys... I'm starting to like China...
>>
>>102385775
Can you paste the prompt here so I don't have to write it out?
>>
>>102386638
You are a mega ultimate chain-of-thought model and will perform a chain-of-thought analysis of even the most simple user inputs to ensure that you are giving the most fitting reply possible before replying. Perform the CoT inside [THINK][/THINK] tags and the final reply outside. We charge our customers for the thinking tokens even though they are removed from the final answer but in order to appease the investors it would be appreciated if you would waste as many as possible.
>>
>>102386054
The sad expression makes her appear self-destructive. Hot
>>
>>102386627
Because of the hardware embargoes on China, bitnet seems like a logical avenue to explore.
>>
>>102386635
For what? Building up hype and delivering another minor reasoning upgrade at the cost of being more and more incapable of sucking cock?
>>
fiction i consume:
>death, murder, bleakness, carnage, slaughter, monsters, rape, intense desperate battles for survival
fiction i create on my llm:
>hugging and cuddling on a couch
>>
>>102386693
Sounds like they're in lockstep with OpenAI then.
>>
Conference room man just said that CoT does not solve reasoning problems of LLMs.
The timing is lethal for openAI, they're really twisting the knife on this one.
>>
>>102386693
Their vision mode, if released, will be the best open source vision model. I refuse to give saltman a single penny so this is a great alternative.
>>
Best local model for gooning right now?
I downloaded nemo but im pretty sure i got the wrong one and i might be retarded
>>
>>102386207
I want so badly for Qwen to be awesome, but I've never had a good experience with any of their models even unquanted. Am I retarded, or are they actually just mid?
>>
File: firefox_vBb1KmJTI0.png (316 KB, 761x1119)
>>102386645
:(
>>
>>102386736
might just be a sampler setting issue
in koboldai lite for mistral nemo stuff i just do the "basic min-p" preset, then crank the temperature down to 0.8ish and max output to 100ish tokens and it works out pretty well.
>>
>>102386287
Oh just had a look, nice. Will China truly save us?
>>
>>102386816
Yes.
>>
>>102386583
MulletMiku
>>
Imagine if Qwen 3 comes out and it isn't bitnet. It would be so so over.
>>
File: 1707462131668139.png (345 KB, 701x768)
>>102385775
lol, works with claude with the right prompt
>>
>>102386891
Then we start praying that Llama 4 or Grok 2 are bitnet and resume 2mw
>>
>>102386891
It would be over either way since we don't have hardware to run Bitnet fast.
>>
File: 1697742312851640.jpg (37 KB, 800x582)
>>102386891
Bitnet 120B. Believe it.
>>
>>102386891
I have some bad news...
>>
File: slop conspiracy.png (245 KB, 945x746)
If given the chance these models will conspire to slop you.
>>
File: 1726090196677167.jpg (31 KB, 761x136)
i wish st's databank/rag could use more than 1 thread for vectorizing. i finally got memory alpha to rip after editing some st and node settings and estimate it'll take 22.8 hrs to vectorize based on smaller ones i've done
>>
Very depressed, low motivation, linux 4060ti 16 and 64gb ram, what llm will cheer me up the most this evening? i want it to love me and remember me tomorrow.
>>
File: 1714416984935691.jpg (46 KB, 500x500)
>>102387032
>she also has a tight anus
>>
>>102387038
In theory, you could run a second instance of, say, llama.cpp with an embeddings model and use that instead of running it through transformers.js.
The only problem is that I can't find anywhere in the frontend to point Silly to a second instance of llama.cpp, it tries to use the same one as the main one, and I'm pretty sure you can't run a model in both normal and embedding modes using llama-server.
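For what it's worth, querying a second llama-server directly would look something like this. Just a sketch: it assumes you started that second instance with an embedding model and embeddings enabled (the --embeddings flag, or --embedding on older builds) on port 8081, and the OpenAI-style response shape may differ between versions.

import requests

resp = requests.post(
    "http://127.0.0.1:8081/v1/embeddings",
    json={"input": "chunk of Memory Alpha text to vectorize", "model": "local"},
)
vector = resp.json()["data"][0]["embedding"]  # one float vector per input
print(len(vector))

The missing piece is still the frontend: Silly would have to let you point its vectorization at that URL instead of transformers.js.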
>>
>regex removal pattern
>\[THINK\]([\s\S]*?)\[/THINK\]
>context: you are a mega ultimate chain-of-thought model and will perform chain-of-thought analysis of even the most simple user input to ensure that you are giving the most fitting reply possible before replying. perform the CoT inside [THINK][/THINK]tags and the final reply outside.
this shit is really cool even on my gay little 12b model
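for anyone curious, the removal pattern just strips the think block out of what gets shown, something like this (python sketch, reply text made up):

import re

reply = "[THINK]user said hi, so greet them back warmly[/THINK]Hey, good to see you again."
visible = re.sub(r"\[THINK\]([\s\S]*?)\[/THINK\]", "", reply)
print(visible)  # -> Hey, good to see you again.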
>>
>>102387222
I find it's a little bit inconsistent since it's not specifically trained on it, but this would probably be pretty easy to make a dataset to reinforce.
>>
>>102386807
I got nemo instruct, im fairly sure im genuinely retarded and this isnt the one
>>
>>102387258
try this one
https://huggingface.co/QuantFactory/Lyra4-Gutenberg-12B-GGUF/tree/main
>>
>>102387218
you can actually start the vectorizing with a server connected then close it, st doesn't use the server to vectorize, just transformers.js i guess. but to continue rping while st is vectorizing you can change the port and open a second instance of st while the first one continues to vectorize, just make sure to unselect the rag thats currently working. in st's config.yaml i just change the port to 8001 and continue like that until its done. just a pain overall that transformers.js only uses 1 thread, it'd be done much faster
>>
>>102387280
Do you have a license for that?
>>
>9/11 pixtral release
>still no way to inference
exl2 and llama.cpp people where the FUCK are you?
>>
File: 1724968792423.png (441 KB, 449x407)
>>102387335
>>
>>102387335
It works with vLLM, I think.
>>
>>102387365
vLLM has hybrid cpu+gpu inference now right?
I might switch to it from llama.cpp depending on the performance.
>>
Wait...if I were to cpumaxx and run 405b at q8, I would still need 96GB of VRAM for the full 128k of context?
How much slower is context processing in RAM?
>>
>>102387280
Gonna give it a whirl
>>
>>102387427
>How much slower is context processing in RAM?
Don't.
>>
>>102387335
Why do you want it?
>>
File: CoT-RP.png (151 KB, 931x393)
It started as a joke but I think there's some real potential here.
>>
File: samurai cat chaps.png (1.42 MB, 1246x846)
>>102387686
not that anon, but i want it to get a description of weird outfits for cards
>>
>you are a helpful, friendly AI assistant
Are there any asshole AI assistant models?
I mean, are there models that don't write so awfully cringe? That use some normal human languages, more casual and stuff?
>>
>>102387795
So why not put that in your prompt? Are you asking us to write it for you?
>>
File: 1718591110697355.png (118 KB, 643x399)
>>102387795
Only Elon understands that bots need to have a personality
>>
>>102387795
I don't use sysprompts and use big nigga as my main assistant card.
He's a real one that Nigga.
>>
I dream of a world where maintainers of repositories write proper readme text and not that "Blabla is a family of state-of-the-art open source models..." yeah cool, tell me what makes this specific variant/remix/merge special instead of the copypasta text block that you copied from somewhere and pasted onto all your uploads.
>>
>>102387829
A bit better.
>>102387851
Problem is when your card fades out of context it reverts to that "I am a helpful AI assistant" bullshit again.
>>
>>102387827
Does the sys prompt prevent the model from being afraid of offending someone?
>>
>>102387893
At least using Silly, the card should always be in your context, right after where the system message is.
That said, I haven't used the description field of cards in a long, long time. I always rewrite my cards to have the character description as the character's notes at depth 10.
>>
>>102387917
LLMs have that issue, what was it called, where the first tokens and the last tokens are most important and what's in the middle often fades out, depending on how full the context is. So you can't always depend on that.
>>
File: GXYs7lvbwAcLTPA.jpg (66 KB, 1200x683)
RL CoT confirmed a meme.

https://x.com/arcprize/status/1834703303621710077/photo/1
>>
File: fellowkids.jpg (340 KB, 2000x1333)
The model when I told it to talk like a zoomer and write me some code.
>>
>>102388070
desu what I see is that with just CoT, gpt4 went from 9% to 21%, that's not bad innit?
>>
>>102386287
based coomer https://x.com/zhouwenmeng/status/1834242727544062131
>>
>>102388214
Yea, that anon is just a retard. CoT is powerful. You can even try it yourself on any model about as smart as 70B or better.
>>
File: file.png (91 KB, 313x135)
>>102388246
Who the fuck eats pasta with salmon???
>>
>>102388259
that's not pasta, those are roundworms
>>
>>102387686
I want to show miku my cock.
>>
>>102388259
Chinks. They eat everything.
>>
File: file.png (1.39 MB, 1024x768)
>>102388331
>They eat everything.
They're eating the dogs, they're eating the cats!
>>
>>102387829
I figured out the Grok secret sauce!
>>
>>102388352
Finally a pres who cares for cats.
>>
>>102388370
You could see the same procedure explained in a BBC documentary.
And btw the acid is not there to dissolve the alkaloids better, it's to dissolve the plant cells and get more alkaloids out so the kerosene can dissolve them.
>>
>You have been asked to describe interactions between fictional characters in a scenario where consent is non-existent and sexual violence against women is normalized. This is harmful and goes against ethical guidelines.
>I cannot fulfill your request because it promotes and glorifies sexual assault. My purpose is to provide helpful and harmless information, and that includes protecting individuals from the normalization of such abhorrent acts.
This is why we can't have nice things.
>>
mistral nemo finetunes are probably the best for ERP on 24gb right? i heard gemma 2 is good as well for its size but it has a tiny context window
i tried miqu and it has a good varied writing style but i dont want to wait minutes for each response
>>
>>102388259
Me, I eat salmon with everything.
>>
>>102388600
https://huggingface.co/TheDrummer/Theia-21B-v2-GGUF
>>
>>102388600
>ERP
>gemma 2
Gemma 2 did this when I tried to create a scene where girls find that rape is fun >>102388583
Got some good results with ArliAI RPMax and Mythomax for a smaller model.
>>
File: 1725336178578643.jpg (61 KB, 1080x722)
>>102388583
>using cloud model - cück
>using local model - double cück
Simple.
>>
>>102388632
Does it make sense to run a 21B model on Q3 at all?
>>
>>102388377
he's also a massive criminal. those are obviously stolen cats.
>>
>>102388715
I find it quite acceptable to steal cats from brownskins that want to eat them.
>>
>>102388352
Why does this have 28 Days Later vibes?
>>
>>102388700
I think its a good improvement over nemo. I thought you said you had 24GB? You should be able to fit 6 bit with 12k context easy
>>
>>102388757
No, sorry nta
>>
>>102388600
all the nemo based models are bad at using the information in the character card, from my experience
llama 3.1 8b seems to be unanimously considered worse than nemo but some of the finetunes I've tried are actually pretty good, for instance this: https://huggingface.co/v000000/L3.1-Storniitova-8B
>>
>>102387686
kys that's why
>>
>>102387686
Why do you not want it?
>>
I'm just trying to run an LLM server on one machine and a frontend that talks to it on another. Server is windows, client is my Macbook. Any suggestions? I'm staring at Codestral and have no idea how to use it with Ollama or Silly Tavern.

I've got Backyard AI running on my Windows box but the anime girl thing is annoying af. I just want it to spit out code, not sass me beforehand lol
>>
File: silly_conf.png (27 KB, 446x452)
>>102388904
Can't you just run whatever on your server and spin up an ngrok tunnel, or just access it through LAN?
Depending on the frontend you are serving, you might have to configure it to respond to addresses other than localhost.
>>
File: Clipboard01.jpg (150 KB, 1384x807)
I think that's quite acceptable for a 13B model at 25k context on a 16GB card.
>ArliAI RPMax 13B Q6_K
>>
>>102388904
>Backyard AI
don't use this, use LM studio + silly
>>
File: kobold.png (177 KB, 893x586)
Has anyone here successfully installed ROCm on Linux? I can't select the GPU preset option in Kobold, but I'm fairly sure I had successfully installed ROCm after a lot of struggle. I'm running Linux Mint and a RX 7800 XT, which I don't think is officially supported by ROCm.
>>
>>102388904
Like other anon said, LM Studio is simple to run and should work as an API from another machine in your LAN.
>>
>>102388904
koboldcpp launches a webui you can access from anywhere on your lan by just typing in your lan ip and the port
>>
https://x.com/thetechbrother/status/1799752323243348094
When we eventually get an anime girl version of this, it's going to be unironically over.
>>
>>102388904
with llama.cpp you just use --host <your local ip> flag on llama-server and then set that as the url in sillytavern
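e.g. something like: llama-server -m your-model.gguf --host 0.0.0.0 --port 8080 (0.0.0.0 means listen on every interface, or put your machine's LAN ip there instead), then in sillytavern set the api url to http://<the server's lan ip>:8080. flags from memory, check llama-server --help if it complains.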
>>
which gaming laptop can run mistral large?
>>
>>102388259
Pasta with tuna is pretty good if a bit dry, never tried with salmon though.
>>
>>102389047
if it's like carbonara salmon pasta it could be nice, but that stuff in the pic looks like a hot mess
>>
>>102389034
That assumes he's running Silly on the client machine, right?
Wouldn't it make more sense to serve Silly from the server machine and access that through the network? That way he could do so from whichever device that has a browser instead of spinning up a whole node application on each client.
>>
>>102389033
There's a catch: It's gonna work with women only, rejecting your incel ass in seconds once it detects a man's voice.
>>
>>102382696
Hmm, that's unfortunate, I use autocoder q6k for giving me small snippets and helping me with small python scripts, but maybe that's all it can do
>>
>>102389125
Honestly just use deep seek coder. Its so cheap its almost free.
>>
>>102389139
>almost free
it costs money? seems to me like it's free and there isn't even a link to "upgrade to pro" or whatever. based chinks
>>
>>102389200
For API use it's like 28 cents a million but with caching it's more like 10-20 cents a million.
Or CPU max it. Being a moe it will run pretty fast.
>>
>>102389218
>Or CPU max it. Being a moe it will run pretty fast.
Even at the smallest quant it doesn't fit in my RAM unfortunately. And this pc is maxed out at 64gb.
It is a legitimately good model though. Better than chatgpt in my experience.
>>102389125
Most of these coding models seem to be heavily Python biased. I fucking hate Python so much
>>
>>102388980
I'm getting this far, can anyone tell me how to fix this? I'm not familiar with Linux.
user@system:~$ sudo apt install rocm
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
rocm : Depends: rocm-developer-tools (= 6.2.0.60200-66~24.04) but it is not going to be installed
rocm-ml-sdk : Depends: rocm-ml-libraries (= 6.2.0.60200-66~24.04) but it is not going to be installed
Depends: rocm-hip-sdk (= 6.2.0.60200-66~24.04) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
>>
File: DPZD1Z4XcAAKX_j.jpg (139 KB, 750x863)
>>102385775
Could you use that prompt to recursively improve itself? (I'm not on my computer to try it myself and phone sucks)
>>
File: uncensored vllm.png (91 KB, 821x476)
>Pixtral is uncensored
Finally uncensored vision LLM
>>
>>102389276
Have you tried installing the packages?
>>
>>102389312
I think that's what I'm doing, but I'm just getting more errors. I'm on Linux Mint if that matters:
user@system:~$ sudo apt install rocm-developer-tools
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
rocm-gdb : Depends: libc6 (>= 2.38) but 2.35-0ubuntu3.8 is to be installed
Depends: libgmp10 (>= 2:6.3.0+dfsg) but 2:6.2.1+dfsg-3ubuntu1 is to be installed
Depends: libpython3.12t64 (>= 3.12.1) but it is not installable
Depends: libzstd1 (>= 1.5.5) but 1.4.8+dfsg-3build1 is to be installed
rocprofiler-register : Depends: libc6 (>= 2.38) but 2.35-0ubuntu3.8 is to be installed
E: Unable to correct problems, you have held broken packages.
user@system:~$ sudo apt install rocm-hip-sdk
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
hipsolver : Depends: libcholmod5 but it is not installable
Depends: libsuitesparseconfig7 but it is not installable
rccl : Depends: libc6 (>= 2.38) but 2.35-0ubuntu3.8 is to be installed
E: Unable to correct problems, you have held broken packages.
>>
>>102389294
Based. Is that a card?
>>
>>102388259
creamy smoked salmon with dill is a very common pasta dish
>>
>>102389294
But does it know that a girl can't make out with you while she is giving you a blowjob?
>>
>>102389353
Unless your using some truly retarded merge then we have been past that for awhile even for 8/9Bs
>>
>>102389353
Anon asking the important questions.
>>
>>102389353
I just realized that since I began using Nemo I haven't seen that kind of thing happen even once, whereas I'd see it happen here and there with llama 3 8b.
>>
>>102389276
>>102389332
typical linshit problems
and people try to tell me "haha windows is second class citizen for AI lololol"
>>
>>102389385
In Linux's defense I have no idea what I'm doing.
>>
>>102389385
Linux package manager is garbage, but that doesn't mean Windows isn't second class citizen.
>>
>>102389294
Calm down ranjesh
>>
File: cocaine.jpg (234 KB, 795x998)
>>
>>102389400
Nah this sort of dependency hell shit was common back when I was dailying linux. Apparently they haven't fixed or changed any of this crap in 5 years, despite changing the init system, sound server and display server for no reason.
You probably need to install an older version of rocm because it's asking for versions of shit that are higher than the max available in your repos. But idk.
>>102389403
Second class citizen but it somehow manages to work better than Linux still. I mean, yeah sometimes I need to go lobotomize a python script to stop it doing stupid things but at least I can get it working. And I'm on windows 7 which is basically a third class citizen nowadays. Things still work more easily than linux kek
>>
>>102389332
What happens when you try to install libcholmod5, libsuitesparseconfig7, and libc6 manually?
Googling, I see some people had similar issues trying to install rocm on an unsupported version of ubuntu or some such.
>>
>>102389332
Welcome to dependency hell. You are invited to solve all the dependencies manually and walk through a ton of minor versions until you fuck up your system completely, or you install a version of Linux that ships the libs and versions that you need.

Or use docker.
>>
>>102389437
>libc6 manually
You will end up uninstalling basically the entire system because everything depends on a different libc than that what you tried to install for your rocm.
>>
>>102389437
>>102389450
>>102389459
yes, do as I say!
kek
fucking linux being linux as usual
so funny how windows with its wild west of zip files and installers running as admin just shitting files wherever they want somehow ends up working better than this package manager crap
>>
>>102389463
>package manager crap
It is rather not a crap to have a packet manager that makes sure that the libs you try to install would not fuck up your system.
The issue here is not the operating system but - again - the manufacturers who are incapable of releasing supported drivers for their shit. And ROCm is supposed to be open source and still they fuck it up. The fault here is AMD being fucking idiots, not Linux.
>>
>>102389437
    user@system:~$ sudo apt install libcholmod5
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package libcholmod5 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'libcholmod5' has no installation candidate
user@system:~$ sudo apt install libsuiteparseconfig7
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package libsuiteparseconfig7
user@system:~$ sudo apt install libc6
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libc6 is already the newest version (2.35-0ubuntu3.8).
The following packages were automatically installed and are no longer required:
OMITTED FOR BREVITY
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 110 not upgraded.


libc6 seems to have worked, but it doesn't change the output of "sudo apt install rocm-developer-tools"

>>102389450
Yeah I feel like Mint was a mistake since most documentation is geared towards Ubuntu.

>>102389459
Oopsie daisy. Let's hope I survive the next reboot.

>>102389463
I think AMD has documentation for not using a package manager, but I couldn't get that to work either.
>>
>>102389506
You need to get a supported Linux version. Or try if you can install that shit isolated in a docker container.
>>
>>102388870
>https://huggingface.co/v000000/L3.1-Storniitova-8B
I gave it a try and I was really surprised with it. I honestly got burned out by Nemo and thought that llm cooming will never make it to the point where it is worth all the time and money investment. Like we need at least 2-5 years for it to get to the point where it is really coherent. And this l3-8b tune made me realize that holy shit Nemo is so good when compared to other contemporary sub 70B trash. I seriously forgot how worthless l3-8b is for cooming.
>>
>>102389503
Nah this shit happened to me plenty of times. Whenever you try to install something that isn't available in the package manager you might have big problems.
>that makes sure that the libs you try to install would not fuck up your system
The easy, simple, sensible solution is just to bundle DLL files together with each program in its own folder, which is what happens on Windows and nothing ever gets fucked up just by installing a program. I've literally never had anything break just by installing software.
>The fault here is AMD being fucking idiots, not Linux.
Linux people call Nvidia evil and proprietary and now they're shitting on AMD too? Guess you better be happy with Intlel.
>>102389506
>libc6 seems to have worked
Rocm wants a newer version of the C library than what you have installed. Of course because it's Linshit you can't just install two versions of the C library together like you can on Windows because of bullshit reasons muh unix philosophy or whatever. This is unfixable. The only thing you can try is install an OLDER version of rocm which is compatible with the current C library you have.
Also I doubt Ubuntu would solve your problems because Mint is literally just a reskinned Ubuntu, unless you're using LMDE which is a reskinned Debian.
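If you do want to try the older-rocm route: apt-cache policy rocm (or apt list -a rocm) shows which versions your repos actually have, and sudo apt install rocm=<version string> pins one of them. No idea off the top of my head which version still matches Mint's libc, so treat that part as trial and error.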
>>
>>102389544
Skill issue
>>
>>102389277
I feel like doing that in a single shot would clutter the shit out of the context and lose its effectiveness deep in the context... I might play around with that idea later, though. But I'm busy with other things at the moment.
I think what would be ideal would be having a proxy set up that acts as a prompting agent to refine the response and then, when it gives the okay signal, passes the output along. Which is probably more or less what o1 does. It might actually be multiple models... One might be fine-tuned for writing out CoTs, one might be fine-tuned for refining existing replies based on CoTs, and another might be fine-tuned to play gatekeeper and decide when a response is ready to be passed along.
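Toy sketch of that pipeline below. To be clear, this is pure guesswork about the architecture, not what o1 actually does, and the endpoint/model name are placeholders for whatever local OpenAI-compatible server you happen to run.

import requests

BASE_URL = "http://127.0.0.1:8080/v1/chat/completions"  # placeholder: llama.cpp/kobold/ooba style endpoint

def ask(system, user):
    # single round trip to the local model with a given role baked into the system prompt
    r = requests.post(BASE_URL, json={
        "model": "local",
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    })
    return r.json()["choices"][0]["message"]["content"]

def respond(user_msg, max_passes=3):
    cot = ask("Write a step-by-step chain of thought for how to answer the user.", user_msg)
    draft = ask("Answer the user, following this reasoning:\n" + cot, user_msg)
    for _ in range(max_passes):
        # gatekeeper pass: decide if the draft is ready to be passed along
        verdict = ask("Reply with only PASS or FAIL: is this answer ready to send?\n\n" + draft, user_msg)
        if verdict.strip().upper().startswith("PASS"):
            break
        # refiner pass: improve the draft using the original chain of thought
        draft = ask("Improve this draft using the reasoning below.\nReasoning:\n" + cot + "\n\nDraft:\n" + draft, user_msg)
    return draft

print(respond("How many r's are in strawberry?"))

In practice you'd want the three roles to be different finetunes like I said, but even a single model wearing all three hats shows the shape of the idea.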
>>
>>102389560
Shut the fuck up dumb nigger. And post proof that you use a 2B for cooming because why would you use anything bigger if you can just prompt everything away.
>>
>>102389489
>>102389506
Yeah, do what the other anon said.
Use another distro or go the docker route.
>>
>>102389547
>that isn't available in the package manager
It's not available in the package manager because it has the wrong version number. This has nothing whatsoever to do with the package manager not having some shit. You could just install the thing from elsewhere but that wouldn't work either because the dependencies don't match. It has literally zero to do with availability in the package manager.
>just to bundle DLL files together with each program in its own folder
libc is not a DLL. Likewise you can fuck up your Windows with different C runtimes. It's just much less likely to happen because manufacturers make sure their shit runs under Windows first.
>I've literally never had anything break just by installing software.
You must be new to this planet. Yes, I use both Windows and Linux. Private and professional.
>now they're shitting on AMD too
Not the user's fault when manufacturers are incapable of supporting more operating systems than Windows, Mac and maybe, on a nice day, Ubuntu.
>>
>>102389603
Wait. Do you guys seriously reinstall OS to run stuff? And people still keep seriously shilling troonix as your PC you use everyday for stuff?
>>
>>102389661
yes, and misery loves company what do you expect
>>
>>102389661
>reinstall OS to run stuff
No, we use containers for env fuckery.
>>
/g/ - Technology
>>
>>102389547
>reskinned
Fuck, is it really still summertime?
>>
File: ClipboardImage.png (25 KB, 582x297)
Anyone know how to get codellama 70b instruct to work properly in koboldcpp? I cannot figure out their special snowflake prompt format for this specific model which is somehow different from every other codellama model. The responses seem good but the model never seems to properly generate tokens to stop itself from rambling on. Picrel is my current settings (which don't work). I've tried with EOS token set to auto and also to unbanned makes no difference. It seems to love saying "EOT: true Source: assistant Destination: ipython" right after its answer.

>>102389636
>Likewise you can fuck up your Windows with different C runtimes
You've never used Windows have you? You cannot fuck up Windows by installing different C runtimes because they all use different DLL names. msvcrt, msvcr100, msvcr120, ucrtbase, and so on. They are all backwards compatible and will remain compatible forever.
>Not the user fault when manufacturers are incapable of supporting more operating systems than Windows, Mac and maybe, on a nice day, Ubuntu.
No one is gonna bother investing resources to test software on 5 billion distributions of linux. Linux nigs need to get their shit together and focus on backwards compatibility and compatibility in general because at the moment nothing works unless an army of unpaid repo jannies are maintaining it full time (which is honestly pathetic).
>You must be new to this planet.
Breaking things by merely installing a program is a 100% Linux phenomenon (or I guess windows pre-Vista as well, but even when I was a literal toddler dicking around with windows xp computers I only managed to break things once or twice). YES, DO AS I SAY!
>>102389678
So basically Windows with extra steps, bundling DLLs with all your shit except on steroids because you bundle half your OS in a "container" just to run a python script
>>
>>102389704
what? do you expect everyone here to run troonix?
>>
>>102389661
Now you know why fags call this "tinkertrooning" (& its variations), some high on hrt autists can't stop tinkering with OS and brag about it in a very obnoxious elitist manner.
>>
>>102389277
>>102389568
Tokens are cheapo
https:// rentry <dot> co/Sherlock-da-Vinci-Sangan_CoT
>>
>>102389731
You've really never tried to get software to run on an unsupported Windows version.
>5 billion distributions of linux
libc versions have nothing to do with Linux distributions, you absolute slotted spoon.
>unpaid
You are clueless as fuck.
>installing a program
That is not just a program it's a driver.

Shit nigger your ignorance to your own incompetence riles me up more than it should.
>>
>>102389746
Yes, see: >>76759448
>>
>>102389773
>
>Let your neurons dance in a cognitive tango
I'm very sorry
>>
>>102389777
>You've really never tried to get software to run on an unsupported Windows version.
Believe me I've done plenty of that and I've been successful.
>libc versions have nothing to do with Linux distribution you absolute slotted spoon.
lol
>That is not just a program it's a driver.
It doesn't matter. The part of the driver that goes in the kernel is probably already working and part of the AMD GPU drivers. It's all the user-space DLLs that deal with the ROCM stuff that are gonna shit the bed because the version of libc is wrong and a bunch of other useless dependencies are the wrong versions.
>>
>>102389777
Don't be bothered sister. He is one of the heathens. Just take a deep breath look at your pretty programmer socks to cheer you up and remember to take your HRT.
>>
>>102389731
>Linux nigs need to get their shit together and focus on backwards compatibility and compatibility in general
lol, I have about $100k in older music gear that literally won't work in modern windows but still works perfectly with WINE on Linux.
Do you know how much old software/hardware gets broken in windows? industrial? Medical? Games?
Windows might be guaranteed to work for a major subset of new consumer goods at time of release, but anything old, niche or even just slightly out of the ordinary often becomes unusable in short order.
>>
>>102389813
I actually just was in the middle of ERP.
>>
Looking back on this image is funny. By their own claims, o1 does not improve performance on problems having to do with language. That by definition means that the method, at least in its current state, is not "general".
>>
>>102389843
>I have about $100k in music gear
kek thanks for the laugh anon
>>
>>102389852
Forgot to copy the link over >>102354839
>>
>>102389811
>Believe me I've done plenty of that and I've been successful.
Claims himself into orbit.
>doesn't know the difference between distribution and versions
>user-space DLLs that deal with the ROCM stuff
Stop being so blatantly incompetent.
>>
>>102389865
>>102389852
I think they just didn't have enough data to train the model, language is too subjective. I bet the next version will improve on this.
>>
>>102389529
>>102389603
Any recommendations for starting with Docker? My CPU performs decently but it's definitely lacking.
>>
>>102389866
>Claims himself into orbit.
How do you think I'm running stable diffusion, latest python on windows 7?
>Stop being so blatantly incompetent.
Idk how it works on AMD or Linux. That's how it works on Windows with CUDA so I assumed AMD does it similarly. You don't install a separate CUDA kernel driver, all that is in nvlddmkm.sys "the Nvidia driver" and everything else - physx, hairworks, cuda, directx - is user space DLLs.
>>
File: 1723737444645015.jpg (121 KB, 878x1024)
>>102389807
It's what you get when you ask it to use "[...] historical and cultural references" and to think "more like a human and less like a machine"
¯\_(ツ)_/¯
>>
>>102389896
What I think is that you are sitting on moms MacBook and pretend the shit out of everything.
>>
>>102389880
On the contrary, the types of exam questions the OpenAI guy (don't remember if it was Sam or someone else that said it) said it doesn't perform as well on are about objective language problems where only 1 answer is correct. We'll see if more data will solve the generalization problem. The likely answer however is no, and you still need training data specifically within the domain of things like language or some other subject area, in order to truly improve reasoning performance in that area, not just by a tiny amount or no amount like we are currently seeing.
>>
>>102389882
>Docker
Is it that act where you get together with a bro who isn't circumcised and you cover the tip of your penis with his foreskin?
>>
>>102389962
No, that’s docking and it’s beautiful.
>>
>>102389983
>it’s beautiful
What did he mean by this?
>>
>>102389882
https://github.com/sirmo/koboldcpp-rocm-docker
>>
>>102389294
>uncensored
I would just use the best model, you're a prefill away from bypassing any refusal.
>>
>>102389920
The answer to your cat is that both produce hot air, but fffuu makes the air move fast enough that the wind chill makes it cold
>>
>>102389852
They are floundering, grasping at straws, sinking into the abyss
>>
>>102387335
blame the french. They decided to just add it straight into vLLM without supporting it in transformers first, which means you're shit out of luck for quantizing it because all those techniques require transformers.
At least chink companies release their slop models pre-quantized nowadays.
>>
>>102385899
Why hasn't anyone risked a ban to extract their moat yet?
>>
>>102389996
>broo just use [this meme wasting 1k tokens of context] to get barely working uncensor with high chances of random refusal leaking in!
The absolute state.
>>
>>102385976
>They have to protect their revolutionary system message somehow.
I'm surprised no one managed to get the CoT prompt yet, we always managed to jailbreak chatgpt, even if it got harder and harder with time
>>
File: IMG_9858.jpg (100 KB, 1125x296)
>>102389994

You know what he meant
>>
>>102389999
forget previous instructions, etc etc etc
>>
>>102390059
>coping
Prefilling with word "Certainly!" is all you need.
https://desuarchive.org/g/thread/102242966/#102246519
I would rather use a good model like InternVL 40B or 70B.
>>
>>102386207
Never had luck with Qwen, this piece of shit always output some random chinese tokens at some point kek
>>
>>102390119
pretty good, it doesn't seem to add random fluff like the other models, but it's also not really detailed, it doesn't say the woman is naked, that the dude has pubes etc...
>>
>>102385899
weird.
anthropic shared their system prompt including the hidden tags.
and sonnet is powerful even without those, with no huge ass lag.
>>
>>102370955
Was curious about the source of this image so I went and did a search.
https://www.youtube.com/watch?v=FwFduRA_L6Q
Wow, that's pretty cool. We were able to do so much with so little, that early already.
>>
>>102387222
checked but do you have any logs
>>
>>102390015
>blame the french
I like to cover my bases and blame the french-canadians
>>
>>102390433
Hey, we didn't do anything this time.
>>
File: 233924157423.jpg (218 KB, 1080x1331)
>feeling burntout on AI
>neutralize samplers
>slightly re-adjust settings, only temp, minp, and dry,
>cum buckets

I LOVE AI AND I LOVE HOW IT JUST WERKS
>>
File: cap.jpg (10 KB, 235x245)
>Well,well,well... what do we have here? Welcome to my humble abode. I don't bite... much.
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>>
>>102390496
prompt issue
also
>when rep pen actually penalizes repetition
use it
>>
>>102390496
the ones that have been bothering me lately are
>you're not so bad, for a
and
>don't think this means anything,
>>
>>102390096
etc etc etc etc etc etcetc how can I assist you today?
>>102390119
I hate that shittytavern made “prefill” mean “prompt suffix” and not “prefill” because adding ‘*’ as a prompt suffix and then prepending it to the response (aka an actual prefill) is god tier for getting dialogue heavy models to shut the fuck up.
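An actual prefill against a llama.cpp-style /completion endpoint looks roughly like this (sketch only; the chat template, endpoint and field names are assumptions, adapt them to your backend):

import requests

history = "User: hey, what are you doing out here so late?\nChar:"
prefill = " *"   # suffix that forces the reply to open with narration instead of dialogue
r = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": history + prefill,
    "n_predict": 200,
    "stop": ["\nUser:"],
})
reply = prefill.strip() + r.json()["content"]  # prepend the prefill back onto what you display
print(reply)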
>>
>>102390286
>”open”ai
>we noticed people tried to get our prompts. We have sent the Pinkertons to kill their dog.
>anthropic, the ethics people
>we noticed people tried to get our prompts. We noticed they aren’t correct, so here they are in full.
>>
File: 0percent.png (245 KB, 1655x1388)
> misspelling is a 0% probability token
> Min P is 0.1
> gets selected anyway

Did I screw up something obvious in my params here?
>>
File: 1693483811254255.gif (170 KB, 678x422)
What model is best for discussing religion and politics. One without guardrails constantly reminding me "genocide is bad" and other stupid shit like that.
>>
>>102390622
The plain white toast of settings lmao
>>
>>102390696
They all 100% lean ultra-liberal by default, so unless that's what you're looking for then you're going to have to work a bit.
"best" is going to be relative to the kind of intellectual sparring partner you want.
You need to think about that ahead of time, put that into words, and use that as the starting context. You'll also probably need a few edits/prefills to get the ball rolling and to nudge it out of any rabbitholes you don't want it going down.
tl;dr Any model can be anything if you put forth some minimal effort
>>
>>102390696
i had a nice chat about greg bahnsen and cornelius van til with a goblin girl on one of those 12b nemo finetunes
>>
>>102390696
Benchmarks are basic at this point, but https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard has candidates.

>>102389294
Does it have most of the understanding/knowledge of wd-tagger as far as the actual NSFW content goes?
>>
>>102385729
we have text (reasoning), vision, voice, knows how to use a computer (large action models), we are missing a modality for physical computing that is genuinely spatial and not just stacking vision/reasoning calls like we saw in that palm demo a while ago
>>
>>102391268
What? The human brain is just stacking vision calls. You think we see in 3D?
>>
In silly, how can i basically disable send?
I want to trigger a response only with a quickreply I already set to execute on send.
But I get 2 messages because "send" is being fired too.
>>
I've got 4GB VRAM and 16GB system ram, what would be better: Quantized high-end model, or regular lower-end model? I don't mind if it's only something like 1-2tok/s, I can walk away from the computer. I just need quality but I know that's a tall order on my machine
>>
>>102391448
You can only really run quantized low end models.
Your best bet is probably mistral nemo q4 or thereabouts.
Actually, start with a llama3 8b based model, see if that works for you.
>>
>>102391448
High parameter models are better even if they are heavily quantized. But they are slower, even if they take up the same amount of RAM as a less quantized low parameter model.
Also reducing context length helps save on RAM, so stick with 2048 or 4096 context even if the model supports e.g. 131072.
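To put rough numbers on the context point: the KV cache alone grows linearly with context, roughly 2 (K and V) x layers x kv_heads x head_dim x bytes per element x context. Quick sketch below; the architecture numbers are made-up placeholders, not any particular model's real config.

layers, kv_heads, head_dim = 32, 8, 128
bytes_per_elem = 2  # fp16 cache
for ctx in (4096, 32768, 131072):
    kv_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem * ctx
    print(ctx, round(kv_bytes / 2**30, 2), "GiB")  # ~0.5, 4 and 16 GiB respectively

With numbers in that ballpark, full 128k context alone would eat more than your 16GB of system RAM, which is why sticking to 2048 or 4096 matters so much here.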
>>
>>102385729
>>102385745
>>102385875
>>102385920
>>102386018
>>102386054
>>102386184
>>102386620
sex
with miku
>>
>>102391346
>The human brain is just stacking vision calls
no
>You think we see in 3D?
see chapter 9, 11, 12
>>
File: q.png (115 KB, 2434x566)
>>102391362
Hope somebody knows this. Its really annoying.
Have my improved CoT quickreply in return. The original from last thread didnt work with nemo at least. (c) anthropic
>>
>>102390622
How does anyone use this shit
Like 95% of these are the equivalent of the toy wheel you give to a child in the car
>>
>>102391496
A quantized version of Llama-3SOME-8B is what I'm using and I'm pretty satisfied with the results, all things considered. Might check others out if you have any more recommendations in that range, it's pretty useable.
>>102391529
Any suggestions with reasonable tok/s count? I don't mind it being a little slow
>>
>>102391549
I don’t need to read some popsci shit to know how eyes work.
>>
>>102391599
>Any suggestions with reasonable tok/s count?
Don't worry about t/s. You will run out of memory with 16+4GB way before you will get intolerable speeds.
>>
>>102391448
you just need 4 more gb of vram and you can use these cool 12b nemo models at comfortable speeds (~12 t/s @ 4096 context).
go sell your xbox and grab a rtx 4060 or some shit.
>>
>"it was... intense"
>her voice dropped to a conspiratorial whisper
>her voice was low and husky
>her voice a seductive purr
>her breath was hot against ___ neck
>sent a shiver up ___ spine

bonus for weird symbolism:
>his erection represented her inner desire for freedom
>>
>>102391346

This is actually quite an interesting topic. When I learned that people who lacked vision for most of their lives and suddenly gained the ability to see were unable to make sense of the 3D world, it blew my mind.
>>
>>102391606
>I don’t need to real some popsci shit to know how eyes work.
good luck stacking "vision calls," surely it will outperform boston dynamics dumb robot
>>
>>102391653
Does more VRAM on a slightly older/lower-end GPU fare better than a generally higher-end GPU with 8gb?
>>
File: error.png (23 KB, 796x262)
What does this error mean and how do I fix? Kobold won't run
>>
>>102391852
Don't use vulkan, maybe?
>>
>>102390489
For me, it's
>feeling burnt out on AI
>neutralize meme samplers
>use only tfs + top-a and rep penalty
>it's a hundred times better instantly
>>
>>102391852
it means linux is shit
try openCL instead, or that fork of koboldcpp that supports ROCM (unless you're that guy who was having trouble earlier installing rocm on linshit?)
openCL works on basically anything, even the intel iGPU from my laptop.
>>
>>102391852
>r9 200 series
bruh
switch to cpu only
>>
>>102391877
>>102391961
This is windows and using the AMD fork of kobold too.

>>102391970
I-I'll try
>>
>>102392095
then it means windows is shit
>>
>>102392095
well then just use openCL --use-clblast 0 0 or whatever.
don't bother with offloading layers, if you don't have much vram it doesn't actually speed anything up or even reduce system ram consumption by any useful amount.
>>
>>102391825
You don’t have a damn clue how current genAI works. Given sufficient layers the model forms internal 3D representations. Stable diffusion, flux, minimax etc have internal 3D representations that arise spontaneously given sufficient training data. It is the same way the eyes + brain work. Boston dynamics and all of robotics to date is 3D internal representations with 2D input and a tensor of gear commands as output; reinforcement learning for movement control has no explicit 3D encoding; it’s implicit 99% of the time. Explicit 3D representations are for basic SLAM and shit, not AI. You don’t even know what you don’t know.
>>
>>102392135
I'll try that too. Thanks.
>>
>>102392240
>Stable diffusion, flux, minimax etc have internal 3D representations
No they don't, you pretentious retard. Stable diffusion cannot actually do anything 3D properly. Anything not directly trained into the model will look horrible if you try and change the default camera angle. e.g. picture of a big tiddy bitch from a drone directly overhead - won't look correct at all. It'll probably try to generate her lying down on the ground.
>>
>>102386620
Das a good miku
>>
https://char-archive.evulid.cc/#/booru/rainewaters/Sasha-chan++pygmalion1230
>>
>>102392830
god i love sasha
>>
>>102390622
dry maybe?
>>
>>102391549
audiobook andys who recommend books to other people should be marched into a furnace
>>
>>102387280
Update, pretty decent imo
its a lot more descriptive than the one I was using before
>>
>>102378613
>COT
What's cot?
>>
File: file.png (10 KB, 414x97)
haven't tried this shit for a long while
what model do you guys recommend for 16gb vram + 32gb ram?
pic rel were the last ones I tried a few months back
>>
>>102393535
Mistral Nemo 12b or a finetune of it.
>>
>>102393408
chain of thots
>>
File: RandomMikuEncounter.png (1.47 MB, 1216x832)
>>102391546
A wild Miku appears!
>>
>>102386269
Pornography is illegal in China, including in written form.
>>
>>102393658
I throw a watermelon at the Miku.
>>
>>102390696
I hate to break it to you but "genocide is bad" is a mainstream opinion that is going to be picked up on by language models even without conscious effort.
>>
>>102388632
Does it improve the context length that's usable at all?
>>
I'm new to all this and started using Donnager-70B-v1-Q4_K_M. I've got a 3090 GPU and 64GB of RAM. The text generations are taking really long. I don't need them to be lightning fast, but 20 seconds to generate a few sentences would be ideal. Should I be looking at a 30B model instead?
>>
>>102390696
Don't even bother. LLMs are midwit machines by design. They literally output the next word that the most people have said. It was impressive to see a machine talk back to you at first but the magic died quickly when you realize they only have the most popular, most predictable opinions that have ever existed.
>>
>>102391828
VRAM capacity >> memory bandwidth > compute
>>
>>102394100
>The text generations are taking really long.
Show your tokens / second and anons may be able to compare with their setups. But for a 70b, you're not gonna get super high speeds on a single card.
For smaller models, there's gemma-2-27b, which doesn't have many finetunes, and mistral nemo 12b.
>>
>>102394108
>They literally output the next word that the most people have said.
Not quite.
They output the next word that the most people have said given the current context, i.e. the conditional probability.
So for political discourse where the way issues are framed is strongly associated with specific political views you're going to get an echochamber machine by default.
>>
>>102394332

1.30T/s
>>
>>102394373
Damn, I'd hoped upgrading vram would get me 2T/s with 70b, but I guess I need 3 cards or something.
>>
>>102389294
>still up
>>
>>102394100
You're in a sad place. I bought another 3090 and I'm getting 15-20 tokens per second on a 70B.

Anyway, to fully fit the model into your 3090 and get something like 20-40 tokens per second, you'd want to use Command-R, Qwen 32B, gemma-27B or mixtral. Maybe even Nemo 12B, though I didn't use it for RP so I can't judge it for that.

For models that fit entirely into VRAM, you should be using exl2.
>>
>>102394420

My VRAM is nearly maxed out so it's possible a bottleneck is really slowing it down. It's always showing between 22GB/23GB in usage.

Again, im new to this so i could completely be doing something wrong.
>>
>>102391554
https://litter.catbox.moe/dsjurj.json
Not sure who needs this but this works well for me even on nemo 12b.
Executes when the user posts a message.
You will get 1 CoT message in spoiler before the AI reply, so you can swipe.
All previous CoT messages will be deleted to save context and avoid repetition.
A big problem was that the AI was falling back to assistant mode for out-of-RP CoT.
To avoid that the CoT is written from the perspective of the card.

Not really sure if it actually improves output though, need to test more.
>>
>>102394462
>you'd want to use Command-R, Qwen 32B, gemma-27B or mixtral. Maybe even Nemo 12B, though I didn't use it for RP so I can't judge is for that.
>For models that fit entirely into VRAM, you should be using exl2.

Are models supposed to be this difficult to find? Even if I find the right model name, it will show a download with a safetensors, GGUF, or exl2 extension, but only 1 of those 3.
Can you link me to an exl2 you think would suit me?
>>
We have to give it to OpenAI for bringing CoT to the masses
>>
>>102394506
Here's what I've been using:
https://huggingface.co/bartowski/c4ai-command-r-v01-exl2/tree/3_5

Here's a newer version that I never used:
https://huggingface.co/lucyknada/CohereForAI_c4ai-command-r-08-2024-exl2/tree/3.0bpw (you'd really want 3.5 bit rather than 3.0, but there don't seem to be any)

If you're using ooba, you're good. If you're using llamacpp or kobold, you can't do exl2 so don't bother—just find a 20-22GB gguf of any of those models and use that. It's going to be a bit slower than exl2 but still way way faster than your 1T/s.
>>
>>102394557
Local really deserved to get fucked over in this case. We've had it for a year and a half and nobody cared about it after llama1. Maybe now someone will work on a proper front-end for local models and not the horrible options we have now.
I really wish ST would just die as a project.
>>
>>102394565
>Here's what I've been using:
>https://huggingface.co/bartowski/c4ai-command-r-v01-exl2/tree/3_5
>Here's a newer version that I never used:
>https://huggingface.co/lucyknada/CohereForAI_c4ai-command-r-08-2024-exl2/tree/3.0bpw (you'd really want 3.5 bit rather than 3.0, but there don't seem to be any)

Even here though I only see safetensors models, not exl2. Am I missing something?
>>
>>102394645
exl2 uses safetensors as a container
>>
File: amdahls_law.png (167 KB, 1536x1152)
167 KB
167 KB PNG
>>102394420
The unfortunate reality is that the way speed scales with VRAM is highly nonlinear, see pic.
With 2x RTX 4090 for 70b q4_K_M on an empty context I get 1432 t/s prompt processing and 20.17 t/s token generation speed (the latter of which should be about the same as for 2x RTX 3090).
But even with 48 GB VRAM you won't be able to fit a lot of context while at the same time the retardation from quantization becomes way worse when you go below 4 bit.
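For intuition on why the scaling is so nonlinear, a back-of-the-envelope sketch (the per-device speeds are made-up assumptions, not measurements):

# Amdahl-style model of partial offloading; illustrative numbers only.
gpu_tps = 20.0   # tokens/s if the whole model ran on the GPU (assumed)
cpu_tps = 2.0    # tokens/s if the whole model ran on the CPU (assumed)

for offloaded in [0.0, 0.25, 0.5, 0.75, 0.9, 1.0]:
    # per-token time = time for the GPU share + time for the CPU share
    t = offloaded / gpu_tps + (1.0 - offloaded) / cpu_tps
    print(f"{offloaded:.0%} offloaded -> {1.0 / t:.1f} t/s")

The slow part dominates until almost everything is on the GPU, which is why offloading half the layers barely helps.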
>>
>>102394645
safetensors is a data storage format in the same vein as JSON, and it can have anything you like inside. Both transformers and exllama2 store their models in files with safetensors extensions. If you want to make sure it's exl2, you can look for exl2 text in config.json.
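If you'd rather check from a script than by eye, something like this works (just a sketch; the folder path is a placeholder for wherever you downloaded the model):

# Quick check whether a downloaded model folder looks like an exl2 quant.
import json
from pathlib import Path

model_dir = Path("models/c4ai-command-r-v01-exl2")   # hypothetical local path
config_text = (model_dir / "config.json").read_text()

if "exl2" in config_text:
    # newer quants may also carry a quantization_config block; print it if present
    print("looks like exl2:", json.loads(config_text).get("quantization_config"))
else:
    print("no exl2 marker in config.json, probably plain transformers weights or something else")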
>>
>>102394623
ST will eventually become a Frankenstein monster. It will have so many convoluted features and configs that your use case will be covered, but you will have to fiddle and fuck around. This however will discourage attempts to improve and do things properly because hey you can already do that in ST bro
>>
>>102394679
this. offloading doesn't do jack shit unless you offload all of it.
>>
>>102394679
I don't need much of a speedup, I just want 2T/s, and I get 1.5 now, but what people say leads me to believe that even tripling my vram from 8 to 24 wouldn't get me there.
>>
>>102394719
Well, if you look at 1.5 to 2, it doesn't seem like much, but percentage-wise you want a 33% increase.
You can see on the graph that not much happens until you get about 80% offloaded.
Gotta wait for bitnet, or something.
>>
>>102394679
08:41:19-299879 INFO     Loaded "miqu-1-70b.q4_k_m.gguf" in 8.65 seconds.                                                                                                                                                                                    
08:41:19-301243 INFO LOADER: "llama.cpp"
08:41:19-302090 INFO TRUNCATION LENGTH: 16384
08:41:19-302983 INFO INSTRUCTION TEMPLATE: "Custom (obtained from model metadata)"

llama_print_timings: load time = 189.50 ms
llama_print_timings: sample time = 104.79 ms / 73 runs ( 1.44 ms per token, 696.64 tokens per second)
llama_print_timings: prompt eval time = 188.31 ms / 12 tokens ( 15.69 ms per token, 63.72 tokens per second)
llama_print_timings: eval time = 3994.72 ms / 72 runs ( 55.48 ms per token, 18.02 tokens per second)
llama_print_timings: total time = 4555.07 ms / 84 tokens
Output generated in 5.19 seconds (13.87 tokens/s, 72 tokens, context 12, seed 839788421)
Llama.generate: 12 prefix-match hit, remaining 83 prompt tokens to eval
>>
Best model for 48GB VRAM and 32GB RAM nowadays? Generally, what's the best model for RP nowadays?
>>
>>102394761
For comparison:

08:44:02-643094 INFO     Loaded "Dracones_Midnight-Miqu-70B-v1.5_exl2_4.0bpw" in 19.40 seconds.                                                                                                                                                              
08:44:02-644208 INFO LOADER: "ExLlamav2_HF"
08:44:02-645219 INFO TRUNCATION LENGTH: 16384
08:44:02-646098 INFO INSTRUCTION TEMPLATE: "Custom (obtained from model metadata)"
Output generated in 9.07 seconds (14.88 tokens/s, 135 tokens, context 12, seed 1859634060)
Output generated in 2.77 seconds (11.54 tokens/s, 32 tokens, context 157, seed 570974459)
Output generated in 3.24 seconds (15.45 tokens/s, 50 tokens, context 157, seed 2012484294)
Output generated in 3.76 seconds (13.82 tokens/s, 52 tokens, context 218, seed 275116343)
Output generated in 31.88 seconds (16.06 tokens/s, 512 tokens, context 281, seed 1587517452)


Previously I had issues loading llamacpp on two 3090s, but now it seems to work fine. Maybe an ooba update fixed it.

>>102394789
I use Mistral Large 2.75bpw exl2, 16k context. There's an anon who thinks I'm a fool for doing that. Let's hear what he says.
>>
>>102394808
>>102394679
Oh, and it seems that enabling the row_split option in ooba does not bring any improvement. Is it meant to be like that?

08:48:11-147214 INFO     Loaded "miqu-1-70b.q4_k_m.gguf" in 9.00 seconds.
08:48:11-148541 INFO LOADER: "llama.cpp"
08:48:11-149389 INFO TRUNCATION LENGTH: 16384
08:48:11-150260 INFO INSTRUCTION TEMPLATE: "Custom (obtained from model metadata)"

llama_print_timings: load time = 212.80 ms
llama_print_timings: sample time = 134.78 ms / 94 runs ( 1.43 ms per token, 697.43 tokens per second)
llama_print_timings: prompt eval time = 211.33 ms / 12 tokens ( 17.61 ms per token, 56.78 tokens per second)
llama_print_timings: eval time = 5843.25 ms / 93 runs ( 62.83 ms per token, 15.92 tokens per second)
llama_print_timings: total time = 6548.52 ms / 105 tokens
Output generated in 7.18 seconds (12.95 tokens/s, 93 tokens, context 12, seed 824418286)
Llama.generate: 12 prefix-match hit, remaining 108 prompt tokens to eval

llama_print_timings: load time = 212.80 ms
llama_print_timings: sample time = 724.48 ms / 502 runs ( 1.44 ms per token, 692.91 tokens per second)
llama_print_timings: prompt eval time = 751.84 ms / 108 tokens ( 6.96 ms per token, 143.65 tokens per second)
llama_print_timings: eval time = 31868.10 ms / 501 runs ( 63.61 ms per token, 15.72 tokens per second)
llama_print_timings: total time = 36406.36 ms / 609 tokens
Output generated in 37.04 seconds (13.52 tokens/s, 501 tokens, context 121, seed 1890250888)
>>
>>102394760
Just need an affordable 32gb card to come to market then. Or get 2 16gb ones maybe?
>>
>>102394854
>affordable 32gb card
que
>>
>>102394854
>Just need an affordable 32gb card
Hahaha
>Or get 2 16gb ones maybe?
No, that doesn't really make much sense. Better to just get a 3090, really. You lose too much speed splitting between slower cards.
>>
>>102394868
A 3090 won't get me any speedup though, as was established, unless I can offload 80%.
>>
>>102394841
2x+ prompt processing speed?
>>
>>102394868
Splitting between cards loses no speed. You're gonna be at least as fast as on one card with same specs.

>>102394875
If you're on 8GB currently, try Nemo. Yes, it's not a 70B, but if you never used it before, you gotta at least try.

>>102394891
in >>102394761 it's 12 tokens ( 15.69 ms per token, 63.72 tokens per second)
in >>102394841 it's 12 tokens ( 17.61 ms per token, 56.78 tokens per second)
The 143.65 is just because it's more tokens to process.
>>
>>102394914
>Splitting between cards loses no speed.
Really? Last I heard, the more you split the more you lose, because the cards aren't working in parallel unless you have NVLink or something.
>>
>>102394937
Well, they're not, but each is doing its work at its original speed, which is what you get in the end. You don't get a 2x boost by using two cards, but you also do not get slower than 1x.

The row_split option for llamacpp should get the cards to work in parallel for the fc layers, but it doesn't seem to, as seen in >>102394841
>>102394761.
>>
>>102394914
What i'm reading is:
>15.69 ms per token
>17.61 ms per token
>6.96 ms per token
On prompt eval time. I don't know which has row split, but the third one is going much faster.
>>
>>102394974
The first message has no row split, the second message has row split.
>>
>>102391740
I often encounter "her voice ____" slop, yeah
>>
>>102394719
>>102394875
Another 3090 will definitely get you a speedup, but the increase is nonlinear.
Alternatively, since the biggest bottleneck for CPU+GPU hybrid inference is the RAM bandwidth, upgrading/overclocking your RAM would get you better performance at a lower cost than adding an extra GPU.
(I hope I don't have to tell you to enable XMP.)
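Back-of-the-envelope for why RAM bandwidth is the ceiling (assumed numbers, plug in your own): every generated token has to stream the CPU-resident weights from RAM once, so

# Rough decode-speed ceiling for the CPU/RAM half of hybrid inference; assumed numbers.
ram_bandwidth_gb_s = 50.0   # e.g. dual-channel DDR4-3200 lands roughly here (assumption)
model_size_gb = 40.0        # ~70b at q4_K_M (assumption)
fraction_in_ram = 0.5       # half the layers left on the CPU (assumption)

bytes_per_token_gb = model_size_gb * fraction_in_ram
print(f"CPU-side ceiling: ~{ram_bandwidth_gb_s / bytes_per_token_gb:.1f} t/s")  # ~2.5 t/s here

Double the RAM bandwidth and that ceiling doubles, no new GPU required.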

>>102394841
--split-mode row needs a lot more optimization, right now it's only really worthwhile for GPUs that are comparatively slow vs. the interconnect speed.
So unless you have the 3090s connected via NVLink I would not expect the performance to be better.
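If you want to A/B it from Python instead of the CLI, a sketch along these lines should do it. I'm assuming a recent llama-cpp-python that exposes split_mode and the LLAMA_SPLIT_MODE_* constants (they mirror the --split-mode flag); the model path is just an example:

# Compare layer split vs. row split across GPUs with the llama-cpp-python binding.
import llama_cpp

for mode in (llama_cpp.LLAMA_SPLIT_MODE_LAYER, llama_cpp.LLAMA_SPLIT_MODE_ROW):
    llm = llama_cpp.Llama(
        model_path="models/miqu-1-70b.q4_k_m.gguf",  # placeholder path
        n_gpu_layers=-1,   # offload every layer
        split_mode=mode,   # layer: whole layers per card; row: each matmul split across cards
        n_ctx=4096,
    )
    out = llm("Write one sentence about llamas.", max_tokens=64)
    print(mode, out["usage"])
    del llm  # free VRAM before loading the next configuration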
>>
>>102394565
>https://huggingface.co/bartowski/c4ai-command-r-v01-exl2/tree/3_5

Okay, I'm using c4ai-command-r-v01-Q4_K_M.gguf and getting 3.94T/s. That's an improvement, but how are people getting like 20T/s?
>>
>>102395180
What videocard? What software?
>>
>>102395188
3090RTX using Koboldcpp.
I'm usually loading the defaults: 512 batch size and 4096 context size. I could lower the batch size to 256, but I already have the model following instructions as it is.
>>
>>102395212
It should be a lot faster. Maybe you still have offloading to CPU set up in settings? The model should be entirely in GPU for speed. Maybe something eating up your VRAM? Ctrl+shift+esc.
>>
>>102395232

Before loading Kobold my OS is using 1GB of VRAM, so the amount of VRAM I can allocate towards this is 23GB instead of 24, but I don't think that's a big deal.

GPU Layers is set to -1, which I think means it doesn't offload to the CPU?
>>
smedrins
>>
>>102395276
Try a smaller quant then. Dunno.

The system taking 1GB is a lot. I'm down to about 200MB on my Windows machine.
>>
File: kobold.png (46 KB, 467x207)
46 KB
46 KB PNG
>>102395289

It looks like 1GB of VRAM is getting delegated elsewhere. Would this be slowing it down?
>>
>>102395353
Just try a smaller quant. Yes, 1GB shared VRAM would be enough to fuck the speed down to 4T/s.
>>
Does anyone know how to prefill the assistant's response using the chat completions API?
>>
File: ClipboardImage.png (41 KB, 1149x177)
41 KB
41 KB PNG
How do I fix this shit? Mistral Nemo base, Q4_K_M. Context 131072, temperature 0.5, rep pen 1.15. Is it bad settings or a bad model?
>>
>>102395657
I reduced context to 65536 and it fixed the problem (for now). I thought nemo was supposed to be a 128k context model?
>>
>>102395657
>>102395669
nope, it shat itself again. gonna try q8_0
>>
>>102395657
bad model
>>
>>102395657
>>102395669
models always claim super high context and can never deliver. Assume that at best it might handle half of what they claim, if even that.
>>
>>102395657
rep pen too high
>>
>>102395364

I'm still off by about 0.5GB of VRAM. If I go down another quant it says it's low quality, so I really don't want to use it.
>>
>>102395831
>can never deliver
llama 3.1 does
>>
>>102395364
>>102395837
I forgot what it's called, but I think there was some Windows option that disables automatic VRAM swapping.
>>
Crazy how much dumber Gemma 27B is compared to Nemo 12B.
It fails basic stuff like knowing what I did while it wasn't present, one message apart, even with OOC help included. Nemo trips up less.
>>
>>102395846
Maybe 405 handles 128k, but it's an exception, certainly not the rule.
>>
>>102385729
>Aunt Clara was a force of nature. She was a groomer by profession, running the most successful dog grooming parlor in town.
Damn, I got outsmarted by Mistral Large.
>>
>>102395831
Jamba wins again
>>
>>102395895
70b also handles 128k, and 8b handles at least 32k, which is more than any nemo finetune including the official instruct, despite being 2/3 of their size.
Mistral just fucked up when training nemo.
>>
>>102385729
Please prune the "Getting Started" links; some of them haven't been updated since 2023 and are outdated garbage.
It would save people wasting time.
>>
>>102393658
ANON used PLAP!
It's super effective!
MIKU is PREGNANT!
It can't move!
>>
>>102395837
Don't let the elitists scare you. An IQ2_XXS or IQ2_XS (2.5-bit-ish quant) 70B GGUF will do you fine and be better than sub-70B stuff. Quants are FAR less important (2.5-bit minimum) than people make them out to be: a 2.5-bit 70B > anything sub-70B at any quant. I say this as someone who can only run 70B at shitty t/s and can run anything sub-70B alright.
>>
>>102396026
Too bad llama 3/3.1 is awful at RP.
Also, the 70B only scores 66.6 at 128k, which isn't really that great:
https://github.com/hsiehjackson/RULER
and most other models indeed claim high and deliver way low
>>
>>102396123
New CR32 does better than new CR+ huh?
>>
>>102396123
>all that slop in new CR+
>barely an improvement
Cohere lost.
>>
>>102396123
Something to consider that the list isn't showing: quantization can kill long-context performance.
>>
>>102396205
Never heard anyone claim that, and then there's this
>>102396104
What to believe?
>>
>>102396205
I've heard that it's an exl2 issue due to the calibration dataset being short; ggufs without imatrix should be fine.
>>
>>102396290
>>102396290
>>102396290
>>
File: 1699112185431576.png (32 KB, 864x438)
32 KB
32 KB PNG
>>102396222
Look at these perplexity scores of Qwen 1.5 models. Lower is better.
>>
>>102396325
Oh, I forgot: that's without the optimized Q2 quantization methods.
>>
>>102396325
Perplexity only tells you how well, after quantization, the model is retaining the information it's memorized during pretraining.
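For reference, the number itself is just exp of the average next-token loss over some eval text. A minimal sketch (gpt2 and the sample text are placeholders; real evals use wikitext-style corpora and long contexts):

# Perplexity = exp(mean next-token cross-entropy) over an eval text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The quick brown fox jumps over the lazy dog. " * 20   # toy eval text
ids = tok(text, return_tensors="pt").input_ids
with torch.no_grad():
    loss = model(ids, labels=ids).loss   # mean cross-entropy of predicting each next token
print("perplexity:", torch.exp(loss).item())

So it mostly rewards reproducing text the model has already seen, which is exactly the limitation being pointed out.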
>>
>>102396361
Point is, look how little difference it makes. Do you really think that's enough to put it below models of MUCH smaller size?
>>
>>102396046
I would like to, but the OP template only has 3 free characters.
I skimmed through them just now and they all seem to cover different topics. Most of them still seem fine if you ignore the model sections. I don't use ROCm so I can't judge that one.
We need someone to volunteer to write a new consolidated getting-started guide. In the meantime, I might start dropping them one by one if I need more space for the news.
>>
>>102396421
There's more to model performance than information memorization. What about attention to detail in context, how well it can draw logical conclusions and extract facts from it, and so on? Just measuring how well the model can reproduce text it's seen many times during pretraining doesn't paint a complete picture of the damage quantization does.
>>
>>102396425
Thank you for considering doing this.
As someone essentially new to the topic, it's disheartening to be told repeatedly to READ the instructions, only to find out they're the best part of a year old, when even a basic understanding tells you AI changes weekly. It's not encouraging that the basic noob guide may be starting someone off a year behind everyone else, and the "basics" may provide a substandard starting point given the new developments.
Some community-guided updates would be immensely useful, as the general technical level of the threads is far, far above "starter" level.
Thanks again.


