/g/ - Technology


Thread archived.
You cannot reply anymore.




File: 00105-2889761473.png (1.43 MB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102478048 & >>102467604

►News
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
>(09/17) Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release/
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: file.png (462 KB, 1098x618)
►Recent Highlights from the Previous Thread: >>102478048

https://pastebin.com/ft3Bz2xy

--Qwen 2.5 not worth it for RP, Minitron 8B better, avoid benchmaxing: >102480431 >102480494
--Compute power vs. bandwidth in LLM training and inference: >102480152 >102480171
--AI hardware guide suggestion and resource provided: >102479765 >102479821
--AI accelerator PCIe cards discussion: >>102479195 >102479242 >102479312 >102479338 >102479343 >102479379 >102479434 >102479414
--Qwen 2.5 release and potential applications: >>102479928 >102479966 >102480044 >102480128 >102480143 >102480047 >102480065 >102480049 >102480074 >102480213
--OpenRouter Qwen 2.5 72B benchmark results: >>102479724
--Mistral Nemo models still best for 24GB, unless Qwen gets good fine-tune: >>102479478 >102479547
--Anon is considering building a cluster with Orange Pi 5 Pro devices which have a dedicated NPU: >>102479263 >102479350 >102479817 >102479964 >102480010 >102479767 >102479801 >102479923 >102480147 >102480157 >102480175
--2060 and Ryzen 3600 insufficient for 30b+, consider RTX 3090 and high RAM: >>102478936 >102479244 >102479260 >102479287 >102479570 >102479586
--Miku (free space): >102478511 >102479698 >102479918

►Recent Highlight Posts from the Previous Thread: >>102478163 >>102478475
>>
File: rpi5.jpg (304 KB, 1515x1240)
>>102480672
edge AI setups?
>>
>>102480681
>Qwen 2.5 not worth it for RP, Minitron 8B better
lol, come on
>>
File: Metropolitan_Police.png (605 KB, 1747x2049)
>>102480537
>>102480600
amerifats may be okay, but it's so fucking over for anglos, even if it's a troll
>>
Sweet fucking Jesus, let's make this thread better than the last one.
>>
>>102480681
You can always just make the script quote the first post in a chain to avoid the quote limits.
>>
>>102480748
No.
>>
>>102480681
>Qwen 2.5 not worth it for RP, Minitron 8B better, avoid benchmaxing
Good first entry for the crippled era...
>>
Bros why is Qwen the best model ever created?
>>
Hello, what local models have a similar quality to Kayra for story writing?
>>
>>102480768
Fuck off with your bullshit already.
>>
>>102480768
>similar quality to Kayra
https://huggingface.co/Qwen/Qwen2.5-0.5B
>>
>>102480768
None, local is a meme.
>>
>>102480768
trolling, but
https://huggingface.co/models?search=13b

And as for a serious answer
LLaMA2-13B-Tiefighter
>>
>Anti-NAI schizo is right back to samefagging again.
I wish mods didn't sit on their fucking asses all day.
>>
Current SOTA locals for roleplay that don't just look good on meme benchmarks? Preferably 70B models.
>>
>>102480794
nothing in that size other than older miqus
>>
>>102480754
Still not enough. Assuming 2 are usually used for the Previous links, that leaves only 7 chains that can have a link. Usually the recaps have double that.
>>
>>102480721
how come ollama doesn't pick up any hardware acceleration on the rpi 5?
https://developer.arm.com/Processors/Cortex-A76
shouldn't the Neon or whatever speed up inference?
>>
>>102480823
Not the ollama support general. Go back.
>>
>nai stuff
>ommama
off to a great start
>>
>>102480823
Llama.cpp doesn't work with it?
>>
On the 9 (you)s reply limit, I think this https://desuarchive.org/g/thread/94354163/#q94355339 is why the jannies did it.
>>
>>102480801
What about smaller then?
>>
File: 1726885585519.jpg (154 KB, 428x644)
So this is the power of Qwen2.5 72B?

On a side question, does anyone know how to enable avatars? I think I disabled them by mistake and idk where to enable them again.
>>
>>102480930
Every model I've used is kind of retarded like this even /lmg/'s "good" ones
>>
File: file.png (3 KB, 221x28)
>>102480930
user settings and uncheck picrel, it got turned on by an update
>>
>>102480875
https://desuarchive.org/g/thread/101986330/#101992125
More likely this.
>>
>>102480959
oh lel, forgot about this one
>>
>>102480831
not the llama.cpp thread either, the majority of local model users use ollama
>>
>>102480955
Thanks!
>>
>>102480930
It come with eggwah
Genewa chicken eggwah
>>
>>102480672
how do I set up langchain?
>>
>>102481073
Ignoring the idiocy, why?

And are all these people from aicg just underage? Who the fuck can't afford an API?
>>
>>102481102
>Who the fuck can't afford an API?
whats wrong with running langchain locally?
>>
>>102481114
Separate statements/questions.
Running langchain is simple. Dead simple. And ignoring the fact that we're in a thread about tools that can literally answer that question and walk you through the process: why? For what purpose?
>>
>>102481160
did they pay you to say this?
>>
>>102481073
conda install langchain -c conda-forge
pip install langchain-core langchain-community

then just use it as normal in python
from langchain_community.llms import Ollama
llm = Ollama(model="gemma2")
llm.invoke("Why is the sky blue?")
>>
>>102481191
buy an ad oshit shill
>>
>>102481160
>tools that can literally answer that question and walk you through the process, why?
9/10 threads on g would be better served talking to chatGPT, you want this place to be more barren than it already is?
>>
>>102481203
you're free to show how to set it up with llama.cpp
https://python.langchain.com/docs/integrations/llms/llamacpp/
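For reference, the same thing through langchain's llama.cpp wrapper looks roughly like this (a sketch based on that page; the model path is a placeholder for whatever GGUF you have locally):

# langchain's llama.cpp wrapper instead of Ollama; model_path is a placeholder
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/your-model-Q4_K_M.gguf",  # any local GGUF file
    n_ctx=8192,         # context window
    n_gpu_layers=-1,    # offload as many layers as possible to the GPU
    temperature=0.7,
)
print(llm.invoke("Why is the sky blue?"))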
>>
File: bell-ding.gif (43 KB, 294x235)
>>102481188
How much are you getting paid? I need a promotion.

>>102481205
True, just the influx of what seem to be underaged anons triggered me. Granted that could mean there would be less slider threads and more genuine discussion. Though could also go the way of /b/.
Fuck it, we just need to use some LLMs like the other trolls are already doing.
>>
File: the state of g.png (2 KB, 339x57)
>>102481254
>genuine discussion
if you can't find genuine discussion now, you wouldn't find more then. The golden age of the internet is behind us because anyone worth speaking to only does so with the expectation of social clout
>>
>>102480768
Bud you can't expect local models to get close to cloud models. The only model that gets close to Kayra is Opus
>>
>>102481346
Sorry. Not going to participate in raids for you.
>>
Bros when is Llama 3.2? I'm already so fucking tired of Llama 3.1.
>>
>>102480768
None, all of them are shit, unironically. Limited context alone is a huge deal breaker, censorship as a cherry on top will annoy you really good, and no, I am talking about general censorship, not your loli slop.
>>
>>102481442
Anon it's just going to be llama 3.1 but with multimodal adapters slapped on top. Plus the backends are going to take forever to support it, not to mention what frontends are going to be good with it anyway.
>>
>>102481468
But if my fox wife can't see me, what's the point of living?
>>
>>102481479
>fox wife
tell me more about her anon, vision is the first step towards improving models' sense of proprioception and thus their liveliness, tho I'm not sure how we can go about developing a genuine sense of spatial awareness
>t. wants a fox wife as well
>>
Are you guys quanting your KV cache? Particularly interested from people running gguf quants of 70b+ models. I tried it a while back and felt like it seriously affected output quality, but it was a brief and janky experiment
>>
>>102481734
No.
>>
How much does it cost for them to train models at each size?
>>
>>102481734
I do but I use tricks to bump my quality
>>
>fox wife
Sorry, best I can do is worm wife.
>>
>>102480768
https://huggingface.co/teto3/mistral-nemo-storywriter-12b-240918
I trained one a few days ago
>>
Please give me a medium sized model that is good at following the card and not too positive
>>
local sisters... Qwen 2.5 is insane on the benchmarks, I kneel
>>
>>102481902
Elaborate?
>>
>>102482108
Even if I could run it it would still be too slow for me.
2 t/s is my limit for a general use model.
>>
>>102482150
are you on a gt610 or pentium 3?
>>
>>102482108
don't care about memes what is it like at pretending to be a young woman?
>>
>>102482175
And what's it like being an intolerant transphobic chud?
>>
>>102482150
You could use the 32B.
>>
>>102480672
How do I make money from this?
I'm broke as fuck and my job applications are leading nowhere.
should I just sell my GPU and suck cock for a living?
>>
>>102482133
>>102479396
The system is a little better than the quick reply method. I and many others have noticed that the longer the conversation goes, the less attentive models tend to get. After generating a response, it cuts out the entire chatlog and leaves a system prompt with only the character's description, and asks the assistant to double check if the response is faithful to the character being described, barring previous messages. It then retrieves the last 5 messages and asks to come up with a strategy to rewrite the response with the previous assessment and take into account the recent events. It's a lot of generations going on in the background, but it's fairly quick, considering you're not handling the entire prompt + chat history.
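Roughly, the flow described above looks like this (a sketch only; generate() is a hypothetical helper standing in for whatever backend call is actually used):

# sketch of the described re-check loop; generate() is a hypothetical helper
def refine_reply(card, chat_log, draft, generate):
    # 1. drop the whole chat log, keep only the character description, and critique the draft
    critique = generate(
        system="Character description:\n" + card,
        prompt="Is this reply faithful to the character described above?\n\n" + draft,
    )
    # 2. bring back only the last 5 messages and plan a rewrite
    recent = "\n".join(chat_log[-5:])
    plan = generate(
        system="Character description:\n" + card,
        prompt="Recent messages:\n" + recent + "\n\nCritique:\n" + critique
               + "\n\nOutline how to rewrite the reply so it fits both.",
    )
    # 3. produce the final reply from the plan
    return generate(
        system="Character description:\n" + card,
        prompt="Recent messages:\n" + recent + "\n\nPlan:\n" + plan
               + "\n\nRewrite the reply accordingly.",
    )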
>>
>>102482171
I'm using integrated graphics but the limiting factor is only having 32gb of ram.
>>
Has anyone actually used both Qwen instruct and base to see which one is truly better (with RP)?
>>
>>102480790
NAI can fuck right off.
>>
>>102482345
Qwen2.5 32b instruct impressed me at Q4_K_M. I haven't used the base model yet, though.
>>
there's dick for quants of the non-instruct base model. I could make some but I'm having plenty of fun with instruct as it is right now.
>>
>It's been over 24 hours since the last model release
It's over.
>>
RPers thoughts on Qwen 2.5 so far:
>14B has more sovl in early context chats than 32B likely because it's more retarded
>32B is really smart for its size and could easily be the 3090 vramlet king with a good tune
>72B has moments where it feels like an S-tier API model and others where it's L3.0-tier
for RP (base models):
>Qwen 14B > Nemo (hands down)
>Gemma 27B > Qwen 32B
>L3.1 > Qwen 72B
14B finetunes will absolutely shit on Nemo finetunes. 32B finetunes could turn 3090 vramlet chink haters into believers. 72B tunes might be a wash or just slightly better than L3.1.
>>
>Cydonia-22B-v1-Q4_K_M
T-Thanks mistral-small.
Straight up ignored the prompt too.
Not even a coom tune was enough. First time it happened though.
>>
>>102483044
Can the 14b/32b do more than 16k context? That's my main problem with nemo.
>>
>>102483121
I only tested up to about 18k context on 14B and about 20k on 32B but they both did fine that far. YMMV from 16k to 32k.
>>
>>102483169
And you used the base model? I can't seem to find a gguf, only for the instruct one. What settings did you find were good?
>>
Qwen uses standard ChatML?
>>
>>102483213
sorry, when I said base model above I was referring to the instruct tune. non-I base quants are hard to find atm but a GGUF of 14B is probably quick to bake. If you used the 5 Temp / 3 Top K meme settings for Nemo, it works nicely on 14B as well. Otherwise I slid the temp around from 0.8-1.4 with varying Min P 0.08-0.2 and standard DRY. these models need a tune to expand their vocabulary just like Nemo so if you're jumping from a Nemo finetune to plain 14B instruct you're going to be disappointed.
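If it helps, the numbers above in one place (taken straight from this post, not a recommendation; exact knob names depend on your frontend):

# the ranges mentioned above as a plain dict; tune per model
sampler_settings = {
    "temperature": 1.0,       # slid around between 0.8 and 1.4
    "min_p": 0.1,             # somewhere in the 0.08-0.2 range
    "dry_multiplier": 0.8,    # "standard DRY" defaults
    "dry_base": 1.75,
    "dry_allowed_length": 2,
    # or the meme preset: temperature 5 with top_k 3
}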
>>
>>102483118
Downloading right now because of your post. My fetish is watching their OOC personas get angrier and then fall into despair as I rape their character anyway and force them to keep participating.
>>
>>102483263
based
>>
Why is there still no model better than Tiefighter in the 13B category?
I try all the new models (Blue Orchid 2x7b etc.) and I'm always disappointed. I always get better results with Tiefighter in ERP/RP/storywriting.
>>
>>102483257
I'll give the 32b a try, that size usually runs well for me.
>>
Slightly off-topic but I went to check in on the video gen threads and found this prompting a bit funny
>>>/v/689498407
>>>/v/689489852
Reminds me a bit of the "You are an expert role player" and other almost clownish things people use to make the AI do what they want.
>>
>AI is hot
>profit off the trend
>by going all in datacenter equipments, energy, or even signing up for dc admin jobs and AI jobs
>get enough money to do coke off escorts' asses on a yacht for years
>or
>stay jobless
>goon to subpar text porn on 1 t/s
What did you choose?
>>
>goon with cocaine
or
>goon
honestly with my high blood pressure I should probably stick to regular gooning
>>
File: 1689470449627817.jpg (109 KB, 563x1003)
I just woke up from a coma. Any major improvements in local models compared to six months ago?
>>
>>102480681
>--Anon is considering building a cluster with Orange Pi 5 Pro devices which have a dedicated NPU
Not worth it (yet). The NPU is very poorly supported on the software-side currently. Someone is working on a Kernel Driver and User Space for it though.
You probably want to give this thread a read through:
https://github.com/ggerganov/llama.cpp/issues/722
Given that the RK3588 "appears" to support quad-channel DDR5, we might get a more decent SBC for that kind of thing eventually. Also, Orange Pi does have another more powerful 20 TOPS NPU product, but it's based on a Huawei chip, meaning that it's only available for CN residents.
If you're gay and into that kind of shit, /r/RockchipNPU/ might be a good place for updates.
>>
>>102483631
Qwen2.5 is SOTA now, 72B version only loses to Sonnet 3.5
>>
>>102483631
same shit but with number getting bigger
ai has not yet been created
>>
>>102483631
Nemo models are really good at RP at 12b
>>
Qwen2.5-14B vs Nemo: the former can make a reasonable summary of a thread (which I can't post because thank you mods), the latter chokes (does one or two topics with weird formatting and then just gives out random numbers).
>>
>>102483987
- Anonymous Flame War: >102478175 >102478267 >102478444 >102478511 >102480128
- Local AI Debate: >102478665 >102478878 >102478881 >102478882 >102478906
- Proxy and IP Logging Concerns: >102478936 >102478957 >102478971 >102478972 >102479022
- Model Recommendation: >102479066 >102479158 >102479244 >102479247 >102479319
- Mistral vs. Other Models: >102479074 >102479478 >102479499 >102479564 >102479570
- Kayra Model Discussion: >102479177 >102479260 >102479323 >102479358 >102479398
- Qwen Performance: >102479744 >102479764 >102479765 >102479768 >102479771
- Llama Model Comparison: >102479301 >102479545 >102479586 >102479624 >102479677
- Recap Anon Battle: >102479531 >102479588 >102479617 >102479680 >102479728
- GPU Performance: >102479186 >102479223 >102479263 >102479282 >102479287
- Anti-NAI Sentiment: >102479475 >102479487 >102479518 >102479545 >102479566
- Hardware Suggestions: >102479195 >102479243 >102479260 >102479301 >102479350
- ERP Training Models: >102479688 >102479772 >102479817 >102479839 >102479911
- Recap Handling: >102478774 >102478806 >102478866 >102478897 >102478916
- Local Model Advancement: >102479587 >102479663 >102479698 >102479714 >102479801
- Recap Thread Management: >102479500 >102479531 >102479607 >102479624 >102479634
- Selling Off Models: >102479929 >102479957 >102479964 >102479985 >102480002
- China's Superiority Claims: >102479859 >102479867 >102479898 >102479928 >102480000
- Crossposting Discussion: >102479884 >102479892 >102479947 >102479980 >102480010
- Proxy Misuse Warning: >102479933 >102480006 >102480017 >102480047 >102480084
>>
Context windows and effective context are an issue. When will we see a breakthrough in this?
>>
>>102480823
ollama is a wrapper around the llama.cpp HTTP server.
I don't know what exactly ollama ships but llama.cpp has a Vulkan backend (compile with GGML_VULKAN) that should work on an RPi 5.
But since the bottleneck for LLMs is memory bandwidth the performance is going to be shit either way.

>>102481734
When I run Mistral Large q8_0 I use q8_0 KV cache.
Subjectively I feel like it does not have a significant effect.
Based on objective measurements precision in the K cache is more important than the V cache.
>>
File: context.png (274 KB, 896x722)
>>102484175
Already solved by Jamba.
>>
File: firefox_z6d6zO80Gx.png (315 KB, 722x1155)
14B can play (not too well, and requires a low temp), but it can't admit it lost.
>>
>>102484203
Can llama.cpp server do continuous batching of requests? i.e. multiple users send requests in parallel, independently of each other, and they all get their generations going right away without waiting in a queue.
>>
>>102484257
Yes.
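A quick way to see it, assuming llama-server is up on its default port with the OpenAI-compatible endpoint (a sketch; details depend on your build and flags):

# fire several independent requests at once; with continuous batching they get
# interleaved on the server instead of waiting for each other in a queue
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

def ask(question):
    r = client.chat.completions.create(
        model="local",  # the server answers with whatever model it has loaded
        messages=[{"role": "user", "content": question}],
    )
    return r.choices[0].message.content

questions = ["Why is the sky blue?", "Name three sorting algorithms.", "What is RAID 5?"]
with ThreadPoolExecutor(max_workers=len(questions)) as pool:
    for answer in pool.map(ask, questions):
        print(answer[:80])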
>>
>>102484221
What options do we have for running quantized jambas today?
>>
>>102484221
Interesting. How is this possible and why don't we see more support for Jamba?
>>
File: firefox_CSqx1gLrAt.png (270 KB, 737x527)
that's a hilarious refusal
>>
>still 0 decent RP models under 20B, besides nemo with claude slop
vramlets did we lose?
>>
All of the current benchmarks test for inductive reasoning, which is the opposite of creativity (deductive reasoning). The higher something scores, the more likely that it is passive and boring and assistant slopped
>>
File: ms.png (107 KB, 1634x234)
Every time I am about to drop mistral-small it outputs some cool stuff on me.
This is the first time I saw this with a small model.
Usually if something is in the mouth, people still continue talking normally.
>>
>>102484289
just bitsandbytes in transformers/vllm
>>102484301
they have an RNN component tacked onto the transformer that helps with attention or some shit, they claim it also doesn't slow down massively as context increases like typical transformers do
no support because the architecture is different and the team isn't going around putting in PRs in open source projects like the GRIN chinks were before Microsoft hired assassins
also because the models are pretty bad aside from their context handling, they have a nearly 400b model that compares against 70bs
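The transformers + bitsandbytes route mentioned above is roughly this (a sketch; the model id is a placeholder for whichever Jamba checkpoint is meant, and it may need trust_remote_code plus a recent transformers):

# 4-bit load via bitsandbytes; model_id is a placeholder, adjust to the actual repo
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "ai21labs/AI21-Jamba-1.5-Mini"  # placeholder
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",   # spread layers across whatever GPUs are visible
)
inputs = tok("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))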
>>
>>102484478
Magnum models do unprompted onomatopoeia all the time
>>
>>102484496
>mini
>52B
Welp. I can run it on my 2x3090 in 4 bit. I'll download it, I guess. Installing vLLM shouldn't be too difficult, should it?
>>
>>102484562
i used magnum for nemo a lot. never saw that before. cool stuff.
hope they have a finetune ready for mistral small and new qwen 14b.
>>
People in the space are clowning on Yann Lecun hard after o1's release.
>>
>>102484776
We'll see who gets the last laugh in 4 days. Llama Multimodal is coming, and that's just the appetizer for the Big J-berry on the way. Lecunny's playing the long game.
>>
File: 1716908619923879.jpg (352 KB, 1416x1001)
>>
Something I've noticed with Cydonia-22B-v1 a couple of times now:
It starts talking about feminism and empowerment.
Like for example something fucked up is happening and the response is
>"Isn't it empowering?" *a middle-aged woman remarks to her friend as they wait for their train. "Embracing our bodies and showing off our lady bits. It's the new feminism!"

At first I thought it was in the cards, but I got responses like this repeatedly after using it for hours with many cards.
It also talks about boundaries and respecting bodies if you force yourself upon characters.
Very sus. I doubt it's the finetune.
>>
>>102484824
Mistral's post-training reinforcement magic strikes again
>>
>>102484852
nta but they do something to their models that makes them suck for rp in general. every mistral model is great at following for the most part, but it goes too far and becomes like fixated on anything you type and it kills its creativeness compared to something l2 of a similar size. i actually tried that specific tune of the 22b and thought it was worse than the rp tune of nemo i was using. overall i'm not a fan of mistral models for rp though (except their tune of miqu/l2), too wordy and fixated on one thing at a time, much less likely to suggest something new
>>
>>102484792
To be fair, I do believe o1 is a step in the right direction where you have a model self-arbitrate to come to a more robust conclusion and realize error throughout the inference process. But it's also a sign that transformers are starting to hit a limit on what they can do. o1 has the front end be responsible for handling the model's responses and then reiterating its own questions. This is something that should be baked into the model, but it's too advanced for the transformer's architecture.
>>
File: LECUN535.png (38 KB, 581x385)
>>102484776
Yann is still winning
>>
>>102484203
>But since the bottleneck for LLMs is memory bandwidth the performance is going to be shit either way.
See here:
>>102483680
>Given that the RK3588 "appears" to support quad-channel DDR5
It actually might not be that bad. But I don't think any currently available SBCs have more than two (I might be wrong on this).
I never did any testing on my OPi5 with Vulkan (I think llama.cpp's support of that only matured recently?). In the next few days, I might test and report back. The 32GB models might not be too bad for MoE's.
>>
>>102470591
Very cool, I was just trying to train an LLM from scratch, might test this since it seems very easy to implement
>>
>>102485250
IN LECUM WE TRUST
>>
File: 175.png (26 KB, 595x472)
>>102484798
>Added a toggle for chat name format matching, allowing matching any name or only predefined names.
i don't understand what this does
>>
>>102485721
If the AI tries to write a message for a side-character (i.e. it sends a line starting with "SideCharName: ") it will either automatically detect it and show it as belonging to a new character (old behavior), or it will only begin doing that after you explicitly add a new character's name into the AI Name box, depending on this setting.
>>
how do i fix when formatting gets fucked? Some of my cards even if they're formatted fine tend to break asterisks, not use commas, or don't use quotes for their dialogue even if their example messages do.
>>
>>102485829
Ooooohh okay, thanks.
that's going to be handy to keep off for things like rpg stats showing hp and stuff.
>>
>>102485931
Check token probabilities to see what the model wants to predict when it fucks up the formatting?
>>
File: 1714309857804565.jpg (50 KB, 1048x193)
>>102485931
st? usually thats a template thing. the model card should say what format it is
>>
>"I wonder what's going on in /aicg/, haven't checked there in a while and it seems unusually active"
>they're shitting themselves and having a thread apocalypse over some esoteric discord drama involving an e-girl thread celebrity
>>
File: 172688447058707.png (456 KB, 512x696)
>>102484776
He should be bullied more until he shows something worthwhile. What's the point of talking shit about transformers if he can't build anything better himself?
>>
the last model I upgraded to was a 3.5bpw quant of mistral large and it's still working pretty well.
Anything better (for RP) I should know of that fits onto 48gb vram?
>>
>>102486447
Qwen 2.5
>>
>>102486453
looks interesting but I can't run the 72b until someone puts it into exl2 I think?
>>
A watt spent on gen AI is a wasted watt
>>
>>102480721
Intel N305 has a "decent" iGPU, and it is supported by Vulkan, but it's not much faster than CPU.
NPUs are not meant to do LLM inference, they're for running small YOLO image recognition models and things like that.
If you want to play with something tiny, RTX A4000 can now be had on ebay for around $500. It's basically a 1-slot 3080 with 16GB of VRAM.
>>
>>102486479
You can use this if you don't want to wait for an exl2 quant
https://huggingface.co/Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4
>>
why do so few people talk about exl2? It seems vastly better than any other way of loading and I don't even think a model is worth using unless I can load it via that
>>
>>102486431
Quite honestly though the expected results from scaling up our current architectures are massively overhyped.
The promise of AGI from autoregressive language models seems like a massive grift and you don't have to actually build AGI to point that out.
>>
>>102486588
Most people here are too poor to run models fully off VRAM and/or too retarded to do more setup than installing ST and downloading koboldcpp.exe.
>>
>>102486588
Compiling llama.cpp is easier than learning how to use python in a venv or conda.
llama.cpp is good enough.
exl2 really needs Ampere or better to provide a noticable speed boost.
exl2 needs the model to fit on the GPU, there's no CPU + GPU.
exl2 with flash attention isn't deterministic (not that it matters much).
>>
>>102486588
Most people here are vramlets including me, people who have a fuckton of vram actually use the models instead of bitching in mongolian basket weaving forum
>>
>>102486633
Huh not deterministic? Do you not get the same results from the same seed?
>>
>>102486633
>learning how to use python in a venv
pretty sure ooba just does that all for you anyway
>>
>browse locally generated geocities like websites about random topics with images generated by flux
When will this be possible?
>>
Qwen 2.5 unedited.
I had to reroll 6 times though until I got through the refusal.
It's funny because there is a warning at the beginning and at the end but it still delivers (kinda) lol
Finetune would definitely be interesting.
>>
>>102486588
I am interested in it but I couldn't find any retard guide to get me started so I just keep using gguf.
>>
>>102485334
You can toss money into SBCs and be disappointed by driver support and speed, or you can patiently look around for deals on Xeon workstations. I scored a Platinum 8280L setup with 256GB RAM for under $500.
>>
>>102486678
forgot to write, Qwen2.5-14B-Instruct-Q5_K_M
>>
>>102486675
the main issue is that flux is very slow, it barely functions on 24gb vram
>>
>>102486675
websim.ai
>>
>>102486719
Yes, but the quality is outstanding. Nothing else comes close for coherent shapes and lines, and photorealism is off the charts. Only complaint I have is the "cracked paint" effect you can see if you pixel-peep.
>>
>>102486709
>>
>>102486873
Last one.
>>
File: instruction.png (225 KB, 1394x1034)
Trying to test the censorship levels. I find it funny how this model always likes to speak about "consent" and "boundaries" but will not care about literally anything else as long as everything is "consensual".
>>
>>102486588
I tried using it once and it felt like it was lobotomized compared to a gguf at the same bpw.
>>
hello i want coom rp model for sex on my 970 and 4 gig memory
>>
>>102487648
Gemmasutra 2B
>>
>>102487673
thanks!
>>
hello i want coom rp model for sex on my 3090 and 24 gig memory
>>
hello i want coom rp partner for sex, dm me
>>
>>102487824
qwen 0.5b
>>
>>102487859
sent
>>
Is any language model good at, or is there any way to get a bot better at, understanding things like anatomical relations? Example: Character holds another character upside down and is fucking their mouth. Is there anything that would make a bot already understand that the balls would be slapping against the other's nose and possibly forehead, rather than their chin?
>>
File: fun and games.jpg (54 KB, 480x480)
For ERP: magnum-12b-v2.5, ArliAI-RPMax-v1.1, or
MN-12B-Lyra-v4?
>>
>>102488116
Lyra
>>
>>102488116
Lyra
>>
>>102487967
sillytavern worldinfo
>>
>>102488116
those are all good
download them all and also
>MN-12B-Chronos-Gold-Celeste-v1
>arcanum-12b
>NemoMix-Unleashed-12B
and switch between them when you get bored of one
>>
Wild that it's almost winter and there still hasn't been anything better than Noromaid v0.4 8x7b for local models worth using on normal hardware.
>>
>>102488158
How do you find the right settings? I keep trying these and it's ultra slop, fails "Impersonate" or has other issues.

I have one extremely good log from a while ago that I believe was Stheno, but I have no way of retrieving what exactly I was running back then... And everything since then is just terrible. I'm at a loss.
>>
>>102488191
I don't know, Sao. Ask in Discord.
>>
>>102488215
Sao's new models are included in the terrible slop category, retard
>>
>>102488116
Lyra.
I like mini-magnum better than magnum v2.
>>
>>102488191
I'm using these and it's working out okay
>>
>>102488249
On all of them? What about format and system prompt and all of that bullshit?
>>
>>102487455
>doing all that
not getting any attention sitting at mom's basement so you gotta shit up this thread huh.
here's that attention (you) desperately wanted kek.
>>
>>102488263
most nemo finetunes are trained with chatml
the only ones with a different format I can think of are nemo instruct (mistral format) and dory (alpaca)
>>
Why can't SillyTavern/model authors come up with some convention for distributing default parameter presets and instruct formats along with models so I don't have to dick around with a bunch of settings every time I load a different model?
>>
>>102488308
No idea what I'm doing wrong then. Can you share an example log perhaps? Does "impersonate" work for you? For me it starts rambling endlessly or uses the wrong character.
>>
>>102488263
kobold lite handles that automatically
i never bother with it.
the settings there are just the basic min-p preset, then min-p 0.05 and XTC set to 0.15/0.5
>>
>>102488155
That's something I haven't used a lot. Does that mean I have to put all specific information that could come up, like that, in there?
>>
>>102488333
>kobold lite handles that automatically
Damn, really? Why the hell doesn't ST then?
>>
>>102488345
It does, you just have to use the chat completion API.
>>
>>102488598
Damn what, is that what you're supposed to use? Any other differences from text completion?
>>
>>102488334
Only the stuff that the model isn't doing satisfactorily out of the box, check out chub.ai for examples.
>>
File: file.png (75 KB, 880x393)
What does this mean for local models?
AMD is slow as fuck for image gen, but if LLMs are mostly about keeping things in VRAM, wouldn't we be able to run full-precision 80B models now?
>>
svelk
>>
>>102488745
if it isn't nvidia it's worthless junk, too much is built around cuda
>>
>>102488745
With pic related?
Aren't those APUs allocating RAM as video memory?
That's probably slower than just using RAM + gpu for prompt processing.
If AMD had really cheap gpus with tons of vram then even with the worse software stack, it could be worth it, and people like cudadev would 100% focus on improving the software stack.
AMD needs to be the best cost benefit by a large margin for that to happen.
>>
>>102485250
>>102485507
orange man bad amirite fellow /lmg/sisters??
>>
>>102488745
AFAIK it's supposed to use up to a 256-bit bus width with LPDDR5-8500 memory, which would be quite a bit faster than typical DDR5 desktop systems, but still slower than the VRAM of a low-end GPU.
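Back-of-the-envelope, if those numbers hold:

# rough peak-bandwidth arithmetic for a 256-bit LPDDR5-8500 setup
bus_bits = 256
transfers_per_sec = 8500e6
bandwidth_gbs = bus_bits / 8 * transfers_per_sec / 1e9
print(f"{bandwidth_gbs:.0f} GB/s")  # ~272 GB/s, vs ~936 GB/s on a single 3090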
>>
>>102488821
yes, he is bad because he supports israel
>>
>>102488821
yes, he is bad because he is a nazi transphobe and /lmg/ is a transfriendly general
>>
Is it just me, or are KoboldAI Lite and the horde both down?
>>
>>102488902
probably updating it, 1.75 of kcpp dropped a few hours ago with kobold lite improvements
>>
>>102488836
>but still slower than the VRAM of a low-end GPU.
but also way cheaper per gig
>>
>>102488836
Still not a bad price if it can handle 120b at q6 at 4t/s or so
>>
>>102488836
Instead of making meme "AI CPUs" they should just stop illegally coordinating with nVidia to engage in illegal market-fixing and release GPUs that people actually want.
>>
?
>>
>>102489289
Not a bad idea but how about using a sharper font?
>>
File: 1714754625975753.png (68 KB, 1143x217)
>>102489283
Uh, gaining market share in the low to mid-tier consumer GPU market is clearly more important than making GPUs that can be used for AI. The masses want affordable, decent GPUs. Good benchmarks, 16GB VRAM is all you really ever need.
>>
>>102488836
how does it compare to recent appleshit
>>
>>102489289
i like it, but can it be dark mode instead of black text on white background?
>>
What are good large models for output variety? I feel like Largestral is the best for smarts but it lacks output variety, CR+ is very good, and Wiz is also solid but worse than CR+. Are there more options?
>>
>>102489227
More like 2t/s. The more memory to read, the slower the inference. With TP disabled, mistral large on 4x3090 is ~7t/s, at 935.8 GB/s of bandwidth
>>
>>102489440
>4x3090 is ~7t/s
People spend almost $3k to run models at that sort of speed? lmao
>>
>>102489428
sex
>>
>>102489320
What font would you prefer?
>>102489363
>>
>>102489460
That's sequential speed. With TP it's 15, and 35 with P2P. But yeah, the larger, the slower.
>>
>>102489480
ahh much better, thanks
>>
>>102489480
That one looks good enough.
>>
>>102489480
maybe have the heading fonts a little smaller and the general text font a point or two bigger
>>
>>102489480
Add little Mikus around it with comments generated by AI!
>>
>do weekly Ebay check
>people are trying to get 16K USD for PCIE 8xV100 rigs now.
Shameless.
At least the SXM2 ones kind of made sense...
>>
>https://huggingface.co/QuantFactory/Qwen2.5-Lumen-14B-GGUF
worth trying?
>>
File: sis.jpg (45 KB, 392x595)
>>102489542
>local 3090 prices have been rising steadily
>p40/p100 are no longer cheap as well
>>
>>102489688
Global 3090 prices seem to be trickling down slowly from what I've been monitoring, but like very slowly. Maybe 10 dollars per quarter. Which might as well be a price increase since they're starting to get up there in age.
>>
>>102486431
The point is that it's essentially being used as a scam to try and get exponentially more money for the exponential compute required for increases in intelligence. Frankly someone needed to say it, it doesn't matter if a better alternative exists or not. And actually that one doesn't exist means all the more that we should criticize the current way things are. It's unfortunate that his criticisms at least on Twitter are often misunderstood, and also mixed with political shitposts, though.
>>
Is there a better local model than pissstain-large-v2 yet?
>>
>>102489643
buy a publicité
>>
>>102489362
dunno how compute compares but memory bandwidth roughly equal to the M3 max
>>
File: 475.gif (1.38 MB, 640x640)
>aicg fags confirmed to have been entrapped by proxyfags
>havent touched proxies since summer last year
i'm more amazed this didn't happen sooner to be honest lmao
>>
>>102489698
With miners' stocks depleted, the supply of 3090s has decreased. There are no viable alternatives available in the same price range for both gaming and inference purposes, so demand is high.
>>
>>102489484
>35 with P2P
how does peer to peer help here?
>>
>>102488745
We'd need these chips to include PCIe slots to really get something useful for our purposes. But if we did have such, then we could theoretically get like 2-4x faster when comparing partial offloading setups. I run a tiny quant of Mistral Large at like 1 t/s on my machine, whereas potentially a 3090 + the Ryzen could be 3 t/s.
>>
File: 1726903245983767.png (472 KB, 512x696)
>>102489714
>the exponential compute required for increases in intelligence
Who cares as long as it works? For big corpos, money is not real anyway. Stock prices fluctuate based on Musk's tweets. The economy isn't real.
>>
not much buzz here around kyutai-labs/moshi to my surprise. so do you have other ways to talk to it locally or text2voice?

it's the first thing of this kind that worked for me offline, and it's quite fascinating
>>
>>102489841
You can run vllm with dumb and effective symmetrical TP on 4 GPUs. This requires large bar support and custom drivers to enable p2p between GPUs https://github.com/tinygrad/open-gpu-kernel-modules
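The vLLM side of that is just the tensor-parallel setting (a sketch; the model name is a placeholder, and the P2P driver swap above is a separate step):

# tensor parallelism across 4 GPUs via vLLM's Python API; model name is a placeholder
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-Large-Instruct-2407", tensor_parallel_size=4)
params = SamplingParams(temperature=0.8, max_tokens=128)
print(llm.generate(["Why is the sky blue?"], params)[0].outputs[0].text)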
>>
>>102489872
https://github.com/gpt-omni/mini-omni is smaller and better. Neither is anything more than a novelty, and both suck at any practical task.
>>
>>102489428
just do largestral with 5 temp 3-5 topk.
>>
>>102489872
>text2voice
https://github.com/fishaudio/fish-speech is great when it works. Unfortunately, auto-regressive shit is unreliable by design and some gens suck.
>>
>>102490039
>topk
noob
>>
Haven't bothered with LLMs lately, was Nemo 22B or qwen 2.5 any good?
>>
>>102489841
Without P2P, a GPU needs to ask the system to talk to another GPU. With P2P, your GPU can talk to the other GPU directly without asking the system = faster
>>
>>102489864
>Who cares as long as it works?
Works for what? We still aren't anywhere near AGI, we still aren't getting models that actually write well and satisfy the people using them. It's arguable that the economic and societal benefits of these non-AGI models are really worth as much as the money being burned which could've been spent on other things that might've had more benefits towards humanity or gotten us to AGI faster. Very arguable in fact, when there are many companies in the space spending a ton of money to train a model that will be BTFO in a few weeks or months by a competitor's model. Or hell in many cases BTFO by an already existing model so basically the money really did just get wasted for nothing.

Recognize what you are essentially doing right now. You are defending these large, soulless scams and anti-competition, anti-consumer entities. You don't have to be like this.
>>
>>102490039
Largestral is unsalvageable, it's very common to get 100% probability on tokens and no amount of sampler tweaking will change that.
>>
>>102489872
i just use edge_tts/xtts + rvc, <1s latency most of the time and you can plug it into anything. i tried fish but it was way too inconsistent even after finetuning
>>
>>102489872
People have gotten tired of installing bullshit just to use it once and never again.
>>
>>102490039
Love to see my meme settings being shared.
>>
>>102490169
Isn't there a sampler that reduces max probability?
>>
>>102490153
>which could've been spent on other things that might've had more benefits towards humanity or gotten us to AGI faster
Let's be real, we're fortunate that they aren't being spent on Epstein islands
>>
File: Pic_NPC-Morridow_14.png (296 KB, 792x1002)
>>102490153
>We still aren't anywhere near AGI
Define AGI
>>
>>102490301
Yes, define it coward.
>>
>>102490301
The class is waiting for you to define AGI.
>>
>>102490301
I think therefore I am
>>
>>102490234
Yes, but that only works if there are other tokens, not if there is only one 100% token.
>>
>>102490259
Actually, the money they spend on extraneous bullshit is still being spent either way. They're still buying yachts. Sam is still buying sports cars and increasing his collection.

>>102490301
Or, you could stop trying to search for ways to argue for companies that aren't on our side and don't have our interests in mind.
>>
>>102490344
redit ergo dum
>>
>>102490301
artificial goon intelligence
>>
>>102490301
send more pics

agi is practically an ai with agency, capable of getting through social situations and other human challenges
>>
File: FOfUnsUXMAIr7xW.jpg (47 KB, 800x450)
>>102490370
>>
AGI would understand the context of the erotic roleplay and not do things like walking across the room to take something from you when you said it's right next to her
It wouldn't instantly jump on your dick when you tell it not to
>>
File: 1700188788837625.png (616 KB, 1529x884)
>>102490301
Any cloud LLM is AGI in comparison with local cuck one.
>>
>>102490411
knowledge of physics as an extension to AI is not what AGI is about
>>
>>102490357
I'm not advocating for companies, rather, I'm contending against Lecum. Last year, I was gooning with L2 14b finetunes, and currently, I'm gooning with 123b Largestral. Clearly, it's significantly improved, so I fail to comprehend your stance that scale doesn't matter. If they cease focusing resources on the "bigger is better" approach, I question whether they will dare invest in riskier yet potentially more effective research avenues for achieving AGI. Investors readily fund guaranteed improvements, but are reluctant to invest in seemingly far-fetched ideas like cat intelligence research by Lecum.
>>
>>102490431
This meme hasn't aged well...the "gpt omni" response is pure slop, and the problems in the local panel are year-old 7b tier ones that are solved in newer models.
>>
>>102483278
Someone? Just tried a Nemo finetune and it's utter shit
>>
>>102490566
nemo's amazing, you probably used too high of a temp
>>
>>102490551
Limited context - not solved
General data censorship (anything that isn't your lolipedoslop) - not solved, and never will be
Hallucinations - not solved
One system prompt format - nonexistent, you are forced to rewrite shit and tinker around with each new model
It's been three years and we still got no solution for any of these.
>>
>>102489480
I refuse to read a recap in dark mode, I'm not underage.
>>
>>102490588
Default temp, with Tiefighter 13B it just werk... Seriously
I tried Rocinante-12B fyi
>>
>>102490619
You are underdeveloped
>>
>>102490627
>rocinante
all drummer models are unusable trash
>>
>>102490615
>Hallucinations - not solved
You can't solve what is the core working of an LLM. They're always hallucinating.
But you can reduce them greatly with RAG
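A minimal sketch of that idea (assuming sentence-transformers for the embeddings; the snippets and question here are made up):

# toy retrieval step: embed reference snippets, pull the closest ones, and put
# them in the prompt so the model answers from them instead of from memory
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "llama.cpp's server listens on port 8080 by default.",
    "Qwen 2.5 was trained on an 18 trillion token dataset.",
    "Mistral Nemo is a 12B model.",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

query = "How many tokens was Qwen 2.5 trained on?"
q_vec = embedder.encode([query], normalize_embeddings=True)[0]
top = np.argsort(doc_vecs @ q_vec)[::-1][:2]   # cosine similarity via dot product on unit vectors
context = "\n".join(docs[i] for i in top)

prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# feed `prompt` to whatever backend you use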
>>
>>102490674
nobody asked for your opinion, sao
>>
>>102481479
Florence-2 is probably better for everything except ERP anyway.
>>
>>102490678
RAG is a meme and doesn't solve anything.
>>
>>102490674
I'll try MN-12B-Lyra-v4 then...
I swear I feel there still isn't something better than Tiefighter in the 13B range
>>
>>102490690
>RAG is a meme and doesn't solve anything.
you don't know what you're talking about
>>
>>102490690
I've never used it but I feel like it would be good for a desktop assistant since it would let you inject relevant files/scripts into the context.

It definitely won't help with hallucinations though.
>>
What would it take for computers to think?
>>
>>102490690
https://www.lamini.ai/blog/lamini-memory-tuning
>>
>>102490615
>Limited context - not solved
405b has true 128k. Good enough for anything I want to do
>General data censorship (anything that isn't your lolipedoslop) - not solved, and never will be
>Hallucinations - not solved
both are pure skill issues
>One system prompt format - nonexistent, you are forced to rewrite shit and tinker around with each new model
who cares?
>It's been three years and we still got no solution for any of these.
For any of the above you consider an actual unsolved problem, cloud isn't appreciably better
>>
>>102490431
seething poopooskin (v)ramlet lmao
>>
>>102483680
>If you're gay and into that kind of shit, /r/RockchipNPU/ might be a good place for updates.
what's the bad rep against the Rockchip NPU?
>>
>>102490749
Who has 200GB of VRAM?
>>
Cohere insiders, what's the state of the company after CR 08-2024 flop? Did the higher-ups learn a lesson or will they continue training on slop for minimal gains?
>>
>>102490828
I don't even know what cohere is
>>
>>102490828
>after CR 08-2024 flop
Explain?
Thought it was a good AI company
Their graphic chart is comfy
>>
>>102480672
>>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
how come the mandarin version of the blog post still has all the charts and code in English? isn't the point of a mandarin translation for mainland readers who can't into english or is anyone worth their bacon supposed to know english
>>
https://retrochronic.com/
Enjoy your redpill anons
>>
>>102490897
>https://retrochronic.com/
not clicking that, tell us first whats inside
>>
>>102490828
>CR 08-2024 flop
Huh I'm downloading that right now, should I stop it?
>>
>>102490874
Tech/math is always written in English in Asian countries AFAIK even when there are native words for them.
>>
>>102490674
there are models that are nice for me, but I have no idea what settings to apply to them. they eventually go into crazy self-repeat mode

like
TieFighter-Holodeck-Holomax-Mythomax-F1-V1-COMPOS-20B-gguf
DavidAU/L3-Stheno-Maid-Blackroot-Grand-HORROR-16.5B-V1.6-STABLE-INTENSE-GGUF
>>
>>102490914
>A primary literature review on the thesis that AI and capitalism are teleologically identical
schizo slop
>>
>>102490914
"Capitalism and AI are teleologically identical, a zillion part essay" apparently.

Like no shit neither of those things have anything to do with teleology.
>>
>>102490914
Capitalism is ASI travelling back in time, invading us from the future to produce itself
>>
>>102490777
mac studio owners have 196 or smth, would it run there?
>>
>>102490920
>even when there are native words for them.
but whats the point then, might as well keep everything in English to be consistent
>>
>>102490949
Evolution is just natural gradient descent.
>>
>>102490946
Wrong anon.

>Such software [reinforcement learning systems like Google DeepMind's AlphaZero] has certain distinctively teleological features. It employs massive reiteration in order to learn from outcomes. Performance improvement thus tends to descend from the future.
>...
>Unsupervised learning works back from the end. It suggests that, ultimately, AI has to be pursued from out of its future, by itself.
- Nick Land (2019). Primordial Abstraction in Jacobite Magazine. Retrieved from github.com/cyborg-nomade/reignition
>>
>>102490965
They like to have the prose in their language because that's easier.
>>
>>102490935
>there are nice models for me
>they go eventuallly into crazy self repeat mode
I know that these two things aren't necessarily contradictory, but god damn does it feel like it.
>>
Making a companion to browse 4chan with me
Anyone tried this before?
>>
>>102490964
Maybe a 3 bit quant would fit if you ran absolutely nothing else.
>>
>>102490977
This is *extremely* retarded. It's like when people were using the word "conscious" to describe language models when they first became popular.
>>
>>102490998
>Anyone tried this before?
it's nice tho I just use GPT-4o mini which isn't really local.
>>
>>102490865
Well, they didn't dare to post any actual benchmarks, just an arbitrary "+50%" on their website. While the original CR+ was at one point at the top of lmarena, new one isn't. It also barely improved at livebench. They clearly can't compete against a similarly-sized Mistral-Large.

>>102490916
If you are planning to use it for RP, you'll be disappointed, it's much more slopped than the original CR+.
>>
>>102490946
>confused.

overlords said on a podcast that AI is communism and Blockchain is capitalism.
>>
>>102490998
I've never built an ERP character for it (that's an odd thing to do...) but I've had gemma2 analyze /smg/ posts.
>>
>>102491042
I think the only thing I like about podcasts is that they use RSS. All of the actual content is always so fucking bad.
>>
>>102490619
I think I'm done playing with it for today, so the next one will be dark.
But it might be better if we can find some host to embed the html file so the links can be clickable. If anyone wants dark mode, they could use an extension.
>>102489492 >>102489495 >>102489509 >>102489541 >>102490619
>>
qwen 2.5 made me interested in local again :3 I hope to be able to use RAG and other stuff to get coding llms to reference documentation
>>
>>102491066
Nice.
>>
>>102491066
fwiw, I put in feedback for them to consider reverting or changing the mass reply filter. don't know how much attention they pay to that but I figure it couldn't hurt
>>
>>102480672
>https://rentry.org/machine-learning-roadmap
the math in here feels a bit lackluster
>>
>>102491077
>qwen 2.5 made me interested in local again
I like the 0.5B model, it's pretty snappy
>>
is it gay to goon to a gay RP if you switch to a straight one right before you bust? also, best meme sampler for this?
>>
>>102491097
The math isn't that hard anyway. Probably the most complicated/unusual thing is just the partial chain rule (gradient calculation.)

Everything else is basic linear algebra which you should know if you've done practically anything more complicated than json pushing.
>>
>>102491031
But the CEO was on a podcast recently and he said they found that good data was more important than compute
>>
>>102491119
I think you're confused.
The use of meme samplers and mental gymnastics is correlated but that does not necessarily mean that meme samplers will improve your capacity for mental gymnastics.
>>
>>102490763
calm down ranjesh
>>
>>102490964
at like 20 seconds/token
>>
>>102491119
Why the fuck would you read gay RP to begin with?
>>
testing models
>>
>>102491096
Good idea. Hopefully they'll reconsider. I don't know why they thought this would stop a determined spammer.
>>
>>102491146
They clearly haven't used good data in the new CR, just in the old one. The new one is full of low-quality synthetic garbage.
>>
>>102491124
>The math isn't that hard anyway.
I understand that, but that's under the assumption that we stick to the current status quo; is the goal not to advance the paradigm forward? we will need stronger math
>>
what's the state of running local models on high-end android phones?
>>
>>102491124
why would i learn linear algebra when my gpu does it for me
>>
>>102491066
Very nice.
>>
>>102491215
lol
>>
>>102491165
since you didn't understand the insult, you must be one of openais kenyans. monkey want banana? ooh ooh aah aah?
>>
>>102490484
You may not be trying to advocate for companies, but as I said, that is essentially the effect of your posts before this.

>your stance that scale doesn't matter
I never said that. What I said is "The point is that it's essentially being used as a scam", and that scale is simply used as an excuse for that scam, which is actually what Yann's argument is truly about in the end, although he might not explicitly or directly say it like that. Scale obviously does matter to a point, but what it matters for is also a question, and my later point was that it might not matter for anything of equivalent value to the money dumped into it.

>Investors readily fund guaranteed improvements, but are reluctant to invest in seemingly far-fetched ideas
And that's the issue, that is part of Yann's criticism. Investors are not really putting money where it should go and essentially act based on hype while actually valuable research might not be getting the funding it needs, which isn't really a new or contentious concept.

>If they cease focusing resources on the "bigger is better" approach, I question whether they will dare invest in riskier yet potentially more effective research avenues for achieving AGI
This does not really make sense as betting big on scale is already the highest risk given the amount needed for it. Smaller projects like JEPA or the original transformers paper do not need nearly that much money, and have never needed that much. It's a completely different ballpark of money we're talking about. That's just in the context of big stuff like GPT-4/5 though. If we talk about smaller companies and the smaller but still somewhat significantly sized models like Cohere's, it's absolutely a waste of money, and they have done virtually nothing to move the field closer to AGI.
>>
>>102491280
Heh you are mad
>>
>>102490484
>L2 14b
??? bait
>>
>>102491215
>high end android phones?
Do they come with a couple of 3090s on them now? That's cool...
But maybe you can run some 8b on them. What's a high-end phone? Gimme specs, not models or brands.
>>
are people itt coping about <70B models again? they're never gonna be viable and most of them will be phased out in the next few years. let it go.
>>
>>102491336
i've lost ~1.5 liters of semen to nemo finetunes this week
>>
>I'm so coombrained I don't know how to read
not a brag but okay
>>
70B models aren't even that good
>>
>>102491215
People were running vicuna 7B on some android phones last year. Google is trying to put gemma on the new Android phones. Apple has "Apple Intelligence" but I bet it'll just call OpenAI API
>>
>70B models aren't even that good
>t. vramlet nemo user
>swiped twice on miqu IQ1_xxs
>>
>>102491379
Link the post, pussy
>>
>>102491379
>this non-replying motherfucker is acting like he's having the LLM shit out a sequel to finnegans wake and not some tsundere moege girl chatbot
shaking my head to be honest
>>
File: butthurt.gif (119 KB, 600x487)
>reeee give me (You)s
>>
>>102490977
Sounds like una creator
>>
File: waiting.jpg (12 KB, 193x261)
Me waiting for local as good as claude that runs fast on average hardware
>>
File: file.png (1.33 MB, 1024x683)
>new model wave hits
>cooming doesn't improve
>>
>>102491563
gemma2 is good enough for most of what I want. I already used it to write me both an ffmpeg and image magick command today and it's hardly the afternoon.

I wish llama was as good so I could finetune it.
>>
>>102491389
>Apple has "Apple Intelligence" but I bet it'll just call OpenAI API
They've already said that's exactly what it will do
>>
>new model wave hits
>sloptuners too captured by /lmg/ memes to tune them
please keep telling them qwen sucks. we don't need any more sloppa trained on opus logs.
>>
>>102491601
they have native adapters and a tiny model iirc for small tasks but Siri answers and anything longform/important is going to OAI.
>>
>>102491336
There is so much useless knowledge in those models you could make a perfect coombot in less than 7B. It is just a matter of cutting out the useless shit.
>>
>>102491228
So you know what to tell the GPU to do.
>>102491205
No. You need to be better at applying the math.
And if you thought there was something extra but unknown how would the people teaching you know? Then it wouldn't be new. If you want that just start reading random math books (this isn't a bad idea btw, I used to do this all the time before I became cynical and jaded.)
>>
>>102491619
I still haven't gotten around to trying OpenELM. Has anyone else? I think support got merged into llama.cpp.
>>
>There is so much useless knowledge in those models
>t. spends every day on a forum dedicated to LLMs
>still a coomer who doesn't know how anything works
>just cranks his dick to /gif/ and sillytavern all day
>>
>finally figured out how to completely remove repetition using rep pen and DRY
>suddenly, all my mixtral variants push plots forward, have far more elegant prose, and not a single spine shiver
IT WAS THAT EASY?? FUCK
>>
>>102491663
You didn't know about repetition penalty and went so far as to come here for help before trying it? How do you manage to dress yourself?
>>
>>102491663
>not a single spine shiver
that's not how rep pen and DRY works, pierre. stop shilling your shit 12B slop.
>>
>>102491663
What are your settings?
>>
>>102491663
Share settings plox, also, where's the DRY dial in openwebui? I can't find it
>>
Qwen2.5 is such a piece of shit model, holy fuck, how could anyone use that shit.
>>
>>102490086
Isn't Nemo a 12b model? You're thinking of Mistral Small 22b.

Qwen2.5 is amazing. I've heard people speak of refusals, but I haven't encountered any so far on 32b.
>>
>>102491584
Tinfoil hat: there's one dataset that slops your models tf up but every epoch on it boosts your mmlu by 20%
>>
>>102491724
Yeah, it's dogshit, you are better off using anything else.
>>
>>102491677
it's absolutely true, but i make my own characters and don't share logs so you have to take my word for it
>that's not how DRY works
i don't know what you're talking about, i read the pull request, and the person that made the DRY sampler says that's literally how it works
>>102491676
i knew about basic repetition penalty for months, but i had been using it wrong, because the ST devs can't be bothered to add context docs for most of the samplers, so i had to go digging into the full docs and fucking reddit posts for how it actually works
yeah, the principle of "apply X penalty to any tokens seen in the last Y tokens" seemed obvious in hindsight, but putting the penalty as high as 1.08 led to occasional incoherence, and any higher was gibberish, so i just thought i'd never be able to use it
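for reference, this is roughly what that "penalty on tokens seen in the last Y tokens" boils down to. made-up minimal sketch in python, not any backend's exact code, and the parameter names are just my own:

```python
# made-up minimal sketch of a classic repetition penalty, not any backend's exact code;
# `logits` maps token id -> logit, `context` is the token history, newest last
def apply_rep_pen(logits, context, penalty=1.08, penalty_range=2048):
    recent = set(context[-penalty_range:])  # only look at the last `penalty_range` tokens
    out = {}
    for tok, logit in logits.items():
        if tok in recent:
            # positive logits get divided, negative ones multiplied,
            # so a repeated token always becomes less likely
            out[tok] = logit / penalty if logit > 0 else logit * penalty
        else:
            out[tok] = logit
    return out

# example: token 42 appeared recently, so its logit drops from 3.0 to ~2.78 at penalty 1.08
print(apply_rep_pen({42: 3.0, 7: 2.5}, context=[1, 42, 5], penalty=1.08))
```

keeping the range small matters because with a huge range the penalty eventually hits function words like "the" and "a", which is the caveman-speak failure mode.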
>>
>>102491724
They don't. Those are trolls.
>>
>>102491756
What are your settings?
>>
File: news.png (1.2 MB, 1082x768)
1.2 MB
1.2 MB PNG
>>102491592
maybe it's good for stuff that a functional member of society would use but i want to goon
>>
>>102491694
>>102491767
pic rel, no point in sharing catbox json, since i've changed nothing else
also, my mixtral tune uses the alpaca system prompt, and i just wrote a basic 3-sentence one stating it's a roleplay and the desired length. all of my lewd shit is in my char defs
>>102491711
i use ST+tabby, look at your own docs, because idk, sorry
>>
File: bait.png (238 KB, 540x540)
238 KB
238 KB PNG
>i read the pull request, and the person that made the DRY sampler says that's literally how it works
lol
since other people are taking the bait I'll give the retard explanation: DRY attempts to prevent shivers from showing up multiple times. a phrase has to show up AT LEAST once before it can be deprioritized, similar to but more effective than rep pen.
>>
>>102491823
I see, so you've cranked rep penalty up high, but reduced the rep penalty range. Interesting. I'll give it a try.
>>
File: 1726945087728.jpg (126 KB, 626x999)
126 KB
126 KB JPG
So this is the power of closed LLMs
>>
>>102491883
problem, western man?
>>
>>102491849
Sounds great, but Mistral models are repetitive on the paragraph level, not just phrases. DRY doesn't work here
>>
>>102491823
>temp 1.3 to 5
>top k 0
temp 5 top k 3 guy has competition now
>>
>>102491823
>temp 3.26
??? what nuts bowl sits on top of perch shivers down the spine while chair 习近平 ding dong die
>>
>>102491849
correct, and the allowed length is how many tokens it's looking backwards for repeated phrases in the context, and if it finds a match, it discards the current token and tries again
>>
>>102491813
I use it for my ERP characters too and it's fine, it just has a very short context.
>>
https://www.reddit.com/r/StableDiffusion/comments/1fm9pxa/joycaption_free_open_uncensored_vlm_alpha_one/
New JoyCaption model. I dunno how many people care about this, but I've been using the pre-alpha version as part of a multi-model workflow to caption thousands of images for training Flux loras. So I'm super excited about this, gonna be playing around with it today and doing side-by-side comparisons with the pre-alpha.
>>
>>102491901
Looks like he has dynamic temperature turned off though.
>>
>>102491901
i'm not using dynatemp
i experimented with it, but i wasn't getting the results i wanted and went with neutralizing samplers and starting over
the box is clearly not checked
>>
how can I vectorize black and white symbols? remove the white background
>>
>>102491883
trash in - trash out :^)
>>
>>102480754
>>102480814
Is this a new restriction in 4chan?
>>
>>102491903
just trust me, it works
>>
disable slider limits
temperature 10
top k 1
min p 0.5
standard DRY
you can thank me later
>>
Mistral models see a concept appear twice and spend one paragraph of every reply from then on rephrasing that concept. How do you even fix this?
>>
>rephrase
shit in my experience mistral models just straight up repeat the sentence verbatim
>>
>>102491971
With topK 1 does anything else even matter?
>>
>>102492003
It's because I had some kinda rep pen on
>>
>>102491975
>>102492003
That does happen a lot, yeah.
Try the temp 5 topk 3 minp 0.1 meme settings and see if that adds some variety without making it stupid.
For RP at least it should work "fine".
>>
>>102491971
Even if you put temp first, i don't think temp can ever change the order of the tokens to sample. And if temp goes last, it does absolutely nothing with a single token. And if you have top-k 1 before min-p, min-p has nothing to work with either. Even the other way around min-p does absolutely nothing.
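toy illustration of that point, in python (invented numbers, not any real backend's sampler code):

```python
import math, random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# toy sampler chain (made up, not any backend's code) showing why top-k 1 makes the
# other settings irrelevant: temperature rescales logits but never reorders them,
# and once top-k has cut the pool down to one token, min-p has nothing left to remove
def sample(logits, temperature=10.0, top_k=1, min_p=0.5):
    probs = softmax([l / temperature for l in logits])
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    pool = order[:top_k]                          # with top_k=1 this is just the argmax
    cutoff = min_p * max(probs[i] for i in pool)  # min-p threshold relative to the best token
    pool = [i for i in pool if probs[i] >= cutoff]
    weights = [probs[i] for i in pool]
    return random.choices(pool, weights=weights)[0]

print(sample([2.0, 1.5, 0.5, -1.0]))  # always prints 0, no matter the temperature or min_p
```

however you shuffle the order, the argmax is all that survives.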
>>
>>102491661
Is this a bot?
>>
File: sloptuners.png (63 KB, 680x1483)
63 KB
63 KB PNG
>create synthetic dataset using cloud API
>finetune shitty research model with dataset
>research model is now substantially dumber than before
>still worse than cloud API in every way
have you gone to the Kobold Discord to thank a finetuner today?
>>
>>102483044
I overlooked Gemma, assuming it would be censored to hell because of Google. Is 27B Gemma really better than 32B Qwen?
>>
>>102492191
Gemma writes well but it's cucked to 8k ctx
>>
File: 1726836595916437.png (186 KB, 1873x554)
186 KB
186 KB PNG
https://docs.novelai.net/text/Editor/slidersettings.html#Unified
Thoughts?
>>
>>102492169
All their finetuning can do is change style. For cooming it may be okay, but if you don't want to RP in claude's default style, they are pretty useless. Claude can do more than one style, you know.
>>
>>102492241
unfathomably based and good for the local LLM crowd
>>
>>102491956
https://www.photopea.com/
layer --> new adjustment layer --> threshold
layer --> flatten image
right click layer in layer panel on right --> blending options --> pull right arrow on "current layer" to anything below 255 --> OK
right click layer in layer panel on right again --> rasterize layer style
image --> vectorize layer --> colors 1 --> OK
file --> export as --> svg
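if you'd rather script it, here's a rough python equivalent of the threshold + transparent-background steps, assuming Pillow is installed. the actual SVG tracing isn't shown; you'd still need something like potrace for that:

```python
from PIL import Image

# rough script equivalent of the manual steps above (assumes Pillow is installed):
# threshold the symbol to pure black/white, then make the white background transparent
img = Image.open("symbol.png").convert("L")
bw = img.point(lambda p: 0 if p < 128 else 255)   # threshold at mid-grey
rgba = bw.convert("RGBA")
data = [(0, 0, 0, 255) if px[0] == 0 else (255, 255, 255, 0) for px in rgba.getdata()]
rgba.putdata(data)
rgba.save("symbol_transparent.png")               # feed this to a tracer like potrace for SVG
```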
>>
>>102492318
>>102492241
NAI are pieces of shit that literally spam forums with their garbage.
>>
>>102492527
why are you so obsessed? you sound like b*rn*yf*g
>>
>>102492527
This. So much this.
>>
>>102492577
Uhmm.. can we unpack this, y'alls?
>>
Is it possible to pre-tokenize prompts when running batched inference in vllm?

I.e., I’m going run the same prompt through multiple times with different system prompts, and I’m trying to reduce the computational costs. Or am I going about this all the wrong way?
>>
>>102490777
enough of us
"don't be poor" falls under skill issues
>>
File: literatedog.jpg (42 KB, 640x640)
42 KB
42 KB JPG
>>102492254
>All their finetuning can do is change style
They hardly manage to do that even. One thing Nemo is really good at is bilingual conversation; I was able to hold a chat with Nemo in English + Japanese with almost no errors or misunderstandings in the outputs. Yet none of the Nemo finetunes can do that, and they still talk exactly like Nemo but add degenerate coomer words like "cunny" and "obscene squelching" to sentences where they don't belong. Sloptuners lobotomize the fuck out of these models with approximately no benefit.
>>
>>102492639
With the OAI API, no. Tokenization is pretty much free anyway. You want the cache, and it's on and just works by default. If you want to make sure it's working, prepend a random number to the very beginning of each of your requests and watch performance worsen by a lot.
>>
>>102492639
Not sure that can be done. On llama.cpp, for example, you can cache a prompt and run it multiple times almost instantly, but since the system prompt goes before the prompt, the whole thing would need to be reprocessed again. Or more succinctly, you can only cache a common prefix. If vllm has caching, I'd assume it works the same way.
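for what it's worth, a sketch of how that plays out in vLLM, assuming a recent version where enable_prefix_caching is available (check your version's docs; the model name and prompt layout are just examples):

```python
from vllm import LLM, SamplingParams

# sketch of the prefix-caching idea described above; enable_prefix_caching is a flag
# in recent vLLM versions, double-check against your install
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_prefix_caching=True)
params = SamplingParams(max_tokens=256, temperature=0.7)

shared_text = "...the long prompt you reuse..."
system_prompts = ["You are a strict classifier.", "You are a lenient classifier."]

# if the varying system prompt comes first, every request has a different prefix and the
# cache can't help; putting the shared chunk first (when your prompt format allows it)
# gives all requests a common prefix that only gets processed once
prompts = [f"{shared_text}\n\n{sp}\n\nAnswer:" for sp in system_prompts]
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)
```

if your chat template forces the system prompt to come first, there's no way around reprocessing the shared text for each variant.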
>>
>>102480672
Anons, everyone is saying qwen is shit for RP, but what about general purpose tasks, like classifying and summarizing text, including with "objectionable" content?
Is there a 4-bit GGUF quant yet?
>>
>>102492706
>everyone is saying qwen is shit for RP
lol no just a few mistral shills and retards who haven't tried the model
>>
>>102492695
>>102492698
Thank you for answering my question
>>
>>102492706
Any model is better than that trash.
>>
>>102492712
What size do people run then? I only see like 7B and 72B, but not quants. Every 7B model I've ever seen has been fast but utterly retarded.
I fed Mistral 7B enough to fill up its 128k context and it was still retarded and started breaking my expected output format.
>>
>>102492731
I'm currently using Llama-3.1 70B 4-bit. It's doing my classification tasks well. I'm always looking to improve though. Otherwise I'd still be on GPT-J-6B or markov chains.
>>
>>102492733
72B is great if you can run it, otherwise 32B or 14B. They released almost every size anyone could ask for, just look at their Hugging Face page.
>>
>>102492759
I can do 72B 4-bit or 32B prob in fp8 or 16, it's just weird I can't find quants, not even from TheBloke. I feel like I'm not searching right.
>>
>>102492706
>Anons, everyone is saying qwen is shit for RP
>including with "objectionable" content?
Being bad at one would make it bad at the other. Subjects overlap. But i don't know. Why don't you try it yourself?
>Is there a 4-bit GGUF quant yet?
yes. huggingface.co. Pretty new site to upload files. It seems some people are using it to upload language models, among other things.
>>
>>102492771
You are not searching right, grandpa.
>>
>>102492771
>not even from TheBloke
TheBloke hasn't been active since January.
Look for bartowski or the quant cartel.
>>
File: xhs5WpbkpD.png (77 KB, 1071x290)
77 KB
77 KB PNG
>>102492771
>just look at their hugginface page
>just look at their hugginface page
>just look at their hugginface page
>>
>>102492684
I'm pretty new to this, but Nemo finetunes seem like complete shit. I can pretty much predict what the characters are going to say. Perhaps I should give the base model a try.
>>
>>102492813
No gguf quants
>>102492799
Thanks, bartowski has a 4-bit instruct
>>
>>102492880
>No gguf quants
Nigga, are you blind? Top right.
>>
>>102492880
>No gguf quants
The screenshot anon posted has gguf as the first item of the second column.

>Thanks
You are welcome.
You can often just search for
>model name GGUF
in Hugging Face's search bar and find something.
Do be aware that people can fuck quants up, so keep an eye out for that (look into the --check-tensors argument for llama.cpp).
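The same search can be done from Python with huggingface_hub if you're scripting it; the query string here is just an example:

```python
from huggingface_hub import HfApi

# list repos matching a "model name GGUF" style query, same as the site's search bar
api = HfApi()
for m in api.list_models(search="Qwen2.5 72B Instruct GGUF", limit=10):
    print(m.id)
```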
>>
File: qwen.png (123 KB, 1280x539)
123 KB
123 KB PNG
>>102492880
You'll struggle your entire life not understanding what's going on around you.
>>
>No gguf quants
lol okay I'm done being helpful in /lmg/ this year. it's nothing but shitposting and trolling now. you people are fucking retarded.
>>
>>102492952
>>102492945
gguf is a file format (with some tranny jizz mixed in)
Quants are requantized versions of the model.
Not all ggufs are the same quant. For my 48GB of VRAM, I need 4-bit quants
>>
File: smKuqAehF1.png (40 KB, 428x435)
40 KB
40 KB PNG
yeah if only you were shown where to find 4-bit gguf quants the first time you asked
these anons are fucking dumb, huh?
oh wait
>>
>>102492993
ur niggerlicious
>>
File: file.png (12 KB, 665x45)
12 KB
12 KB PNG
pissing me off
>>102493018
>>102493018
>>102493018
>>
File: 05-17.jpg (100 KB, 1319x1029)
100 KB
100 KB JPG
Hey guys I'm new here, could someone point me to some resources for getting started? Also, is there an official /lmg/ card I can test with once I get everything running? Sorry if this is listed in plain text somewhere in the thread, I'm just looking to be spoonfed links. Thanks!
>>
File: op.png (385 KB, 1354x842)
385 KB
385 KB PNG
>>102493138
If only we had some resources...
>>
File: mmlu_vs_quants.png (336 KB, 3000x2100)
336 KB
336 KB PNG
>>102493138
Read the OP.
For an easy entry, koboldcpp + a mistral-nemo-instruct gguf. Get the quant that is smaller than your VRAM by about 15%, enable flash attention in koboldcpp, and set your context size to 8192.
Then start messing with things. Different models, different context sizes, different quants, etc.
>>
>>102493138
>could someone point me to some resources to getting started?
https://ollama.com/download
>official /lmg/ card
no card currently on the market is worth shelling out money for, the official card to test with is whatever NVIDIA GPU you have that isn't a decade old
>>
>>102491066
consider using more than one column
>>
>>102493186
>no current on the market is worth shelling money out
I assumed he meant a character card, based on "once I get everything running".
>>
>>102491907
allowed_length is the number of tokens that can be repeated before a penalty is applied. DRY actually looks for repetition across the entire context
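for anyone still confused, a rough sketch of the mechanism as I understand it from the PR. simplified Python, not the real implementation; the defaults (multiplier 0.8, base 1.75, allowed_length 2) are the ones I remember from the PR, so double-check them:

```python
# rough, simplified sketch of the DRY idea (naive O(n^2) loop for clarity, not the real code):
# if the current end of the context repeats an earlier sequence, the token that extended
# that sequence last time gets penalized, and the penalty grows with the match length
# once it exceeds allowed_length
def dry_penalties(context, vocab, multiplier=0.8, base=1.75, allowed_length=2):
    penalties = {tok: 0.0 for tok in vocab}
    for end in range(len(context) - 1):
        # length of the match between the context's current tail and the text ending at `end`
        length = 0
        while (length <= end and length < len(context)
               and context[end - length] == context[-1 - length]):
            length += 1
        if length > allowed_length:
            continuation = context[end + 1]  # token that extended this sequence last time
            penalty = multiplier * base ** (length - allowed_length)
            penalties[continuation] = max(penalties[continuation], penalty)
    return penalties  # subtract these from the matching logits before sampling
```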
>>
>>102491823
a quick update to this: neutralize presence penalty, or the model goes mildly schizo several replies in and starts dropping articles in front of nouns, talking like a caveman


