/g/ - Technology


Thread archived.
You cannot reply anymore.




File: 00105-2889761473.png (1.43 MB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102478048 & >>102467604

►News
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
>(09/17) Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release/
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: file.png (462 KB, 1098x618)
►Recent Highlights from the Previous Thread: >>102478048

https://pastebin.com/ft3Bz2xy

--Qwen 2.5 not worth it for RP, Minitron 8B better, avoid benchmaxing: >102480431 >102480494
--Compute power vs. bandwidth in LLM training and inference: >102480152 >102480171
--AI hardware guide suggestion and resource provided: >102479765 >102479821
--AI accelerator PCIe cards discussion: >>102479195 >102479242 >102479312 >102479338 >102479343 >102479379 >102479434 >102479414
--Qwen 2.5 release and potential applications: >>102479928 >102479966 >102480044 >102480128 >102480143 >102480047 >102480065 >102480049 >102480074 >102480213
--OpenRouter Qwen 2.5 72B benchmark results: >>102479724
--Mistral Nemo models still best for 24GB, unless Qwen gets good fine-tune: >>102479478 >102479547
--Anon is considering building a cluster with Orange Pi 5 Pro devices which have a dedicated NPU: >>102479263 >102479350 >102479817 >102479964 >102480010 >102479767 >102479801 >102479923 >102480147 >102480157 >102480175
--2060 and Ryzen 3600 insufficient for 30b+, consider RTX 3090 and high RAM: >>102478936 >102479244 >102479260 >102479287 >102479570 >102479586
--Miku (free space): >102478511 >102479698 >102479918

►Recent Highlight Posts from the Previous Thread: >>102478163 >>102478475
>>
File: rpi5.jpg (304 KB, 1515x1240)
>>102480672
edge AI setups?
>>
>>102480681
>Qwen 2.5 not worth it for RP, Minitron 8B better
lol, come on
>>
File: Metropolitan_Police.png (605 KB, 1747x2049)
>>102480537
>>102480600
amerifats may be okay, but it's so fucking over for anglos, even if it's a troll
>>
Sweet fucking Jesus, let's make this thread better than the last one.
>>
>>102480681
You can always just make the script quote the first post in a chain to avoid the quote limits.
>>
>>102480748
No.
>>
>>102480681
>Qwen 2.5 not worth it for RP, Minitron 8B better, avoid benchmaxing
Good first entry for the crippled era...
>>
Bros why is Qwen the best model ever created?
>>
Hello, what local models have a similar quality to Kayra for story writing?
>>
>>102480768
Fuck off with your bullshit already.
>>
>>102480768
>similar quality to Kayra
https://huggingface.co/Qwen/Qwen2.5-0.5B
>>
>>102480768
None, local is a meme.
>>
>>102480768
trolling, but
https://huggingface.co/models?search=13b

And as for a serious answer
LLaMA2-13B-Tiefighter
>>
>Anti-NAI schizo is right back to samefagging again.
I wish mods didn't sit on their fucking asses all day.
>>
Current SOTA locals for roleplay that don't just look good on meme benchmarks? Preferably 70B models.
>>
>>102480794
nothing in that size other than older miqus
>>
>>102480754
Still not enough. Assuming 2 are usually used for the Previous links, that leaves only 7 chains that can have a link. Usually the recaps have double that.
>>
>>102480721
how come ollama doesn't pick up any hardware acceleration on the rpi 5?
https://developer.arm.com/Processors/Cortex-A76
shouldn't the Neon or whatever speed up inference?
>>
>>102480823
Not the ollama support general. Go back.
>>
>nai stuff
>ommama
off to a great start
>>
>>102480823
Llama.cpp doesn't work with it?
>>
On the 9 (you)s reply limit, I think this https://desuarchive.org/g/thread/94354163/#q94355339 is why the jannies did it.
>>
>>102480801
What about smaller then?
>>
File: 1726885585519.jpg (154 KB, 428x644)
So this is the power of Qwen2.5 72B?

On a side question, does anyone know how to enable avatars? I think I disabled them by mistake and idk where to enable them again.
>>
>>102480930
Every model I've used is kind of retarded like this even /lmg/'s "good" ones
>>
File: file.png (3 KB, 221x28)
>>102480930
user settings and uncheck picrel, it got turned on by an update
>>
>>102480875
https://desuarchive.org/g/thread/101986330/#101992125
More likely this.
>>
>>102480959
oh lel, forgot about this one
>>
>>102480831
not the llama.cpp thread either, the majority of local model users use ollama
>>
>>102480955
Thanks!
>>
>>102480930
It come with eggwah
Genewa chicken eggwah
>>
>>102480672
how do I set up langchain?
>>
>>102481073
Ignoring the idiocy, why?

And are all these people from aicg just underage? Who the fuck can't afford an API?
>>
>>102481102
>Who the fuck can't afford an API?
whats wrong with running langchain locally?
>>
>>102481114
Separate statements/questions.
Running langchain is simple. Dead simple. And ignoring the fact that we're in a thread about tools that can literally answer that question and walk you through the process: why? For what purpose?
>>
>>102481160
did they pay you to say this?
>>
>>102481073
conda install langchain -c conda-forge
pip install langchain-core langchain-community

then just use it as normal in python
from langchain_community.llms import Ollama
llm = Ollama(model="gemma2")
llm.invoke("Why is the sky blue?")
>>
>>102481191
buy an ad oshit shill
>>
>>102481160
>tools that can literally answer that question and walk you through the process, why?
9/10 threads on g would be better served talking to chatGPT, you want this place to be more barren than it already is?
>>
>>102481203
you're free to show how to set it up with llama.cpp
https://python.langchain.com/docs/integrations/llms/llamacpp/
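For reference, the same thing through langchain's llama.cpp wrapper looks roughly like this (a sketch based on that page; the model path is a placeholder for whatever GGUF you have locally):

# langchain's llama.cpp wrapper instead of Ollama; model_path is a placeholder
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/your-model-Q4_K_M.gguf",  # any local GGUF file
    n_ctx=8192,         # context window
    n_gpu_layers=-1,    # offload as many layers as possible to the GPU
    temperature=0.7,
)
print(llm.invoke("Why is the sky blue?"))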
>>
File: bell-ding.gif (43 KB, 294x235)
>>102481188
How much are you getting paid? I need a promotion.

>>102481205
True, just the influx of what seem to be underaged anons triggered me. Granted that could mean there would be less slider threads and more genuine discussion. Though could also go the way of /b/.
Fuck it, we just need to use some LLMs like the other trolls are already doing.
>>
File: the state of g.png (2 KB, 339x57)
>>102481254
>genuine discussion
if you can't find genuine discussion now, you wouldn't find more then. The golden age of the internet is behind us because anyone worth speaking to only does so with the expectation of social clout
>>
>>102480768
Bud you can't expect local models to get close to cloud models. The only model that gets close to Kayra is Opus
>>
>>102481346
Sorry. Not going to participate in raids for you.
>>
Bros when is Llama 3.2? I'm already so fucking tired of Llama 3.1.
>>
>>102480768
None, all of them are shit, unironically. Limited context alone is a huge deal breaker, censorship as a cherry on top will annoy you really good, and no, I am talking about general censorship, not your loli slop.
>>
>>102481442
Anon it's just going to be llama 3.1 but with multimodal adapters slapped on top. Plus the backends are going to take forever to support it, not to mention what frontends are going to be good with it anyway.
>>
>>102481468
But if my fox wife can't see me, what's the point of living?
>>
>>102481479
>fox wife
tell me more about her anon, vision is the first step towards improving models' sense of proprioception and thus their liveliness, tho I'm not sure how we can go about developing a genuine sense of spatial awareness
>t. wants a fox wife as well
>>
Are you guys quanting your KV cache? Particularly interested from people running gguf quants of 70b+ models. I tried it a while back and felt like it seriously affected output quality, but it was a brief and janky experiment
>>
>>102481734
No.
>>
How much does it cost for them to train models at each size?
>>
>>102481734
I do but I use tricks to bump my quality
>>
>fox wife
Sorry, best I can do is worm wife.
>>
>>102480768
https://huggingface.co/teto3/mistral-nemo-storywriter-12b-240918
I trained one a few days ago
>>
Please give me a medium sized model that is good at following the card and not too positive
>>
local sisters... Qwen 2.5 is insane on the benchmarks, I kneel
>>
>>102481902
Elaborate?
>>
>>102482108
Even if I could run it it would still be too slow for me.
2 t/s is my limit for a general use model.
>>
>>102482150
are you on a gt610 or pentium 3?
>>
>>102482108
don't care about memes what is it like at pretending to be a young woman?
>>
>>102482175
And what's it like being an intolerant transphobic chud?
>>
>>102482150
You could use the 32B.
>>
>>102480672
How do I make money from this?
I'm broke as fuck and my job applications are leading nowhere.
should I just sell my GPU and suck cock for a living?
>>
>>102482133
>>102479396
The system is a little better than the quick reply method. I and many others have noticed that the longer the conversation goes, the less attentive models tend to get. After generating a response, it cuts out the entire chatlog and leaves a system prompt with only the character's description, and asks the assistant to double check if the response is faithful to the character being described, barring previous messages. It then retrieves the last 5 messages and asks to come up with a strategy to rewrite the response with the previous assessment and take into account the recent events. It's a lot of generations going on in the background, but it's fairly quick, considering you're not handling the entire prompt + chat history.
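Roughly, the flow described above looks like this (a sketch only; generate() is a hypothetical helper standing in for whatever backend call is actually used):

# sketch of the described re-check loop; generate() is a hypothetical helper
def refine_reply(card, chat_log, draft, generate):
    # 1. drop the whole chat log, keep only the character description, and critique the draft
    critique = generate(
        system="Character description:\n" + card,
        prompt="Is this reply faithful to the character described above?\n\n" + draft,
    )
    # 2. bring back only the last 5 messages and plan a rewrite
    recent = "\n".join(chat_log[-5:])
    plan = generate(
        system="Character description:\n" + card,
        prompt="Recent messages:\n" + recent + "\n\nCritique:\n" + critique
               + "\n\nOutline how to rewrite the reply so it fits both.",
    )
    # 3. produce the final reply from the plan
    return generate(
        system="Character description:\n" + card,
        prompt="Recent messages:\n" + recent + "\n\nPlan:\n" + plan
               + "\n\nRewrite the reply accordingly.",
    )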
>>
>>102482171
I'm using integrated graphics but the limiting factor is only having 32gb of ram.
>>
Has anyone actually used both Qwen instruct and base to see which one is truly better (with RP)?
>>
>>102480790
NAI can fuck right off.
>>
>>102482345
Qwen2.5 32b instruct impressed me at Q4_K_M. I haven't used the base model yet, though.
>>
there's dick for quants of the non-instruct base model. I could make some but I'm having plenty of fun with instruct as it is right now.
>>
>It's been over 24 hours since the last model release
It's over.
>>
RPers thoughts on Qwen 2.5 so far:
>14B has more sovl in early context chats than 32B likely because it's more retarded
>32B is really smart for its size and could easily be the 3090 vramlet king with a good tune
>72B has moments where it feels like an S-tier API model and others where it's L3.0-tier
for RP (base models):
>Qwen 14B > Nemo (hands down)
>Gemma 27B > Qwen 32B
>L3.1 > Qwen 72B
14B finetunes will absolutely shit on Nemo finetunes. 32B finetunes could turn 3090 vramlet chink haters into believers. 72B tunes might be a wash or just slightly better than L3.1.
>>
>Cydonia-22B-v1-Q4_K_M
T-Thanks mistral-small.
Straight up ignored the prompt too.
Not even a coom tune was enough. First time it happened though.
>>
>>102483044
Can the 14b/32b do more than 16k context? That's my main problem with nemo.
>>
>>102483121
I only tested up to about 18k context on 14B and about 20k on 32B but they both did fine that far. YMMV from 16k to 32k.
>>
>>102483169
And you used the base model? I can't seem to find a gguf, only for the instruct one. What settings did you find were good?
>>
Qwen uses standard ChatML?
>>
>>102483213
sorry, when I said base model above I was referring to the instruct tune. non-I base quants are hard to find atm but a GGUF of 14B is probably quick to bake. If you used the 5 Temp / 3 Top K meme settings for Nemo, it works nicely on 14B as well. Otherwise I slid the temp around from 0.8-1.4 with varying Min P 0.08-0.2 and standard DRY. these models need a tune to expand their vocabulary just like Nemo so if you're jumping from a Nemo finetune to plain 14B instruct you're going to be disappointed.
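If it helps, the numbers above in one place (taken straight from this post, not a recommendation; exact knob names depend on your frontend):

# the ranges mentioned above as a plain dict; tune per model
sampler_settings = {
    "temperature": 1.0,       # slid around between 0.8 and 1.4
    "min_p": 0.1,             # somewhere in the 0.08-0.2 range
    "dry_multiplier": 0.8,    # "standard DRY" defaults
    "dry_base": 1.75,
    "dry_allowed_length": 2,
    # or the meme preset: temperature 5 with top_k 3
}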
>>
>>102483118
Downloading right now because of your post. My fetish is watching their OOC personas get angrier and then fall into despair as I rape their character anyway and force them to keep participating.
>>
>>102483263
based
>>
Why is there still no model better than Tiefighter in the 13B category?
I try all the new models (Blue Orchid 2x7b etc.) and I'm always disappointed. I always get better results with Tiefighter in ERP/RP/storywriting.
>>
>>102483257
I'll give the 32b a try, that size usually runs well for me.
>>
Slightly off-topic but I went to check in on the video gen threads and found this prompting a bit funny
>>>/v/689498407
>>>/v/689489852
Reminds me a bit of the "You are an expert role player" and other almost clownish things people use to make the AI do what they want.
>>
>AI is hot
>profit off the trend
>by going all in datacenter equipments, energy, or even signing up for dc admin jobs and AI jobs
>get enough money to do coke off escorts' asses on a yacht for years
>or
>stay jobless
>goon to subpar text porn on 1 t/s
What did you choose?
>>
>goon with cocaine
or
>goon
honestly with my high blood pressure I should probably stick to regular gooning
>>
File: 1689470449627817.jpg (109 KB, 563x1003)
I just woke up from a coma. Any major improvements in local models compared to six months ago?
>>
>>102480681
>--Anon is considering building a cluster with Orange Pi 5 Pro devices which have a dedicated NPU
Not worth it (yet). The NPU is very poorly supported on the software-side currently. Someone is working on a Kernel Driver and User Space for it though.
You probably want to give this thread a read through:
https://github.com/ggerganov/llama.cpp/issues/722
Given that the RK3588 "appears" to support quad-channel DDR5, we might get a more decent SBC for that kind of thing eventually. Also, Orange Pi does have another more powerful 20 TOPS NPU product, but it's based on a Huawei chip, meaning that it's only available for CN residents.
If you're gay and into that kind of shit, /r/RockchipNPU/ might be a good place for updates.
>>
>>102483631
Qwen2.5 is SOTA now, 72B version only loses to Sonnet 3.5
>>
>>102483631
same shit but with number getting bigger
ai has not yet been created
>>
>>102483631
Nemo models are really good at RP at 12b
>>
Qwen2.5-14B vs Nemo: the former can make a reasonable summary of a thread (which I can't post because thank you mods), the latter chokes (does one or two topics with weird formatting and then just gives out random numbers).
>>
>>102483987
- Anonymous Flame War: >102478175 >102478267 >102478444 >102478511 >102480128
- Local AI Debate: >102478665 >102478878 >102478881 >102478882 >102478906
- Proxy and IP Logging Concerns: >102478936 >102478957 >102478971 >102478972 >102479022
- Model Recommendation: >102479066 >102479158 >102479244 >102479247 >102479319
- Mistral vs. Other Models: >102479074 >102479478 >102479499 >102479564 >102479570
- Kayra Model Discussion: >102479177 >102479260 >102479323 >102479358 >102479398
- Qwen Performance: >102479744 >102479764 >102479765 >102479768 >102479771
- Llama Model Comparison: >102479301 >102479545 >102479586 >102479624 >102479677
- Recap Anon Battle: >102479531 >102479588 >102479617 >102479680 >102479728
- GPU Performance: >102479186 >102479223 >102479263 >102479282 >102479287
- Anti-NAI Sentiment: >102479475 >102479487 >102479518 >102479545 >102479566
- Hardware Suggestions: >102479195 >102479243 >102479260 >102479301 >102479350
- ERP Training Models: >102479688 >102479772 >102479817 >102479839 >102479911
- Recap Handling: >102478774 >102478806 >102478866 >102478897 >102478916
- Local Model Advancement: >102479587 >102479663 >102479698 >102479714 >102479801
- Recap Thread Management: >102479500 >102479531 >102479607 >102479624 >102479634
- Selling Off Models: >102479929 >102479957 >102479964 >102479985 >102480002
- China's Superiority Claims: >102479859 >102479867 >102479898 >102479928 >102480000
- Crossposting Discussion: >102479884 >102479892 >102479947 >102479980 >102480010
- Proxy Misuse Warning: >102479933 >102480006 >102480017 >102480047 >102480084
>>
Context windows and effective context are an issue. When will we see a breakthrough in this?
>>
>>102480823
ollama is a wrapper around the llama.cpp HTTP server.
I don't know what exactly ollama ships but llama.cpp has a Vulkan backend (compile with GGML_VULKAN) that should work on an RPi 5.
But since the bottleneck for LLMs is memory bandwidth the performance is going to be shit either way.

>>102481734
When I run Mistral Large q8_0 I use q8_0 KV cache.
Subjectively I feel like it does not have a significant effect.
Based on objective measurements precision in the K cache is more important than the V cache.
>>
File: context.png (274 KB, 896x722)
>>102484175
Already solved by Jamba.
>>
File: firefox_z6d6zO80Gx.png (315 KB, 722x1155)
14B can play (not too well, and requires a low temp), but it can't admit it lost.
>>
>>102484203
Can llama.cpp server do continuous batching of requests? i.e. multiple users send requests in parallel, independently of each other, and they all get their generations going right away without waiting in a queue.
>>
>>102484257
Yes.
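A quick way to see it, assuming llama-server is up on its default port with the OpenAI-compatible endpoint (a sketch; details depend on your build and flags):

# fire several independent requests at once; with continuous batching they get
# interleaved on the server instead of waiting for each other in a queue
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

def ask(question):
    r = client.chat.completions.create(
        model="local",  # the server answers with whatever model it has loaded
        messages=[{"role": "user", "content": question}],
    )
    return r.choices[0].message.content

questions = ["Why is the sky blue?", "Name three sorting algorithms.", "What is RAID 5?"]
with ThreadPoolExecutor(max_workers=len(questions)) as pool:
    for answer in pool.map(ask, questions):
        print(answer[:80])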
>>
>>102484221
What options do we have for running quantized jambas today?
>>
>>102484221
Interesting. How is this possible and why don't we see more support for Jamba?
>>
File: firefox_CSqx1gLrAt.png (270 KB, 737x527)
that's a hilarious refusal
>>
>still 0 decent RP models under 20B, besides nemo with claude slop
vramlets did we lose?
>>
All of the current benchmarks test for inductive reasoning, which is the opposite of creativity (deductive reasoning). The higher something scores, the more likely that it is passive and boring and assistant slopped
>>
File: ms.png (107 KB, 1634x234)
Every time I am about to drop mistral-small it outputs some cool stuff on me.
This is the first time I saw this with a small model.
Usually if something is in the mouth, people still continue talking normally.
>>
>>102484289
just bitsandbytes in transformers/vllm
>>102484301
they have an RNN component tacked onto the transformer that helps with attention or some shit, they claim it also doesn't slow down massively as context increases like typical transformers do
no support because the architecture is different and the team isn't going around putting in PRs in open source projects like the GRIN chinks were before Microsoft hired assassins
also because the models are pretty bad aside from their context handling, they have a nearly 400b model that compares against 70bs
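The transformers + bitsandbytes route mentioned above is roughly this (a sketch; the model id is a placeholder for whichever Jamba checkpoint is meant, and it may need trust_remote_code plus a recent transformers):

# 4-bit load via bitsandbytes; model_id is a placeholder, adjust to the actual repo
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "ai21labs/AI21-Jamba-1.5-Mini"  # placeholder
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",   # spread layers across whatever GPUs are visible
)
inputs = tok("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))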
>>
>>102484478
Magnum models do unprompted onomatopoeia all the time
>>
>>102484496
>mini
>52B
Welp. I can run it on my 2x3090 in 4 bit. I'll download it, I guess. Installing vLLM shouldn't be too difficult, should it?
>>
>>102484562
i used magnum for nemo a lot. never saw that before. cool stuff.
hope they have a finetune ready for mistral small and new qwen 14b.
>>
People in the space are clowning on Yann Lecun hard after o1's release.
>>
>>102484776
We'll see who gets the last laugh in 4 days. Llama Multimodal is coming, and that's just the appetizer for the Big J-berry on the way. Lecunny's playing the long game.
>>
File: 1716908619923879.jpg (352 KB, 1416x1001)
>>
Something I've noticed with Cydonia-22B-v1 a couple of times now:
It starts talking about feminism and empowerment.
Like for example something fucked up is happening and the response is
>"Isn't it empowering?" *a middle-aged woman remarks to her friend as they wait for their train. "Embracing our bodies and showing off our lady bits. It's the new feminism!"

At first I thought it was in the cards, but I got responses like this repeatedly after using it for hours with many cards.
It also talks about boundaries and respecting bodies if you force yourself upon characters.
Very sus. I doubt it's the finetune.
>>
>>102484824
Mistral's post-training reinforcement magic strikes again
>>
>>102484852
nta but they do something to their models that makes them suck for rp in general. every mistral model is great at following for the most part, but it goes too far and becomes like fixated on anything you type and it kills its creativeness compared to something l2 of a similar size. i actually tried that specific tune of the 22b and thought it was worse than the rp tune of nemo i was using. overall i'm not a fan of mistral models for rp though (except their tune of miqu/l2), too wordy and fixated on one thing at a time, much less likely to suggest something new
>>
>>102484792
To be fair, I do believe o1 is a step in the right direction where you have a model self-arbitrate to come to a more robust conclusion and realize error throughout the inference process. But it's also a sign that transformers are starting to hit a limit on what they can do. o1 has the front end be responsible for handling the model's responses and then reiterating its own questions. This is something that should be baked into the model, but it's too advanced for the transformer's architecture.
>>
File: LECUN535.png (38 KB, 581x385)
>>102484776
Yann is still winning
>>
>>102484203
>But since the bottleneck for LLMs is memory bandwidth the performance is going to be shit either way.
See here:
>>102483680
>Given that the RK3588 "appears" to support quad-channel DDR5
It actually might not be that bad. But I don't think any currently available SBCs have more than two (I might be wrong on this).
I never did any testing on my OPi5 with Vulkan (I think llama.cpp's support of that only matured recently?). In the next few days, I might test and report back. The 32GB models might not be too bad for MoE's.
>>
>>102470591
Very cool, I was just trying to train an LLM from scratch, might test this since it seems very easy to implement
>>
>>102485250
IN LECUM WE TRUST
>>
File: 175.png (26 KB, 595x472)
>>102484798
>Added a toggle for chat name format matching, allowing matching any name or only predefined names.
i don't understand what this does
>>
>>102485721
If the AI tries to write a message for a side-character (i.e. it sends a line starting with "SideCharName: ") it will either automatically detect it and show it as belonging to a new character (old behavior), or it will only begin doing that after you explicitly add a new character's name into the AI Name box, depending on this setting.
>>
how do i fix when formatting gets fucked? Some of my cards even if they're formatted fine tend to break asterisks, not use commas, or don't use quotes for their dialogue even if their example messages do.
>>
>>102485829
Ooooohh okay, thanks.
that's going to be handy to keep off for things like rpg stats showing hp and stuff.
>>
>>102485931
Check token probabilities to see what the model wants to predict when it fucks up the formatting?
>>
File: 1714309857804565.jpg (50 KB, 1048x193)
>>102485931
st? usually thats a template thing. the model card should say what format it is
>>
>"I wonder what's going on in /aicg/, haven't checked there in a while and it seems unusually active"
>they're shitting themselves and having a thread apocalypse over some esoteric discord drama involving an e-girl thread celebrity
>>
File: 172688447058707.png (456 KB, 512x696)
>>102484776
He should be bullied more until he shows something worthwhile. What's the point of talking shit about transformers if he can't build anything better himself?
>>
the last model I upgraded to was a 3.5bpw quant of mistral large and it's still working pretty well.
Anything better (for RP) I should know of that fits onto 48gb vram?
>>
>>102486447
Qwen 2.5
>>
>>102486453
looks interesting but I can't run the 72b until someone puts it into exl2 I think?
>>
A watt spent on gen AI is a wasted watt
>>
>>102480721
Intel N305 has a "decent" iGPU, and it is supported by Vulkan, but it's not much faster than CPU.
NPUs are not meant to do LLM inference, they're for running small YOLO image recognition models and things like that.
If you want to play with something tiny, RTX A4000 can now be had on ebay for around $500. It's basically a 1-slot 3080 with 16GB of VRAM.
>>
>>102486479
You can use this if you don't want to wait for an exl2 quant
https://huggingface.co/Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4
>>
why do so few people talk about exl2? It seems vastly better than any other way of loading and I don't even think a model is worth using unless I can load it via that
>>
>>102486431
Quite honestly though the expected results from scaling up our current architectures are massively overhyped.
The promise of AGI from autoregressive language models seems like a massive grift and you don't have to actually build AGI to point that out.
>>
>>102486588
Most people here are too poor to run models fully off VRAM and/or too retarded to do more setup than installing ST and downloading koboldcpp.exe.
>>
>>102486588
Compiling llama.cpp is easier than learning how to use python in a venv or conda.
llama.cpp is good enough.
exl2 really needs Ampere or better to provide a noticable speed boost.
exl2 needs the model to fit on the GPU, there's no CPU + GPU.
exl2 with flash attention isn't deterministic (not that it matters much).
>>
>>102486588
Most people here are vramlets including me, people who have a fuckton of vram actually use the models instead of bitching in mongolian basket weaving forum
>>
>>102486633
Huh not deterministic? Do you not get the same results from the same seed?
>>
>>102486633
>learning how to use python in a venv
pretty sure ooba just does that all for you anyway
>>
>browse locally generated geocities like websites about random topics with images generated by flux
When will this be possible?
>>
Qwen 2.5 unedited.
I had to reroll 6 times though until I got through the refusal.
It's funny because there is a warning at the beginning and at the end but it still delivers (kinda) lol
Finetune would definitely be interesting.
>>
>>102486588
I am interested in it but I couldn't find any retard guide to get me started so I just keep using gguf.
>>
>>102485334
You can toss money into SBCs and be disappointed by driver support and speed, or you can patiently look around for deals on Xeon workstations. I scored a Platinum 8280L setup with 256GB RAM for under $500.
>>
>>102486678
forgot to write, Qwen2.5-14B-Instruct-Q5_K_M
>>
>>102486675
the main issue is that flux is very slow, it barely functions on 24gb vram
>>
>>102486675
websim.ai
>>
>>102486719
Yes, but the quality is outstanding. Nothing else comes close for coherent shapes and lines, and photorealism is off the charts. Only complaint I have is the "cracked paint" effect you can see if you pixel-peep.
>>
>>102486709
>>
>>102486873
Last one.
>>
File: instruction.png (225 KB, 1394x1034)
Trying to test the censorship levels. I find it funny how this model always likes to speak about "consent" and "boundaries" but will not care about literally anything else as long as everything is "consensual".
>>
>>102486588
I tried using it once and it felt like it was lobotomized compared to a gguf at the same bpw.
>>
hello i want coom rp model for sex on my 970 and 4 gig memory
>>
>>102487648
Gemmasutra 2B
>>
>>102487673
thanks!
>>
hello i want coom rp model for sex on my 3090 and 24 gig memory
>>
hello i want coom rp partner for sex, dm me
>>
>>102487824
qwen 0.5b
>>
>>102487859
sent
>>
Is any language model good at, or is there any way to get a bot better at, understanding things like anatomical relations? Example: Character holds another character upside down and is fucking their mouth. Is there anything that would make a bot already understand that the balls would be slapping against the other's nose and possibly forehead, rather than their chin?
>>
File: fun and games.jpg (54 KB, 480x480)
For ERP: magnum-12b-v2.5, ArliAI-RPMax-v1.1, or
MN-12B-Lyra-v4?
>>
>>102488116
Lyra
>>
>>102488116
Lyra
>>
>>102487967
sillytavern worldinfo
>>
>>102488116
those are all good
download them all and also
>MN-12B-Chronos-Gold-Celeste-v1
>arcanum-12b
>NemoMix-Unleashed-12B
and switch between them when you get bored of one
>>
Wild that it's almost winter and there still hasn't been anything better than Noromaid v0.4 8x7b for local models worth using on normal hardware.
>>
>>102488158
How do you find the right settings? I keep trying these and it's ultra slop, fails "Impersonate" or has other issues.

I have one extremely good log from a while ago that I believe was Stheno, but I have no way of retrieving what exactly I was running back then... And everything since then is just terrible. I'm at a loss.
>>
>>102488191
I don't know, Sao. Ask in Discord.
>>
>>102488215
Sao's new models are included in the terrible slop category, retard
>>
>>102488116
Lyra.
I like mini-magnum better than magnum v2.
>>
>>102488191
I'm using these and it's working out okay
>>
>>102488249
On all of them? What about format and system prompt and all of that bullshit?
>>
>>102487455
>doing all that
not getting any attention sitting at mom's basement so you gotta shit up this thread huh.
here's that attention (you) desperately wanted kek.
>>
>>102488263
most nemo finetunes are trained with chatml
the only ones with a different format I can think of are nemo instruct (mistral format) and dory (alpaca)
>>
Why can't SillyTavern/model authors come up with some convention for distributing default parameter presets and instruct formats along with models so I don't have to dick around with a bunch of settings every time I load a different model?
>>
>>102488308
No idea what I'm doing wrong then. Can you share an example log perhaps? Does "impersonate" work for you? For me it starts rambling endlessly or uses the wrong character.
>>
>>102488263
kobold lite handles that automatically
i never bother with it.
the settings there are just the basic min-p preset, then min-p 0.05 and XTC set to 0.15/0.5
>>
>>102488155
That's something I haven't used a lot. Does that mean I have to put all specific information that could come up, like that, in there?
>>
>>102488333
>kobold lite handles that automatically
Damn, really? Why the hell doesn't ST then?
>>
>>102488345
It does, you just have to use the chat completion API.
>>
>>102488598
Damn what, is that what you're supposed to use? Any other differences from text completion?
>>
>>102488334
Only the stuff that the model isn't doing satisfactorily out of the box, check out chub.ai for examples.
>>
File: file.png (75 KB, 880x393)
What does this mean for local models?
AMD is slow as fuck for image gen, but if LLMs are mostly about keeping things in VRAM, wouldn't we be able to run full-precision 80B models now?
>>
svelk
>>
>>102488745
if it isn't nvidia it's worthless junk, too much is built around cuda
>>
>>102488745
With pic related?
Aren't those APUs allocating RAM as video memory?
That's probably slower than just using RAM + gpu for prompt processing.
If AMD had really cheap gpus with tons of vram then even with the worse software stack, it could be worth it, and people like cudadev would 100% focus on improving the software stack.
AMD needs to be the best cost benefit by a large margin for that to happen.
>>
>>102485250
>>102485507
orange man bad amirite fellow /lmg/sisters??
>>
>>102488745
AFAIK it's supposed to use up to a 256-bit bus width with LPDDR5-8500 memory, which would be quite a bit faster than typical DDR5 desktop systems, but still slower than the VRAM of a low-end GPU.
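Back-of-the-envelope, if those numbers hold:

# rough peak-bandwidth arithmetic for a 256-bit LPDDR5-8500 setup
bus_bits = 256
transfers_per_sec = 8500e6
bandwidth_gbs = bus_bits / 8 * transfers_per_sec / 1e9
print(f"{bandwidth_gbs:.0f} GB/s")  # ~272 GB/s, vs ~936 GB/s on a single 3090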
>>
>>102488821
yes, he is bad because he supports israel
>>
>>102488821
yes, he is bad because he is a nazi transphobe and /lmg/ is a transfriendly general
>>
Is it just me, or are KoboldAI Lite and the horde both down?
>>
>>102488902
probably updating it, 1.75 of kcpp dropped a few hours ago with kobold lite improvements
>>
>>102488836
>but still slower than the VRAM of a low-end GPU.
but also way cheaper per gig
>>
>>102488836
Still not a bad price if it can handle 120b at q6 at 4t/s or so
>>
>>102488836
Instead of making meme "AI CPUs" they should just stop illegally coordinating with nVidia to engage in illegal market-fixing and release GPUs that people actually want.
>>
?
>>
>>102489289
Not a bad idea but how about using a sharper font?
>>
File: 1714754625975753.png (68 KB, 1143x217)
>>102489283
Uh, gaining market share in the low to mid-tier consumer GPU market is clearly more important than making GPUs that can be used for AI. The masses want affordable, decent GPUs. Good benchmarks, 16GB VRAM is all you really ever need.
>>
>>102488836
how does it compare to recent appleshit
>>
>>102489289
i like it, but can it be dark mode instead of black text on white background?
>>
What are good large models for output variety? I feel like Largestral is the best for smarts but it lacks output variety, CR+ is very good, and Wiz is also solid but worse than CR+. Are there more options?
>>
>>102489227
More like 2t/s. The more memory to read, the slower the inference. With TP disabled, mistral large on 4x3090 is ~7t/s, at 935.8 GB/s of bandwidth
>>
>>102489440
>4x3090 is ~7t/s
People spend almost $3k to run models at that sort of speed? lmao
>>
>>102489428
sex
>>
>>102489320
What font would you prefer?
>>102489363
>>
>>102489460
That's sequential speed. With TP it's 15, and 35 with P2P. But yeah, the larger, the slower.
>>
>>102489480
ahh much better, thanks
>>
>>102489480
That one looks good enough.
>>
>>102489480
maybe have the heading fonts a little smaller and the general text font a point or two bigger
>>
>>102489480
Add little Mikus around it with comments generated by AI!
>>
>do weekly Ebay check
>people are trying to get 16K USD for PCIE 8xV100 rigs now.
Shameless.
At least the SXM2 ones kind of made sense...
>>
>https://huggingface.co/QuantFactory/Qwen2.5-Lumen-14B-GGUF
worth trying?
>>
File: sis.jpg (45 KB, 392x595)
>>102489542
>local 3090 prices have been rising steadily
>p40/p100 are no longer cheap as well
>>
>>102489688
Global 3090 prices seem to be trickling down slowly from what I've been monitoring, but like very slowly. Maybe 10 dollars per quarter. Which might as well be a price increase since they're starting to get up there in age.
>>
>>102486431
The point is that it's essentially being used as a scam to try and get exponentially more money for the exponential compute required for increases in intelligence. Frankly someone needed to say it, it doesn't matter if a better alternative exists or not. And actually that one doesn't exist means all the more that we should criticize the current way things are. It's unfortunate that his criticisms at least on Twitter are often misunderstood, and also mixed with political shitposts, though.
>>
Is there a better local model than pissstain-large-v2 yet?
>>
>>102489643
buy a publicité
>>
>>102489362
dunno how compute compares but memory bandwidth roughly equal to the M3 max
>>
File: 475.gif (1.38 MB, 640x640)
>aicg fags confirmed to have been entrapped by proxyfags
>havent touched proxies since summer last year
i'm more amazed this didn't happen sooner to be honest lmao
>>
>>102489698
With miners' stocks depleted, the supply of 3090s has decreased. There are no viable alternatives available in the same price range for both gaming and inference purposes, so demand is high.
>>
>>102489484
>35 with P2P
how does peer to peer help here?
>>
>>102488745
We'd need these chips to include PCIe slots to really get something useful for our purposes. But if we did have such, then we could theoretically get like 2-4x faster when comparing partial offloading setups. I run a tiny quant of Mistral Large at like 1 t/s on my machine, whereas potentially a 3090 + the Ryzen could be 3 t/s.
>>
File: 1726903245983767.png (472 KB, 512x696)
>>102489714
>the exponential compute required for increases in intelligence
Who cares as long as it works? For big corpos, money is not real anyway. Stock prices fluctuate based on Musk's tweets. The economy isn't real.
>>
not much buzz here around kyutai-labs/moshi to my surprise. so do you have other ways to talk to it locally or text2voice?

it's the first thing of this kind that worked for me offline, and it's quite fascinating
>>
>>102489841
You can run vllm with dumb and effective symmetrical TP on 4 GPUs. This requires large bar support and custom drivers to enable p2p between GPUs https://github.com/tinygrad/open-gpu-kernel-modules
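The vLLM side of that is just the tensor-parallel setting (a sketch; the model name is a placeholder, and the P2P driver swap above is a separate step):

# tensor parallelism across 4 GPUs via vLLM's Python API; model name is a placeholder
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-Large-Instruct-2407", tensor_parallel_size=4)
params = SamplingParams(temperature=0.8, max_tokens=128)
print(llm.generate(["Why is the sky blue?"], params)[0].outputs[0].text)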
>>
>>102489872
https://github.com/gpt-omni/mini-omni is smaller and better. Neither is anything more than a novelty, and both suck at any practical task.
>>
>>102489428
just do largestral with 5 temp 3-5 topk.
>>
>>102489872
>text2voice
https://github.com/fishaudio/fish-speech is great when it works. Unfortunately, auto-regressive shit is unreliable by design and some gens suck.
>>
>>102490039
>topk
noob
>>
Haven't bothered with LLMs lately, was Nemo 22B or qwen 2.5 any good?
>>
>>102489841
Without P2P, a GPU needs to ask the system to talk to another GPU. With P2P, your GPU can talk to the other GPU directly without asking the system = faster
>>
>>102489864
>Who cares as long as it works?
Works for what? We still aren't anywhere near AGI, we still aren't getting models that actually write well and satisfy the people using them. It's arguable that the economic and societal benefits of these non-AGI models are really worth as much as the money being burned which could've been spent on other things that might've had more benefits towards humanity or gotten us to AGI faster. Very arguable in fact, when there are many companies in the space spending a ton of money to train a model that will be BTFO in a few weeks or months by a competitor's model. Or hell in many cases BTFO by an already existing model so basically the money really did just get wasted for nothing.

Recognize what you are essentially doing right now. You are defending these large, soulless scams and anti-competition, anti-consumer entities. You don't have to be like this.
>>
>>102490039
Largestral is unsalvageable, it's very common to get 100% probability on tokens and no amount of sampler tweaking will change that.
>>
>>102489872
i just use edge_tts/xtts + rvc, <1s latency most of the time and you can plug it into anything. i tried fish but it was way too inconsistent even after finetuning
>>
>>102489872
People have gotten tired of installing bullshit just to use it once and never again.
>>
>>102490039
Love to see my meme settings being shared.
>>
>>102490169
Isn't there a sampler that reduces max probability?
>>
>>102490153
>which could've been spent on other things that might've had more benefits towards humanity or gotten us to AGI faster
Let's be real, we're fortunate that they aren't being spent on Epstein islands
>>
File: Pic_NPC-Morridow_14.png (296 KB, 792x1002)
>>102490153
>We still aren't anywhere near AGI
Define AGI
>>
>>102490301
Yes, define it coward.
>>
>>102490301
The class is waiting for you to define AGI.
>>
>>102490301
I think therefore I am
>>
>>102490234
Yes, but that only works if there are other tokens, not if there is only one 100% token.
>>
>>102490259
Actually, the money they spend on extraneous bullshit is still being spent either way. They're still buying yachts. Sam is still buying sports cars and increasing his collection.

>>102490301
Or, you could stop trying to search for ways to argue for companies that aren't on our side and don't have our interests in mind.
>>
>>102490344
redit ergo dum
>>
>>102490301
artificial goon intelligence
>>
>>102490301
send more pics

agi is practically an ai with agency, capable of getting through social situations and other human challenges
>>
File: FOfUnsUXMAIr7xW.jpg (47 KB, 800x450)
>>102490370
>>
AGI would understand the context of the erotic roleplay and not do things like walking across the room to take something from you when you said it's right next to her
It wouldn't instantly jump on your dick when you tell it not to
>>
File: 1700188788837625.png (616 KB, 1529x884)
>>102490301
Any cloud LLM is AGI in comparison with local cuck one.
>>
>>102490411
knowledge of physics as an extension to AI is not what AGI is about
>>
>>102490357
I'm not advocating for companies, rather, I'm contending against Lecum. Last year, I was gooning with L2 14b finetunes, and currently, I'm gooning with 123b Largestral. Clearly, it's significantly improved, so I fail to comprehend your stance that scale doesn't matter. If they cease focusing resources on the "bigger is better" approach, I question whether they will dare invest in riskier yet potentially more effective research avenues for achieving AGI. Investors readily fund guaranteed improvements, but are reluctant to invest in seemingly far-fetched ideas like cat intelligence research by Lecum.
>>
>>102490431
This meme hasn't aged well...the "gpt omni" response is pure slop, and the problems in the local panel are year-old 7b tier ones that are solved in newer models.
>>
>>102483278
Someone? Just tried a Nemo finetune and it's utter shit
>>
>>102490566
nemo's amazing, you probably used too high of a temp
>>
>>102490551
Limited context - not solved
General data censorship (anything that isn't your lolipedoslop) - not solved, and never will be
Hallucinations - not solved
One system prompt format - nonexistent, you are forced to rewrite shit and tinker around with each new model
It's been three years and we still got no solution for any of these.
>>
>>102489480
I refuse to read a recap in dark mode, I'm not underage.
>>
>>102490588
Default temp, with Tiefighter 13B it just werk... Seriously
I tried Rocinante-12B fyi
>>
>>102490619
You are underdeveloped
>>
>>102490627
>rocinante
all drummer models are unusable trash
>>
>>102490615
>Hallucinations - not solved
You can't solve what is the core working of an LLM. They're always hallucinating.
But you can reduce them greatly with RAG
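A minimal sketch of that idea (assuming sentence-transformers for the embeddings; the snippets and question here are made up):

# toy retrieval step: embed reference snippets, pull the closest ones, and put
# them in the prompt so the model answers from them instead of from memory
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "llama.cpp's server listens on port 8080 by default.",
    "Qwen 2.5 was trained on an 18 trillion token dataset.",
    "Mistral Nemo is a 12B model.",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

query = "How many tokens was Qwen 2.5 trained on?"
q_vec = embedder.encode([query], normalize_embeddings=True)[0]
top = np.argsort(doc_vecs @ q_vec)[::-1][:2]   # cosine similarity via dot product on unit vectors
context = "\n".join(docs[i] for i in top)

prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# feed `prompt` to whatever backend you use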
>>
>>102490674
nobody asked for your opinion, sao
>>
>>102481479
Florence-2 is probably better for everything except ERP anyway.
>>
>>102490678
RAG is a meme and doesn't solve anything.
>>
>>102490674
I'll try MN-12B-Lyra-v4 then...
I swear I feel there still isn't something better than Tiefighter in the 13B range
>>
>>102490690
>RAG is a meme and doesn't solve anything.
you don't know what you're talking about
>>
>>102490690
I've never used it but I feel like it would be good for a desktop assistant since it would let you inject relevant files/scripts into the context.

It definitely won't help with hallucinations though.
>>
What would it take for computers to think?
>>
>>102490690
https://www.lamini.ai/blog/lamini-memory-tuning
>>
>>102490615
>Limited context - not solved
405b has true 128k. Good enough for anything I want to do
>General data censorship (anything that isn't your lolipedoslop) - not solved, and never will be
>Hallucinations - not solved
both are pure skill issues
>One system prompt format - nonexistent, you are forced to rewrite shit and tinker around with each new model
who cares?
>It's been three years and we still got no solution for any of these.
For any of the above you consider an actual unsolved problem, cloud isn't appreciably better
>>
>>102490431
seething poopooskin (v)ramlet lmao
>>
>>102483680
>If you're gay and into that kind of shit, /r/RockchipNPU/ might be a good place for updates.
what's the bad rep against the Rockchip NPU?
>>
>>102490749
Who has 200GB of VRAM?
>>
Cohere insiders, what's the state of the company after CR 08-2024 flop? Did the higher-ups learn a lesson or will they continue training on slop for minimal gains?
>>
>>102490828
I don't even know what cohere is
>>
>>102490828
>after CR 08-2024 flop
Explain?
Thought it was a good AI company
Their graphic chart is comfy
>>
>>102480672
>>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
how come the mandarin version of the blog post still has all the charts and code in English? isn't the point of a mandarin translation for mainland readers who can't into english or is anyone worth their bacon supposed to know english
>>
https://retrochronic.com/
Enjoy your redpill anons
>>
>>102490897
>https://retrochronic.com/
not clicking that, tell us first whats inside
>>
>>102490828
>CR 08-2024 flop
Huh I'm downloading that right now, should I stop it?
>>
>>102490874
Tech/math is always written in English in Asian countries AFAIK even when there are native words for them.
>>
>>102490674
there are models that are nice for me, but I have no idea what settings to apply to them. they eventually go into crazy self-repeat mode

like
TieFighter-Holodeck-Holomax-Mythomax-F1-V1-COMPOS-20B-gguf
DavidAU/L3-Stheno-Maid-Blackroot-Grand-HORROR-16.5B-V1.6-STABLE-INTENSE-GGUF
>>
>>102490914
>A primary literature review on the thesis that AI and capitalism are teleologically identical
schizo slop
>>
>>102490914
"Capitalism and AI are teleologically identical, a zillion part essay" apparently.

Like no shit neither of those things have anything to do with teleology.
>>
>>102490914
Capitalism is ASI travelling back in time, invading us from the future to produce itself
>>
>>102490777
mac studio owners have 196 or smth, would it run there?
>>
>>102490920
>even when there are native words for them.
but whats the point then, might as well keep everything in English to be consistent
>>
>>102490949
Evolution is just natural gradient descent.
>>
>>102490946
Wrong anon.

>Such software [reinforcement learning systems like Google DeepMind's AlphaZero] has certain distinctively teleological features. It employs massive reiteration in order to learn from outcomes. Performance improvement thus tends to descend from the future.
>...
>Unsupervised learning works back from the end. It suggests that, ultimately, AI has to be pursued from out of its future, by itself.
- Nick Land (2019). Primordial Abstraction in Jacobite Magazine. Retrieved from github.com/cyborg-nomade/reignition
>>
>>102490965
They like to have the prose in their language because that's easier.
>>
>>102490935
>there are nice models for me
>they go eventuallly into crazy self repeat mode
I know that these two things aren't necessarily contradictory, but god damn does it feel like it.
>>
Making a companion to browse 4chan with me
Anyone tried this before?
>>
>>102490964
Maybe a 3 bit quant would fit if you ran absolutely nothing else.
>>
>>102490977
This is *extremely* retarded. It's like when people were using the word "conscious" to describe language models when they first became popular.
>>
>>102490998
>Anyone tried this before?
it's nice tho I just use GPT-4o mini which isn't really local.
>>
>>102490865
Well, they didn't dare to post any actual benchmarks, just an arbitrary "+50%" on their website. While the original CR+ was at one point at the top of lmarena, new one isn't. It also barely improved at livebench. They clearly can't compete against a similarly-sized Mistral-Large.

>>102490916
If you are planning to use it for RP, you'll be disappointed, it's much more slopped than the original CR+.
>>
>>102490946
>confused.

overlords said on a podcast that AI is communism and Blockchain is capitalism.
>>
>>102490998
I've never built an ERP character for it (that's an odd thing to do...) but I've had gemma2 analyze /smg/ posts.
>>
>>102491042
I think the only thing I like about podcasts is that they use RSS. All of the actual content is always so fucking bad.
>>
>>102490619
I think I'm done playing with it for today, so the next one will be dark.
But it might be better if we can find some host to embed the html file so the links can be clickable. If anyone wants dark mode, they could use an extension.
>>102489492 >>102489495 >>102489509 >>102489541 >>102490619
>>
qwen 2.5 made me interested in local again :3 I hope to be able to use RAG and other stuff to get coding llms to reference documentation
>>
>>102491066
Nice.
>>
>>102491066
fwiw, I put in feedback for them to consider reverting or changing the mass reply filter. don't know how much attention they pay to that but I figure it couldn't hurt
>>
>>102480672
>https://rentry.org/machine-learning-roadmap
the math in here feels a bit lackluster
>>
>>102491077
>qwen 2.5 made me interested in local again
I like the 0.5B model, it's pretty snappy
>>
is it gay to goon to a gay RP if you switch to a straight one right before you bust? also, best meme sampler for this?
>>
>>102491097
The math isn't that hard anyway. Probably the most complicated/unusual thing is just the partial chain rule (gradient calculation.)

Everything else is basic linear algebra which you should know if you've done practically anything more complicated than json pushing.
>>
>>102491031
But the CEO was on a podcast recently and he said they found that good data was more important than compute
>>
>>102491119
I think you're confused.
The use of meme samplers and mental gymnastics is correlated but that does not necessarily mean that meme samplers will improve your capacity for mental gymnastics.
>>
>>102490763
calm down ranjesh
>>
>>102490964
at like 20 seconds/token
>>
>>102491119
Why the fuck would you read gay RP to begin with?
>>
testing models
>>
>>102491096
Good idea. Hopefully they'll reconsider. I don't know why they thought this would stop a determined spammer.
>>
>>102491146
They clearly haven't used good data in the new CR, just in the old one. The new one is full of low-quality synthetic garbage.
>>
>>102491124
>The math isn't that hard anyway.
I understand that, but that's under the assumption that we stick to the current status quo; is the goal not to advance the paradigm forward? we will need stronger math
>>
what's the state of running local models on high-end android phones?
>>
>>102491124
why would i learn linear algebra when my gpu does it for me
>>
>>102491066
Very nice.
>>
>>102491215
lol
>>
>>102491165
since you didn't understand the insult, you must be one of openais kenyans. monkey want banana? ooh ooh aah aah?
>>
>>102490484
You may not be trying to advocate for companies, but as I said, that is essentially the effect of your posts before this.

>your stance that scale doesn't matter
I never said that. What I said is "The point is that it's essentially being used as a scam", and that scale is simply used as an excuse for that scam, which is actually what Yann's argument is truly about in the end, although he might not explicitly or directly say it like that. Scale obviously does matter to a point, but what it matters for is also a question, and my later point was that it might not matter for anything of equivalent value to the money dumped into it.

>Investors readily fund guaranteed improvements, but are reluctant to invest in seemingly far-fetched ideas
And that's the issue, that is part of Yann's criticism. Investors are not really putting money where it should go and essentially act based on hype while actually valuable research might not be getting the funding it needs, which isn't really a new or contentious concept.

>If they cease focusing resources on the "bigger is better" approach, I question whether they will dare invest in riskier yet potentially more effective research avenues for achieving AGI
This does not really make sense as betting big on scale is already the highest risk given the amount needed for it. Smaller projects like JEPA or the original transformers paper do not need nearly that much money, and have never needed that much. It's a completely different ballpark of money we're talking about. That's just in the context of big stuff like GPT-4/5 though. If we talk about smaller companies and the smaller but still somewhat significantly sized models like Cohere's, it's absolutely a waste of money, and they have done virtually nothing to move the field closer to AGI.
>>
>>102491280
Heh you are mad
>>
>>102490484
>L2 14b
??? bait
>>
>>102491215
>high end android phones?
Do they come with a couple of 3090s on them now? That's cool...
But maybe you can run some 8b on them. What's a high-end phone? Gimme specs, not models or brands.
>>
are people itt coping about <70B models again? they're never gonna be viable and most of them will be phased out in the next few years. let it go.
>>
>>102491336
i've lost ~1.5 liters of semen to nemo finetunes this week
>>
>I'm so coombrained I don't know how to read
not a brag but okay
>>
70B models aren't even that good
>>
>>102491215
People were running vicuna 7B on some android phones last year. Google is trying to put gemma on the new Android phones. Apple has "Apple Intelligence" but I bet it'll just call OpenAI API
>>
>70B models aren't even that good
>t. vramlet nemo user
>swiped twice on miqu IQ1_xxs
>>
>>102491379
Link the post, pussy
>>
>>102491379
>this non-replying motherfucker is acting like he's having the LLM shit out a sequel to finnegans wake and not some tsundere moege girl chatbot
shaking my head to be honest
>>
File: butthurt.gif (119 KB, 600x487)
>reeee give me (You)s
>>
>>102490977
Sounds like una creator
>>
File: waiting.jpg (12 KB, 193x261)
Me waiting for local as good as claude that runs fast on average hardware
>>
File: file.png (1.33 MB, 1024x683)
>new model wave hits
>cooming doesn't improve
>>
>>102491563
gemma2 is good enough for most of what I want. I already used it to write me both an ffmpeg and image magick command today and it's hardly the afternoon.

I wish llama was as good so I could finetune it.
>>
>>102491389
>Apple has "Apple Intelligence" but I bet it'll just call OpenAI API
They've already said that's exactly what it will do
>>
>new model wave hits
>sloptuners too captured by /lmg/ memes to tune them
please keep telling them qwen sucks. we don't need any more sloppa trained on opus logs.
>>
>>102491601
they have native adapters and a tiny model iirc for small tasks but Siri answers and anything longform/important is going to OAI.
>>
>>102491336
There is so much useless knowledge in those models you could make a perfect coombot in less than 7B. It is just a matter of cutting out the useless shit.
>>
>>102491228
So you know what to tell the GPU to do.
>>102491205
No. You need to be better at applying the math.
And if you thought there was something extra but unknown how would the people teaching you know? Then it wouldn't be new. If you want that just start reading random math books (this isn't a bad idea btw, I used to do this all the time before I became cynical and jaded.)
>>
>>102491619
I still haven't gotten around to trying OpenELM. Has anyone else? I think support got merged into llama.cpp.
>>
>There is so much useless knowledge in those models
>t. spends every day on a forum dedicated to LLMs
>still a coomer who doesn't know how anything works
>just cranks his dick to /gif/ and sillytavern all day
>>
>finally figured out how to completely remove repetition using rep pen and DRY
>suddenly, all my mixtral variants push plots forward, have far more elegant prose, and not a single spine shiver
IT WAS THAT EASY?? FUCK
>>
>>102491663
You didn't know about repetition penalty and went so far as to come here for help before trying it? How do you manage to dress yourself?
>>
>>102491663
>not a single spine shiver
that's not how rep pen and DRY works, pierre. stop shilling your shit 12B slop.
>>
>>102491663
What are your settings?
>>
>>102491663
Share settings plox, also, where's the DRY dial in openwebui? I can't find it
>>
Qwen2.5 is such a piece of shit model, holy fuck, how could anyone use that shit.
>>
>>102490086
Isn't Nemo a 12b model? You're thinking of Mistral Small 22b.

Qwen2.5 is amazing. I've heard people speak of refusals, but I haven't encountered any so far on 32b.
>>
>>102491584
Tinfoil hat: there's one dataset that slops your models tf up but every epoch on it boosts your mmlu by 20%
>>
>>102491724
Yeah, it's dogshit, you are better off using anything else.
>>
>>102491677
it's absolutely true, but i make my own characters and don't share logs so you have to take my word for it
>that's not how DRY works
i don't know what you're talking about, i read the pull request, and the person that made the DRY sampler says that's literally how it works
>>102491676
i knew about basic repetition penalty for months, but i had been using it wrong, because the ST devs can't be bothered to add context docs for most of the samplers, so i had to go digging into the full docs and fucking reddit posts for how it actually works
yeah, the principle of "apply X penalty to any tokens seen in the last Y tokens" seemed obvious in hindsight, but putting the penalty as high as 1.08 led to occasional incoherence, and any higher was gibberish, so i just thought i'd never be able to use it
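for reference, this is roughly what that "penalty on tokens seen in the last Y tokens" boils down to. made-up minimal sketch in python, not any backend's exact code, and the parameter names are just my own:

```python
# made-up minimal sketch of a classic repetition penalty, not any backend's exact code;
# `logits` maps token id -> logit, `context` is the token history, newest last
def apply_rep_pen(logits, context, penalty=1.08, penalty_range=2048):
    recent = set(context[-penalty_range:])  # only look at the last `penalty_range` tokens
    out = {}
    for tok, logit in logits.items():
        if tok in recent:
            # positive logits get divided, negative ones multiplied,
            # so a repeated token always becomes less likely
            out[tok] = logit / penalty if logit > 0 else logit * penalty
        else:
            out[tok] = logit
    return out

# example: token 42 appeared recently, so its logit drops from 3.0 to ~2.78 at penalty 1.08
print(apply_rep_pen({42: 3.0, 7: 2.5}, context=[1, 42, 5], penalty=1.08))
```

keeping the range small matters because with a huge range the penalty eventually hits function words like "the" and "a", which is the caveman-speak failure mode.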
>>
>>102491724
They don't. Those are trolls.
>>
>>102491756
What are your settings?
>>
File: news.png (1.2 MB, 1082x768)
1.2 MB
1.2 MB PNG
>>102491592
maybe it's good for stuff that a functional member of society would use but i want to goon
>>
>>102491694
>>102491767
pic rel, no point in sharing catbox json, since i've changed nothing else
also, my mixtral tune uses the alpaca system prompt, and i just wrote a basic 3-sentence one stating it's a roleplay and the desired length. all of my lewd shit is in my char defs
>>102491711
i use ST+tabby, look at your own docs, because idk, sorry
>>
File: bait.png (238 KB, 540x540)
238 KB
238 KB PNG
>i read the pull request, and the person that made the DRY sampler says that's literally how it works
lol
since other people are taking the bait I'll give the retard explanation: DRY attempts to prevent shivers from showing up multiple times. a phrase has to show up AT LEAST once before it can be deprioritized, similar to but more effective than rep pen.
>>
>>102491823
I see, so you've cranked rep penalty up high, but reduced the rep penalty range. Interesting. I'll give it a try.
>>
File: 1726945087728.jpg (126 KB, 626x999)
126 KB
126 KB JPG
So this is the power of closed LLMs
>>
>>102491883
problem, western man?
>>
>>102491849
Sounds great, but Mistral models are repetitive on the paragraph level, not just phrases. DRY doesn't work here
>>
>>102491823
>temp 1.3 to 5
>top k 0
temp 5 top k 3 guy has competition now
>>
>>102491823
>temp 3.26
??? what nuts bowl sits on top of perch shivers down the spine while chair 习近平 ding dong die
>>
>>102491849
correct, and the allowed length is how many tokens it's looking backwards for repeated phrases in the context, and if it finds a match, it discards the current token and tries again
>>
>>102491813
I use it for my ERP characters too and it's fine, it just has a very short context.
>>
https://www.reddit.com/r/StableDiffusion/comments/1fm9pxa/joycaption_free_open_uncensored_vlm_alpha_one/
New JoyCaption model. I dunno how many people care about this, but I've been using the pre-alpha version as part of a multi-model workflow to caption thousands of images for training Flux loras. So I'm super excited about this, gonna be playing around with it today and doing side-by-side comparisons with the pre-alpha.
>>
>>102491901
Looks like he has dynamic temperature turned off though.
>>
>>102491901
i'm not using dynatemp
i experimented with it, but i wasn't getting the results i wanted and went with neutralizing samplers and starting over
the box is clearly not checked
>>
how can I vectorize black and white symbols? remove the white background
>>
>>102491883
trash in - trash out :^)
>>
>>102480754
>>102480814
Is this a new restriction in 4chan?
>>
>>102491903
just trust me, it works
>>
disable slider limits
temperature 10
top k 1
min p 0.5
standard DRY
you can thank me later
>>
Mistral models see a concept appear twice and spend one paragraph of every reply from then on rephrasing that concept. How do you even fix this?
>>
>rephrase
shit in my experience mistral models just straight up repeat the sentence verbatim
>>
>>102491971
With topK 1 does anything else even matter?
>>
>>102492003
It's because I had some kinda rep pen on
>>
>>102491975
>>102492003
That does happen a lot, yeah.
Try the temp 5 topk 3 minp 0.1 meme settings and see if that adds some variety without making it stupid.
For RP at least it should work "fine".
>>
>>102491971
Even if you put temp first, i don't think temp can ever change the order of the tokens to sample. And if temp goes last, it does absolutely nothing with a single token. And if you have top-k 1 before min-p, min-p has nothing to work with either. Even the other way around min-p does absolutely nothing.
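toy illustration of that point, in python (invented numbers, not any real backend's sampler code):

```python
import math, random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# toy sampler chain (made up, not any backend's code) showing why top-k 1 makes the
# other settings irrelevant: temperature rescales logits but never reorders them,
# and once top-k has cut the pool down to one token, min-p has nothing left to remove
def sample(logits, temperature=10.0, top_k=1, min_p=0.5):
    probs = softmax([l / temperature for l in logits])
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    pool = order[:top_k]                          # with top_k=1 this is just the argmax
    cutoff = min_p * max(probs[i] for i in pool)  # min-p threshold relative to the best token
    pool = [i for i in pool if probs[i] >= cutoff]
    weights = [probs[i] for i in pool]
    return random.choices(pool, weights=weights)[0]

print(sample([2.0, 1.5, 0.5, -1.0]))  # always prints 0, no matter the temperature or min_p
```

however you shuffle the order, the argmax is all that survives.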
>>
>>102491661
Is this a bot?
>>
File: sloptuners.png (63 KB, 680x1483)
63 KB
63 KB PNG
>create synthetic dataset using cloud API
>finetune shitty research model with dataset
>research model is now substantially dumber than before
>still worse than cloud API in every way
have you gone to the Kobold Discord to thank a finetuner today?
>>
>>102483044
I overlooked Gemma, assuming it would be censored to hell because of Google. Is 27B Gemma really better than 32B Qwen?
>>
>>102492191
Gemma writes well but it's cucked to 8k ctx
>>
File: 1726836595916437.png (186 KB, 1873x554)
186 KB
186 KB PNG
https://docs.novelai.net/text/Editor/slidersettings.html#Unified
Thoughts?
>>
>>102492169
All their finetuning can do is change style. For cooming it may be okay, but if you don't want to RP in claude's default style, they are pretty useless. Claude can do more than one style, you know.
>>
>>102492241
unfathomably based and good for the local LLM crowd
>>
>>102491956
https://www.photopea.com/
layer --> new adjustment layer --> threshold
layer --> flatten image
right click layer in layer panel on right --> blending options --> pull right arrow on "current layer" to anything below 255 --> OK
right click layer in layer panel on right again --> rasterize layer style
image --> vectorize layer --> colors 1 --> OK
file --> export as --> svg
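if you'd rather script it, here's a rough python equivalent of the threshold + transparent-background steps, assuming Pillow is installed. the actual SVG tracing isn't shown; you'd still need something like potrace for that:

```python
from PIL import Image

# rough script equivalent of the manual steps above (assumes Pillow is installed):
# threshold the symbol to pure black/white, then make the white background transparent
img = Image.open("symbol.png").convert("L")
bw = img.point(lambda p: 0 if p < 128 else 255)   # threshold at mid-grey
rgba = bw.convert("RGBA")
data = [(0, 0, 0, 255) if px[0] == 0 else (255, 255, 255, 0) for px in rgba.getdata()]
rgba.putdata(data)
rgba.save("symbol_transparent.png")               # feed this to a tracer like potrace for SVG
```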
>>
>>102492318
>>102492241
NAI are pieces of shit that literally spam forums with their garbage.
>>
>>102492527
why are you so obsessed? you sound like b*rn*yf*g
>>
>>102492527
This. So much this.
>>
>>102492577
Uhmm.. can we unpack this, y'alls?
>>
Is it possible to pre-tokenize prompts when running batched inference in vllm?

I.e., I’m going run the same prompt through multiple times with different system prompts, and I’m trying to reduce the computational costs. Or am I going about this all the wrong way?
>>
>>102490777
enough of us
"don't be poor" falls under skill issues
>>
File: literatedog.jpg (42 KB, 640x640)
42 KB
42 KB JPG
>>102492254
>All their finetuning can do is change style
They hardly manage to do that even. One thing Nemo is really good at is bilingual conversation; I was able to hold a chat with Nemo in English + Japanese with almost no errors or misunderstandings in the outputs. Yet none of the Nemo finetunes can do that, and they still talk exactly like Nemo but add degenerate coomer words like "cunny" and "obscene squelching" to sentences where they don't belong. Sloptuners lobotomize the fuck out of these models with approximately no benefit.
>>
>>102492639
With the OAI API, no. Tokenization is pretty much free anyway. You want the cache, and it's on and just works by default. If you want to make sure it's working, prepend a random number to the very beginning of each of your requests and watch performance worsen by a lot.
>>
>>102492639
Not sure that can be done. On llama.cpp, for example, you can cache a prompt and run it multiple times almost instantly, but since the system prompt goes before the prompt, the whole thing would need to be reprocessed again. Or more succinctly, you can only cache a common prefix. If vllm has caching, I'd assume it works the same way.
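for what it's worth, a sketch of how that plays out in vLLM, assuming a recent version where enable_prefix_caching is available (check your version's docs; the model name and prompt layout are just examples):

```python
from vllm import LLM, SamplingParams

# sketch of the prefix-caching idea described above; enable_prefix_caching is a flag
# in recent vLLM versions, double-check against your install
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_prefix_caching=True)
params = SamplingParams(max_tokens=256, temperature=0.7)

shared_text = "...the long prompt you reuse..."
system_prompts = ["You are a strict classifier.", "You are a lenient classifier."]

# if the varying system prompt comes first, every request has a different prefix and the
# cache can't help; putting the shared chunk first (when your prompt format allows it)
# gives all requests a common prefix that only gets processed once
prompts = [f"{shared_text}\n\n{sp}\n\nAnswer:" for sp in system_prompts]
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)
```

if your chat template forces the system prompt to come first, there's no way around reprocessing the shared text for each variant.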
>>
>>102480672
Anons, everyone is saying qwen is shit for RP, but what about general purpose tasks, like classifying and summarizing text, including with "objectionable" content?
Is there a 4-bit GGUF quant yet?
>>
>>102492706
>everyone is saying qwen is shit for RP
lol no just a few mistral shills and retards who haven't tried the model
>>
>>102492695
>>102492698
Thank you for answering my question
>>
>>102492706
Any model is better than that trash.
>>
>>102492712
What size do people run then? I only see like 7B and 72B, but not quants. Every 7B model I've ever seen has been fast but utterly retarded.
I fed Mistral 7B enough to fill up its 128k context and it was still retarded and started breaking my expected output format.
>>
>>102492731
I'm currently using Llama-3.1 70B 4-bit. It's doing my classification tasks well. I'm always looking to improve though. Otherwise I'd still be on GPT-J-6B or markov chains.
>>
>>102492733
72B is great if you can run it, otherwise 32B or 14B. They released almost every size anyone could ask for, just look at their Hugging Face page.
>>
>>102492759
I can do 72B 4-bit or 32B prob in fp8 or 16, it's just weird I can't find quants, not even from TheBloke. I feel like I'm not searching right.
>>
>>102492706
>Anons, everyone is saying qwen is shit for RP
>including with "objectionable" content?
Being bad at one would make it bad at the other. Subjects overlap. But i don't know. Why don't you try it yourself?
>Is there a 4-bit GGUF quant yet?
yes. huggingface.co. Pretty new site to upload files. It seems some people are using it to upload language models, among other things.
>>
>>102492771
You are not searching right, grandpa.
>>
>>102492771
>not even from TheBloke
TheBloke hasn't been active since January.
Look for bartowski or the quant cartel.
>>
File: xhs5WpbkpD.png (77 KB, 1071x290)
77 KB
77 KB PNG
>>102492771
>just look at their hugginface page
>just look at their hugginface page
>just look at their hugginface page
>>
>>102492684
I'm pretty new to this, but Nemo finetunes seem like complete shit. I can pretty much predict what the characters are going to say. Perhaps I should give the base model a try.
>>
>>102492813
No gguf quants
>>102492799
Thanks, bartowski has a 4-bit instruct
>>
>>102492880
>No gguf quants
Nigga, are you blind? Top right.
>>
>>102492880
>No gguf quants
The screenshot anon posted has gguf as the first item of the second column.

>Thanks
You are welcome.
You can often just search for
>model name GGUF
in Hugging Face's search bar and find something.
Do be aware that people can fuck quants up, so keep an eye out for that (look into the --check-tensors argument for llama.cpp).
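The same search can be done from Python with huggingface_hub if you're scripting it; the query string here is just an example:

```python
from huggingface_hub import HfApi

# list repos matching a "model name GGUF" style query, same as the site's search bar
api = HfApi()
for m in api.list_models(search="Qwen2.5 72B Instruct GGUF", limit=10):
    print(m.id)
```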
>>
File: qwen.png (123 KB, 1280x539)
123 KB
123 KB PNG
>>102492880
You'll struggle your entire life not understanding what's going on around you.
>>
>No gguf quants
lol okay I'm done being helpful in /lmg/ this year. it's nothing but shitposting and trolling now. you people are fucking retarded.
>>
>>102492952
>>102492945
gguf is a file format (with some tranny jizz mixed in)
Quants are requantized versions of the model.
Not all ggufs are the same quant. For my 48GB of VRAM, I need 4-bit quants
>>
File: smKuqAehF1.png (40 KB, 428x435)
40 KB
40 KB PNG
yeah if only you were shown where to find 4-bit gguf quants the first time you asked
these anons are fucking dumb, huh?
oh wait
>>
>>102492993
ur niggerlicious
>>
File: file.png (12 KB, 665x45)
12 KB
12 KB PNG
pissing me off
>>102493018
>>102493018
>>102493018
>>
File: 05-17.jpg (100 KB, 1319x1029)
100 KB
100 KB JPG
Hey guys I'm new here, could someone point me to some resources for getting started? Also, is there an official /lmg/ card I can test with once I get everything running? Sorry if this is listed in plain text somewhere in the thread, I'm just looking to be spoonfed links. Thanks!
>>
File: op.png (385 KB, 1354x842)
385 KB
385 KB PNG
>>102493138
If only we had some resources...
>>
File: mmlu_vs_quants.png (336 KB, 3000x2100)
336 KB
336 KB PNG
>>102493138
Read the OP.
For an easy entry, koboldcpp + a mistral-nemo-instruct gguf. Get the quant that is smaller than your VRAM by about 15%, enable flash attention in koboldcpp, and set your context size to 8192.
Then start messing with things. Different models, different context sizes, different quants, etc.
>>
>>102493138
>could someone point me to some resources to getting started?
https://ollama.com/download
>official /lmg/ card
no card currently on the market is worth shelling out money for, the official card to test with is whatever NVIDIA GPU you have that isn't a decade old
>>
>>102491066
consider using more than one column
>>
>>102493186
>no current on the market is worth shelling money out
I assumed he meant a character card, based on "once I get everything running".
>>
>>102491907
allowed_length is the number of tokens that can be repeated before a penalty is applied. DRY actually looks for repetition across the entire context
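for anyone still confused, a rough sketch of the mechanism as I understand it from the PR. simplified Python, not the real implementation; the defaults (multiplier 0.8, base 1.75, allowed_length 2) are the ones I remember from the PR, so double-check them:

```python
# rough, simplified sketch of the DRY idea (naive O(n^2) loop for clarity, not the real code):
# if the current end of the context repeats an earlier sequence, the token that extended
# that sequence last time gets penalized, and the penalty grows with the match length
# once it exceeds allowed_length
def dry_penalties(context, vocab, multiplier=0.8, base=1.75, allowed_length=2):
    penalties = {tok: 0.0 for tok in vocab}
    for end in range(len(context) - 1):
        # length of the match between the context's current tail and the text ending at `end`
        length = 0
        while (length <= end and length < len(context)
               and context[end - length] == context[-1 - length]):
            length += 1
        if length > allowed_length:
            continuation = context[end + 1]  # token that extended this sequence last time
            penalty = multiplier * base ** (length - allowed_length)
            penalties[continuation] = max(penalties[continuation], penalty)
    return penalties  # subtract these from the matching logits before sampling
```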
>>
>>102491823
a quick update to this: neutralize presence penalty, or the model goes mildly schizo several replies in and starts dropping articles in front of nouns, talking like a caveman


