/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102478048 & >>102467604

►News
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
>(09/17) Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release/
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>102478048

https://pastebin.com/ft3Bz2xy

--Qwen 2.5 not worth it for RP, Minitron 8B better, avoid benchmaxing: >102480431 >102480494
--Compute power vs. bandwidth in LLM training and inference: >102480152 >102480171
--AI hardware guide suggestion and resource provided: >102479765 >102479821
--AI accelerator PCIe cards discussion: >>102479195 >102479242 >102479312 >102479338 >102479343 >102479379 >102479434 >102479414
--Qwen 2.5 release and potential applications: >>102479928 >102479966 >102480044 >102480128 >102480143 >102480047 >102480065 >102480049 >102480074 >102480213
--OpenRouter Qwen 2.5 72B benchmark results: >>102479724
--Mistral Nemo models still best for 24GB, unless Qwen gets good fine-tune: >>102479478 >102479547
--Anon is considering building a cluster with Orange Pi 5 Pro devices which have a dedicated NPU: >>102479263 >102479350 >102479817 >102479964 >102480010 >102479767 >102479801 >102479923 >102480147 >102480157 >102480175
--2060 and Ryzen 3600 insufficient for 30b+, consider RTX 3090 and high RAM: >>102478936 >102479244 >102479260 >102479287 >102479570 >102479586
--Miku (free space): >102478511 >102479698 >102479918

►Recent Highlight Posts from the Previous Thread: >>102478163 >>102478475
>>102480672
edge AI setups?

>>102480681
>Qwen 2.5 not worth it for RP, Minitron 8B better
lol, come on

>>102480537
>>102480600
amerifats may be okay, but it's so fucking over for anglos, even if it's a troll
Sweet fucking Jesus, let's make this thread better than the last one.
>>102480681
You can always just make the script quote the first post in a chain to avoid the quote limits.

>>102480748
No.

>>102480681
>Qwen 2.5 not worth it for RP, Minitron 8B better, avoid benchmaxing
Good first entry for the crippled era...
Bros why is Qwen the best model ever created?
Hello, what local models have a similar quality to Kayra for story writing?
>>102480768
Fuck off with your bullshit already.

>>102480768
>similar quality to Kayra
https://huggingface.co/Qwen/Qwen2.5-0.5B

>>102480768
None, local is a meme.

>>102480768
trolling, but
https://huggingface.co/models?search=13b
And as for a serious answer:
LLaMA2-13B-Tiefighter

>Anti-NAI schizo is right back to samefagging again.
I wish mods didn't sit on their fucking asses all day.
Current SOTA locals for roleplay that don't just look good on meme benchmarks? Preferably 70B models.
>>102480794
nothing in that size other than older miqus

>>102480754
Still not enough. Assuming 2 are usually used for the Previous links, that leaves only 7 chains that can have a link. Usually the recaps have double that.

>>102480721
how come ollama doesn't pick up any hardware acceleration on the rpi 5? https://developer.arm.com/Processors/Cortex-A76
shouldn't the Neon or whatever speed up inference?

>>102480823
Not the ollama support general. Go back.
>nai stuff
>ollama
off to a great start

>>102480823
Llama.cpp doesn't work with it?

On the 9 (you)s reply limit, I think this https://desuarchive.org/g/thread/94354163/#q94355339 is why jannies did it.

>>102480801
What about smaller then?

So this is the power of Qwen2.5 72B?
On a side question, does anyone know how to enable avatars? I think I disabled them by mistake and idk where to enable them again.
>>102480930
Every model I've used is kind of retarded like this, even /lmg/'s "good" ones

>>102480930
user settings and uncheck picrel, it got turned on by an update

>>102480875
https://desuarchive.org/g/thread/101986330/#101992125
More likely this.

>>102480959
oh lel, forgot about this one

>>102480831
not the llama.cpp thread either, the majority of local models use ollama

>>102480955
Thanks!

>>102480930
It come with eggwah
Genewa chicken eggwah

>>102480672
how do I set up langchain?
>>102481073
Ignoring the idiocy, why?
And are all these people from aicg just underage? Who the fuck can't afford an API?

>>102481102
>Who the fuck can't afford an API?
what's wrong with running langchain locally?

>>102481114
Separate statements/questions.
Running langchain is simple. Dead simple. And ignoring the fact that we're in a thread about tools that can literally answer that question and walk you through the process: why? For what purpose?

>>102481160
did they pay you to say this?
>>102481073
conda install langchain -c conda-forge
pip install langchain-core langchain-community
then just use it as normal in python
from langchain_community.llms import Ollama
llm = Ollama(model="gemma2")
llm.invoke("Why is the sky blue?")
>>102481191
buy an ad oshit shill

>>102481160
>tools that can literally answer that question and walk you through the process, why?
9/10 threads on g would be better served talking to chatGPT, you want this place to be more barren than it already is?

>>102481203
you're free to show how to set it up with llama.cpp
https://python.langchain.com/docs/integrations/llms/llamacpp/
>>102481188
How much are you getting paid? I need a promotion.
>>102481205
True, just the influx of what seem to be underage anons triggered me. Granted, that could mean fewer slider threads and more genuine discussion. Though it could also go the way of /b/. Fuck it, we just need to use some LLMs like the other trolls are already doing.

>>102481254
>genuine discussion
if you can't find genuine discussion now, you wouldn't find more then. The golden age of the internet is behind us because anyone worth speaking to only does so with the expectation of social clout

>>102480768
Bud, you can't expect local models to get close to cloud models. The only model that gets close to Kayra is Opus

>>102481346
Sorry. Not going to participate in raids for you.
Bros when is Llama 3.2? I'm already so fucking tired of Llama 3.1.
>>102480768
None, all of them are shit, unironically. Limited context alone is a huge deal breaker, and censorship is the cherry on top that will annoy you really good. And no, I am talking about general censorship, not your loli slop.

>>102481442
Anon, it's just going to be llama 3.1 but with multimodal adapters slapped on top. Plus the backends are going to take forever to support it, not to mention which frontends are going to be good with it anyway.

>>102481468
But if my fox wife can't see me, what's the point of living?

>>102481479
>fox wife
tell me more about her, anon. vision is the first step towards improving models' sense of proprioception and thus their liveliness, tho I'm not sure how we can go about developing a genuine sense of spatial awareness
>t. wants a fox wife as well
Are you guys quanting your KV cache? I'm particularly interested in people running gguf quants of 70b+ models. I tried it a while back and felt like it seriously affected output quality, but it was a brief and janky experiment.

>>102481734
No.
How much does it cost for them to train models at each size?
>>102481734
I do, but I use tricks to bump my quality

>fox wife
Sorry, best I can do is worm wife.

>>102480768
https://huggingface.co/teto3/mistral-nemo-storywriter-12b-240918
I trained one a few days ago
Please give me a medium sized model that is good at following the card and not too positive
local sisters... Qwen 2.5 is insane on the benchmarks, I kneel
>>102481902
Elaborate?

>>102482108
Even if I could run it, it would still be too slow for me.
2 t/s is my limit for a general use model.

>>102482150
are you on a gt610 or pentium 3?

>>102482108
don't care about memes, what is it like at pretending to be a young woman?

>>102482175
And what's it like being an intolerant transphobic chud?

>>102482150
You could use the 32B.

>>102480672
How do I make money from this?
I'm broke as fuck and my job applications are leading nowhere.
should I just sell my GPU and suck cock for a living?
>>102482133
>>102479396
The system is a little better than the quick reply method. I and many others have noticed that the longer the conversation goes, the less attentive models tend to get. After generating a response, it cuts out the entire chatlog, leaves a system prompt with only the character's description, and asks the assistant to double-check whether the response is faithful to the character being described, disregarding previous messages. It then retrieves the last 5 messages and asks the model to come up with a strategy to rewrite the response using the previous assessment, taking recent events into account. It's a lot of generations going on in the background, but it's fairly quick, considering you're not handling the entire prompt + chat history.
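The loop described there boils down to two extra generations per reply. A minimal sketch of the orchestration, assuming an `llm` callable that takes chat messages and returns a string — the function name and prompt wording are invented for illustration, not anyone's actual extension code:

```python
def refine_response(llm, char_description, chat_log, draft):
    """Two-pass refinement roughly as described above. `llm` is any
    callable that takes a list of {role, content} messages and returns
    a string; the prompt wording here is invented for illustration."""
    # Pass 1: judge the draft against the card alone, with no chat history.
    critique = llm([
        {"role": "system", "content": f"Character description:\n{char_description}"},
        {"role": "user", "content": ("Is this response faithful to the character "
                                     f"described above? List any deviations.\n\n{draft}")},
    ])
    # Pass 2: rewrite using the critique plus only the last 5 messages.
    recent = "\n".join(m["content"] for m in chat_log[-5:])
    return llm([
        {"role": "system", "content": f"Character description:\n{char_description}"},
        {"role": "user", "content": (f"Recent messages:\n{recent}\n\n"
                                     f"Critique:\n{critique}\n\n"
                                     f"Rewrite this response accordingly:\n{draft}")},
    ])
```

Both passes see only the card plus a small window, which is why it stays fast even in long chats.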
>>102482171
I'm using integrated graphics, but the limiting factor is only having 32gb of ram.
Has anyone actually used both Qwen instruct and base to see which one is truly better (with RP)?
>>102480790
NAI can fuck right off.

>>102482345
Qwen2.5 32b instruct impressed me at Q4_K_M. I haven't used the base model yet, though.
there's dick for quants of the non-instruct base model. I could make some but I'm having plenty of fun with instruct as it is right now.
>It's been over 24 hours since the last model release
It's over.

RPers' thoughts on Qwen 2.5 so far:
>14B has more sovl in early context chats than 32B, likely because it's more retarded
>32B is really smart for its size and could easily be the 3090 vramlet king with a good tune
>72B has moments where it feels like an S-tier API model and others where it's L3.0-tier
for RP (base models):
>Qwen 14B > Nemo (hands down)
>Gemma 27B > Qwen 32B
>L3.1 > Qwen 72B
14B finetunes will absolutely shit on Nemo finetunes. 32B finetunes could turn 3090 vramlet chink haters into believers. 72B tunes might be a wash or just slightly better than L3.1.
>Cydonia-22B-v1-Q4_K_M
T-Thanks mistral-small.
Straight up ignored the prompt too.
Not even a coom tune was enough. First time it happened though.

>>102483044
Can the 14b/32b do more than 16k context? That's my main problem with nemo.

>>102483121
I only tested up to about 18k context on 14B and about 20k on 32B, but they both did fine that far. YMMV from 16k to 32k.

>>102483169
And you used the base model? I can't seem to find a gguf, only for the instruct one. What settings did you find were good?
Qwen uses standard ChatML?
>>102483213
sorry, when I said base model above I was referring to the instruct tune. non-I base quants are hard to find atm, but a GGUF of 14B is probably quick to bake. If you used the 5 Temp / 3 Top K meme settings for Nemo, it works nicely on 14B as well. Otherwise I slid the temp around from 0.8-1.4 with varying Min P 0.08-0.2 and standard DRY. These models need a tune to expand their vocabulary just like Nemo, so if you're jumping from a Nemo finetune to plain 14B instruct you're going to be disappointed.
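Since Min P keeps coming up: it just drops every token whose probability falls below some fraction of the top token's, then renormalizes what's left. A minimal sketch of the idea — not any backend's actual implementation:

```python
def min_p_filter(probs, min_p=0.1):
    """Keep only tokens whose probability is at least min_p times the
    top token's probability, then renormalize. So min_p 0.08-0.2 prunes
    the long tail harder the higher you set it."""
    top = max(probs)
    kept = [p if p >= min_p * top else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]
```

Unlike Top K, the number of surviving tokens adapts to how confident the model is at each step.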
>>102483118
Downloading right now because of your post. My fetish is watching their OOC personas get angrier and then fall into despair as I rape their character anyway and force them to keep participating.

>>102483263
based

Why is there still no model better than Tiefighter in the 13B category?
I try all the new models (Blue Orchid 2x7b etc.) and I'm always disappointed. I always get better results with Tiefighter in ERP/RP/storywriting.

>>102483257
I'll give the 32b a try, that size usually runs well for me.

Slightly off-topic, but I went to check in on the video gen threads and found this prompting a bit funny
>>>/v/689498407
>>>/v/689489852
Reminds me a bit of the "You are an expert role player" and other almost clownish things people use to make the AI do what they want.
>AI is hot
>profit off the trend
>by going all in on datacenter equipment, energy, or even signing up for dc admin jobs and AI jobs
>get enough money to do coke off escorts' asses on a yacht for years
or
>stay jobless
>goon to subpar text porn on 1 t/s
What did you choose?

>goon with cocaine
or
>goon
honestly, with my high blood pressure I should probably stick to regular gooning
I just woke up from a coma. Any major improvements in local models compared to six months ago?
>>102480681
>--Anon is considering building a cluster with Orange Pi 5 Pro devices which have a dedicated NPU
Not worth it (yet). The NPU is very poorly supported on the software side currently. Someone is working on a kernel driver and user space for it, though.
You probably want to give this thread a read-through:
https://github.com/ggerganov/llama.cpp/issues/722
Given that the RK3588 "appears" to support quad-channel DDR5, we might get a more decent SBC for that kind of thing eventually. Also, Orange Pi does have another, more powerful 20 TOPS NPU product, but it's based on a Huawei chip, meaning it's only available for CN residents.
If you're gay and into that kind of shit, /r/RockchipNPU/ might be a good place for updates.
>>102483631
Qwen2.5 is SOTA now, the 72B version only loses to Sonnet 3.5

>>102483631
same shit but with the number getting bigger
ai has not yet been created

>>102483631
Nemo models are really good at RP at 12b
Qwen2.5-14B vs Nemo: the former can make a reasonable summary of a thread (which I can't post, because thank you mods), the latter chokes (does one or two topics with weird formatting and then just gives out random numbers).
>>102483987
- Anonymous Flame War: >102478175 >102478267 >102478444 >102478511 >102480128
- Local AI Debate: >102478665 >102478878 >102478881 >102478882 >102478906
- Proxy and IP Logging Concerns: >102478936 >102478957 >102478971 >102478972 >102479022
- Model Recommendation: >102479066 >102479158 >102479244 >102479247 >102479319
- Mistral vs. Other Models: >102479074 >102479478 >102479499 >102479564 >102479570
- Kayra Model Discussion: >102479177 >102479260 >102479323 >102479358 >102479398
- Qwen Performance: >102479744 >102479764 >102479765 >102479768 >102479771
- Llama Model Comparison: >102479301 >102479545 >102479586 >102479624 >102479677
- Recap Anon Battle: >102479531 >102479588 >102479617 >102479680 >102479728
- GPU Performance: >102479186 >102479223 >102479263 >102479282 >102479287
- Anti-NAI Sentiment: >102479475 >102479487 >102479518 >102479545 >102479566
- Hardware Suggestions: >102479195 >102479243 >102479260 >102479301 >102479350
- ERP Training Models: >102479688 >102479772 >102479817 >102479839 >102479911
- Recap Handling: >102478774 >102478806 >102478866 >102478897 >102478916
- Local Model Advancement: >102479587 >102479663 >102479698 >102479714 >102479801
- Recap Thread Management: >102479500 >102479531 >102479607 >102479624 >102479634
- Selling Off Models: >102479929 >102479957 >102479964 >102479985 >102480002
- China's Superiority Claims: >102479859 >102479867 >102479898 >102479928 >102480000
- Crossposting Discussion: >102479884 >102479892 >102479947 >102479980 >102480010
- Proxy Misuse Warning: >102479933 >102480006 >102480017 >102480047 >102480084
Context windows and effective context are an issue. When will we see a breakthrough in this?
>>102480823
ollama is a wrapper around the llama.cpp HTTP server.
I don't know what exactly ollama ships, but llama.cpp has a Vulkan backend (compile with GGML_VULKAN) that should work on an RPi 5.
But since the bottleneck for LLMs is memory bandwidth, the performance is going to be shit either way.

>>102481734
When I run Mistral Large q8_0 I use q8_0 KV cache.
Subjectively, I feel like it does not have a significant effect.
Based on objective measurements, precision in the K cache is more important than in the V cache.
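For anyone wondering what quanting the KV cache actually saves: the cache scales linearly with context length and element size, so q8_0 roughly halves it versus f16. A quick back-of-the-envelope — the Mistral Large shape numbers (88 layers, 8 KV heads, head_dim 128) are from memory, treat them as illustrative:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elt):
    """One K and one V vector per layer, per KV head, per position."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elt

# Mistral Large-ish shape (from memory): 88 layers, 8 KV heads (GQA), head_dim 128
f16 = kv_cache_bytes(88, 8, 128, 32768, 2)  # f16 = 2 bytes per element
q8_0 = f16 * 34 / 64                        # q8_0 packs 32 elements into 34 bytes vs 64 for f16
print(f"f16 KV cache at 32k ctx:  {f16 / 2**30:.1f} GiB")
print(f"q8_0 KV cache at 32k ctx: {q8_0 / 2**30:.1f} GiB")
```

That's several GiB back at long context, which is the whole appeal for 70b+ gguf runners.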
>>102484175
Already solved by Jamba.

14B can play (not too well, and requires small temp), but it can't admit it lost.

>>102484203
Can the llama.cpp server do continuous batching of requests? i.e. multiple users send requests in parallel, independently of each other, and all of their generations start right away without waiting in a queue.

>>102484257
Yes.
>>102484221
What options do we have for running quantized jambas today?

>>102484221
Interesting. How is this possible, and why don't we see more support for Jamba?
that's a hilarious refusal
>still 0 decent under-20B RP models, besides nemo with claude slop
vramlets, did we lose?

All of the current benchmarks test for inductive reasoning, which is the opposite of creativity (deductive reasoning). The higher something scores, the more likely it is to be passive, boring, and assistant-slopped.
Every time I am about to drop mistral-small, it outputs some cool stuff on me.
This is the first time I saw this with a small model.
Usually if something is in the mouth, the people still continue talking normally.

>>102484289
just bitsandbytes in transformers/vllm
>>102484301
they have an RNN component tacked onto the transformer that helps with attention or some shit; they claim it also doesn't slow down massively as context increases like typical transformers do
no support because the architecture is different and the team isn't going around putting in PRs in open source projects like the GRIN chinks were before Microsoft hired assassins
also because the models are pretty bad aside from their context handling; they have a nearly 400b model that compares against 70bs

>>102484478
Magnum models do unprompted onomatopoeia all the time

>>102484496
>mini
>52B
Welp. I can run it on my 2x3090 in 4 bit. I'll download it, I guess. Installing vLLM shouldn't be too difficult, should it?

>>102484562
i used magnum for nemo a lot. never saw that before. cool stuff.
hope they have a finetune ready for mistral small and the new qwen 14b.
People in the space are clowning on Yann Lecun hard after o1's release.
>>102484776
We'll see who gets the last laugh in 4 days. Llama Multimodal is coming, and that's just the appetizer for the Big J-berry on the way. Lecunny's playing the long game.

Something I noticed with Cydonia-22B-v1 a couple of times now:
it starts talking about feminism and empowerment.
Like, for example, something fucked up is happening and the response is
>"Isn't it empowering?" *a middle-aged woman remarks to her friend as they wait for their train.* "Embracing our bodies and showing off our lady bits. It's the new feminism!"
At first I thought it's in the cards, but I got responses like this repeatedly now after using it for hours with many cards.
Also talk about boundaries and respecting bodies if you force yourself upon characters.
Very sus. I doubt it's the finetune.

>>102484824
Mistral's post-training reinforcement magic strikes again
>>102484852
nta, but they do something to their models that makes them suck for rp in general. every mistral model is great at following for the most part, but it goes too far and becomes fixated on anything you type, and it kills its creativity compared to something l2 of a similar size. i actually tried that specific tune of the 22b and thought it was worse than the rp tune of nemo i was using. overall i'm not a fan of mistral models for rp though (except their tune of miqu/l2): too wordy and fixated on one thing at a time, much less likely to suggest something new

>>102484792
To be fair, I do believe o1 is a step in the right direction, where you have a model self-arbitrate to come to a more robust conclusion and catch errors throughout the inference process. But it's also a sign that transformers are starting to hit a limit on what they can do. o1 has the front end be responsible for handling the model's responses and then reiterating its own questions. This is something that should be baked into the model, but it's too advanced for the transformer architecture.

>>102484776
Yann is still winning
>>102484203
>But since the bottleneck for LLMs is memory bandwidth the performance is going to be shit either way.
See here:
>>102483680
>Given that the RK3588 "appears" to support quad-channel DDR5
It actually might not be that bad. But I don't think any currently available SBCs have more than two channels (I might be wrong on this).
I never did any testing on my OPi5 with Vulkan (I think llama.cpp's support for that only matured recently?). In the next few days, I might test and report back. The 32GB models might not be too bad for MoEs.

>>102470591
Very cool. I was just trying to train an LLM from scratch; might test this since it seems very easy to implement

>>102485250
IN LECUM WE TRUST

>>102484798
>Added a toggle for chat name format matching, allowing matching any name or only predefined names.
i don't understand what this does

>>102485721
If the AI tries to write a message for a side character (i.e. it sends a line starting with "SideCharName: "), it will either automatically detect it and show it as belonging to a new character (old behavior), or only begin doing that after you explicitly add the new character's name into the AI Name box, depending on this setting.
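Boiled down, the toggle switches between two flavors of "Name: " prefix matching. This is only an illustration of the two modes — the regex and function names are made up, not SillyTavern's actual code:

```python
import re

# A line "claims" a speaker when it starts with something like "Rin: ..."
ANY_NAME = re.compile(r"^([A-Za-z][\w' -]*):\s")

def detect_speaker(line, known_names=None):
    """Return the speaker name a line claims, or None.
    known_names=None      -> match any name (old behavior)
    known_names={"Rin"}   -> only match explicitly added names"""
    m = ANY_NAME.match(line)
    if not m:
        return None
    name = m.group(1)
    if known_names is not None and name not in known_names:
        return None
    return name
```

With the predefined-names mode, stray "HP: 20" style lines stop getting misread as new characters.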
how do i fix it when formatting gets fucked? Some of my cards, even if they're formatted fine, tend to break asterisks, not use commas, or not use quotes for their dialogue even if their example messages do.

>>102485829
Ooooohh okay, thanks. That's going to be handy to keep off for things like rpg stats showing hp and stuff.

>>102485931
Check token probabilities to see what the model wants to predict when it fucks up the formatting?

>>102485931
st? usually that's a template thing. the model card should say what format it is
>"I wonder what's going on in /aicg/, haven't checked there in a while and it seems unusually active"
>they're shitting themselves and having a thread apocalypse over some esoteric discord drama involving an e-girl thread celebrity

>>102484776
He should be bullied more until he shows something worthwhile. What's the point of talking shit about transformers if he can't build anything better himself?

the last model I upgraded to was a 3.5 quant of mistral large and it's still working pretty well.
Anything better (for RP) I should know of that fits onto 48gb vram?

>>102486447
Qwen 2.5

>>102486453
looks interesting, but I can't run the 72b until someone puts it into exl2, I think?
A watt spent on gen AI is a wasted watt
>>102480721
Intel N305 has a "decent" iGPU, and it is supported by Vulkan, but it's not much faster than the CPU.
NPUs are not meant to do LLM inference; they're for running small YOLO image recognition models and things like that.
If you want to play with something tiny, an RTX A4000 can now be had on ebay for around $500. It's basically a 1-slot 3080 with 16GB of VRAM.

>>102486479
You can use this if you don't want to wait for an exl2 quant
https://huggingface.co/Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4
why do so few people talk about exl2? It seems vastly better than any other way of loading, and I don't even think a model is worth using unless I can load it that way

>>102486431
Quite honestly though, the expected results from scaling up our current architectures are massively overhyped.
The promise of AGI from autoregressive language models seems like a massive grift, and you don't have to actually build AGI to point that out.

>>102486588
Most people here are too poor to run models fully off VRAM and/or too retarded to do more setup than installing ST and downloading koboldcpp.exe.

>>102486588
Compiling llama.cpp is easier than learning how to use python in a venv or conda.
llama.cpp is good enough.
exl2 really needs Ampere or better to provide a noticeable speed boost.
exl2 needs the model to fit on the GPU; there's no CPU + GPU split.
exl2 with flash attention isn't deterministic (not that it matters much).

>>102486588
Most people here are vramlets, including me; people who have a fuckton of vram actually use the models instead of bitching on a mongolian basket weaving forum

>>102486633
Huh, not deterministic? Do you not get the same results from the same seed?

>>102486633
>learning how to use python in a venv
pretty sure ooba just does that all for you anyway
>browse locally generated geocities-like websites about random topics with images generated by flux
When will this be possible?

Qwen 2.5, unedited.
I had to reroll 6 times though until I got through the refusal.
It's funny because there is a warning at the beginning and at the end, but it still delivers (kinda) lol
A finetune would definitely be interesting.

>>102486588
I am interested in it, but I couldn't find any retard guide to get me started, so I just keep using gguf.

>>102485334
You can toss money into SBCs and be disappointed by driver support and speed, or you can patiently look around for deals on Xeon workstations. I scored a Platinum 8280L setup with 256GB RAM for under $500.

>>102486678
forgot to write: Qwen2.5-14B-Instruct-Q5_K_M

>>102486675
flux is very slow is the main issue; it barely functions on 24gb vram

>>102486675
websim.ai

>>102486719
Yes, but the quality is outstanding. Nothing else comes close for coherent shapes and lines, and photorealism is off the charts. The only complaint I have is the "cracked paint" effect you can see if you pixel-peep.
>>102486709
>>102486873
Last one.

Trying to test the censorship levels. I find it funny how this model always likes to speak about "consent" and "boundaries" but will not care about literally anything else as long as everything is "consensual".

>>102486588
I tried using it once, and it felt like it was lobotomized compared to a gguf at the same bpw.
hello i want coom rp model for sex on my 970 and 4 gig memory
>>102487648
Gemmasutra 2B

>>102487673
thanks!
hello i want coom rp model for sex on my 3090 and 24 gig memory
hello i want coom rp partner for sex, dm me
>>102487824
qwen 0.5b

>>102487859
sent
Is any language model good at, or is there any way to get a bot better at, understanding things like anatomical relations? Example: a character holds another character upside down and is fucking their mouth. Is there anything that would make a bot already understand that the balls would be slapping against the other's nose and possibly forehead, rather than their chin?
For ERP: magnum-12b-v2.5, ArliAI-RPMax-v1.1, or MN-12B-Lyra-v4?
>>102488116
Lyra

>>102487967
sillytavern worldinfo

>>102488116
those are all good
download them all and also
>MN-12B-Chronos-Gold-Celeste-v1
>arcanum-12b
>NemoMix-Unleashed-12B
and switch between them when you get bored of one
Wild that it's almost winter and there still hasn't been anything better than Noromaid v0.4 8x7b for local models worth using on normal hardware.
>>102488158
How do you find the right settings? I keep trying these and it's ultra slop, fails "Impersonate", or has other issues.
I have one extremely good log from a while ago that I believe was Stheno, but I have no way of retrieving what exactly I was running back then... And everything since then is just terrible. I'm at a loss.

>>102488191
I don't know, Sao. Ask in Discord.

>>102488215
Sao's new models are included in the terrible slop category, retard

>>102488116
Lyra.
I like mini-magnum better than magnum v2.

>>102488191
I'm using these and it's working out okay

>>102488249
On all of them? What about format and system prompt and all of that bullshit?
>>102487455
>doing all that
not getting any attention sitting in mom's basement, so you gotta shit up this thread, huh.
here's that attention (you) desperately wanted, kek.

>>102488263
most nemo finetunes are trained with chatml
the only ones with a different format I can think of are nemo instruct (mistral format) and dory (alpaca)
Why can't SillyTavern/model authors come up with some convention for distributing default parameter presets and instruct formats along with models so I don't have to dick around with a bunch of settings every time I load a different model?
>>102488308
No idea what I'm doing wrong, then. Can you share an example log, perhaps? Does "Impersonate" work for you? For me it starts rambling endlessly or uses the wrong character.

>>102488263
kobold lite handles that automatically, i never bother with it.
the settings there are just the basic min-p preset, then min-p 0.05 and XTC set to 0.15/0.5

>>102488155
That's something I haven't used a lot. Does that mean I have to put all specific information that could come up, like that, in there?

>>102488334
>kobold lite handles that automatically
Damn, really? Why the hell doesn't ST then?

>>102488345
It does, you just have to use the chat completion API.

>>102488598
Damn, is that what you're supposed to use? Any other differences from text completion?

>>102488334
Only the stuff that the model isn't doing satisfactorily out of the box; check out chub.ai for examples.
What does this mean for local models?
AMD is slow as fuck for image gen, but if LLMs are mostly about keeping things in VRAM, wouldn't we be able to run full-precision 80B models now?
svelk
>>102488745
if it isn't nvidia it's worthless junk, too much is built around cuda

>>102488745
With pic related?
Aren't those APUs allocating RAM as video memory?
That's probably slower than just using RAM + gpu for prompt processing.
If AMD had really cheap gpus with tons of vram, then even with the worse software stack it could be worth it, and people like cudadev would 100% focus on improving the software stack.
AMD needs to be the best cost-benefit by a large margin for that to happen.
>>102485250
>>102485507
orange man bad amirite fellow /lmg/sisters??

>>102488745
AFAIK it's supposed to use up to 256-bit bus width LPDDR5-8500 memory, which would be quite a bit faster than typical DDR5 desktop systems, but still slower than the VRAM of a low-end GPU.

>>102488821
yes, he is bad because he supports israel

>>102488821
yes, he is bad because he is a nazi transphobe and /lmg/ is a trans-friendly general
Is it just me, or are KoboldAI Lite and the horde both down?

>>102488902
probably updating it; 1.75 of kcpp dropped a few hours ago with kobold lite improvements

>>102488836
>but still slower than the VRAM of a low-end GPU.
but also way cheaper per gig

>>102488836
Still not a bad price if it can handle 120b at q6 at 4 t/s or so

>>102488836
Instead of making meme "AI CPUs", they should just stop illegally coordinating with nVidia to engage in market-fixing and release GPUs that people actually want.
?
>>102489289Not a bad idea but how about using a sharper font?
>>102489283Uh, gaining market share in the low to mid-tier consumer GPU market is clearly more important than making GPUs that can be used for AI. The masses want affordable, decent GPUs. Good benchmarks, 16GB VRAM is all you really ever need.
>>102488836how does it compare to recent appleshit
>>102489289i like it, but can it be dark mode instead of black text on white background?
What are good large models for output variety? I feel like Largestral is the best for smarts but it lacks output variety, CR+ is very good, and Wiz is also solid but worse than CR+. Are there more options?
>>102489227 More like 2t/s. The more memory to read, the slower the inference. With TP disabled, Mistral Large on 4x3090 is ~7t/s, at 935.8 GB/s of bandwidth
>>102489440
>4x3090 is ~7t/s
People spend almost $3k to run models at that sort of speed? lmao
>>102489428 sex
>>102489320 What font would you prefer? >>102489363
>>102489460 That's sequential speed. With TP it's 15, and 35 with P2P. But yeah, the larger, the slower.
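rough ceiling math for why the larger = slower, since decode is bandwidth bound: every generated token has to stream every weight byte once. numbers below are illustrative assumptions, not measurements:

```python
# Upper bound on sequential decode speed:
#   tokens/s <= bandwidth / model_size
# because each token pass reads all weights from memory once.
def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    return bandwidth_gb_s / model_gb

# e.g. a ~70 GB quant of Mistral Large with layers split across 4x3090s
# running one GPU at a time (no TP), so effective bandwidth is one card:
print(max_tokens_per_s(936, 70))  # ~13.4 t/s theoretical ceiling
```

the observed ~7 t/s sitting at about half the ceiling is normal: kernel launch overhead, the KV cache, and activations eat the rest. TP helps because multiple cards stream their slice of the weights at the same time.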
>>102489480 ahh much better, thanks
>>102489480 That one looks good enough.
>>102489480 maybe have the heading fonts a little smaller and the general text font a point or two bigger
>>102489480 Add little Mikus around it with comments generated by AI!
>do weekly eBay check
>people are trying to get 16K USD for PCIe 8xV100 rigs now
Shameless. At least the SXM2 ones kind of made sense...
>https://huggingface.co/QuantFactory/Qwen2.5-Lumen-14B-GGUF
worth trying?
>>102489542
>local 3090 prices have been rising steadily
>p40/p100 are no longer cheap as well
>>102489688 Global 3090 prices seem to be trickling down slowly from what I've been monitoring, but like very slowly. Maybe 10 dollars per quarter. Which might as well be a price increase since they're starting to get up there in age.
>>102486431 The point is that it's essentially being used as a scam to try and get exponentially more money for the exponential compute required for increases in intelligence. Frankly someone needed to say it; it doesn't matter if a better alternative exists or not. And the fact that one doesn't exist means all the more that we should criticize the current way things are. It's unfortunate that his criticisms, at least on Twitter, are often misunderstood, and also mixed with political shitposts, though.
Is there a better local model than pissstain-large-v2 yet?
>>102489643 buy an ad
>>102489362 dunno how compute compares but memory bandwidth roughly equal to the M3 max
>aicg fags confirmed to have been entrapped by proxyfags
>haven't touched proxies since summer last year
i'm more amazed this didn't happen sooner to be honest lmao
>>102489698 With miners' stocks depleted, the supply of 3090s has decreased. There are no viable alternatives available in the same price range for both gaming and inference purposes, so demand is high.
>>102489484
>35 with P2P
how does peer to peer help here?
>>102488745 We'd need these chips to include PCIe slots to really get something useful for our purposes. But if we did have that, then we could theoretically get like 2-4x faster when comparing partial offloading setups. I run a tiny quant of Mistral Large at like 1 t/s on my machine, whereas potentially a 3090 + the Ryzen could be 3 t/s.
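you can sanity check that estimate like this. the model size, split ratio and bandwidth numbers are assumptions for illustration, not benchmarks:

```python
# Partial offload: per-token time is the sum of streaming each portion
# of the weights from its own memory pool. All figures are assumed.
def offload_tokens_per_s(model_gb: float, gpu_frac: float,
                         gpu_bw: float, cpu_bw: float) -> float:
    t = model_gb * gpu_frac / gpu_bw + model_gb * (1 - gpu_frac) / cpu_bw
    return 1 / t

# ~40 GB quant, half on a 3090 (936 GB/s), half in system memory:
print(offload_tokens_per_s(40, 0.5, 936, 64))   # DDR5 desktop: ~3 t/s
print(offload_tokens_per_s(40, 0.5, 936, 272))  # rumored 256-bit LPDDR5: ~10 t/s
```

the CPU half dominates the per-token time, which is exactly why tripling system memory bandwidth roughly triples the whole thing.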
>>102489714
>the exponential compute required for increases in intelligence
Who cares as long as it works? For big corpos, money is not real anyway. Stock prices fluctuate based on Musk's tweets. The economy isn't real.
not much buzz here around kyutai-labs/moshi, to my surprise. do you have other ways to talk to a model locally, or text2voice? it's the first thing which worked for me offline, and quite fascinating
>>102489841 You can run vLLM with dumb and effective symmetrical TP on 4 GPUs. This requires large BAR support and custom drivers to enable P2P between GPUs: https://github.com/tinygrad/open-gpu-kernel-modules
>>102489872 https://github.com/gpt-omni/mini-omni is smaller and better. Neither is anything more than a novelty, and both suck at any practical task.
>>102489428 just do largestral with 5 temp 3-5 topk.
>>102489872
>text2voice
https://github.com/fishaudio/fish-speech is great when it works. Unfortunately, auto-regressive shit is unreliable by design and some gens suck.
>>102490039
>topk
noob
Haven't bothered with LLMs lately, was Nemo 22B or qwen 2.5 any good?
>>102489841 Without P2P, a GPU needs to ask the system to talk to another GPU. With P2P, your GPU can talk to the other GPU without asking the system = faster
>>102489864
>Who cares as long as it works?
Works for what? We still aren't anywhere near AGI, we still aren't getting models that actually write well and satisfy the people using them. It's arguable that the economic and societal benefits of these non-AGI models are really worth as much as the money being burned, which could've been spent on other things that might've had more benefits for humanity or gotten us to AGI faster. Very arguable in fact, when there are many companies in the space spending a ton of money to train a model that will be BTFO in a few weeks or months by a competitor's model. Or hell, in many cases BTFO by an already existing model, so the money really did just get wasted for nothing.
Recognize what you are essentially doing right now. You are defending these large, soulless scams and anti-competition, anti-consumer entities. You don't have to be like this.
>>102490039 Largestral is unsalvageable, it's very common to get 100% probability on tokens and no amount of sampler tweaking will change that.
>>102489872 i just use edge_tts/xtts + rvc, <1s latency most of the time and you can plug it into anything. i tried fish but it was way too inconsistent even after finetuning
>>102489872 People have gotten tired of installing bullshit just to use it once and never again.
>>102490039 Love to see my meme settings being shared.
>>102490169 Isn't there a sampler that reduces max probability?
>>102490153
>which could've been spent on other things that might've had more benefits towards humanity or gotten us to AGI faster
Let's be real, we're fortunate that it isn't being spent on Epstein islands
>>102490153
>We still aren't anywhere near AGI
Define AGI
>>102490301 Yes, define it, coward.
>>102490301 The class is waiting for you to define AGI.
>>102490301 I think therefore I am
>>102490234 Yes, but that only works if there are other tokens, not if there is only one 100% token.
>>102490259 Actually, the money they spend on extraneous bullshit is still being spent either way. They're still buying yachts. Sam is still buying sports cars and expanding his collection.
>>102490301 Or, you could stop trying to search for ways to argue for companies that aren't on our side and don't have our interests in mind.
>>102490344 redit ergo dum
>>102490301 artificial goon intelligence
>>102490301 send more pics
agi is practically an ai with agency, capable of getting through social situations and other human challenges
>>102490370
AGI would understand the context of the erotic roleplay and not do things like walking across the room to take something from you when you said it's right next to her.
It wouldn't instantly jump on your dick when you tell it not to.
>>102490301 Any cloud LLM is AGI in comparison with a local cuck one.
>>102490411 knowledge of physics as an extension of AI is not what AGI is about
>>102490357 I'm not advocating for companies; rather, I'm contending against Lecum. Last year I was gooning with L2 14b finetunes, and currently I'm gooning with 123b Largestral. Clearly it's significantly improved, so I fail to comprehend your stance that scale doesn't matter. If they cease focusing resources on the "bigger is better" approach, I question whether they will dare invest in riskier yet potentially more effective research avenues for achieving AGI. Investors readily fund guaranteed improvements, but are reluctant to invest in seemingly far-fetched ideas like cat intelligence research by Lecum.
>>102490431 This meme hasn't aged well... the "gpt omni" response is pure slop, and the problems in the local panel are year-old 7b tier ones that are solved in newer models.
>>102483278 Someone? Just tried a Nemo finetune and it's utter shit
>>102490566 nemo's amazing, you probably used too high of a temp
>>102490551
Limited context - not solved
General data censorship (anything that isn't your lolipedoslop) - not solved, and never will be
Hallucinations - not solved
One system prompt format - nonexistent, you are forced to rewrite shit and tinker around with each new model
It's been three years and we still have no solution for any of these.
>>102489480 I refuse to read a recap in dark mode, I'm not underage.
>>102490588 Default temp, with Tiefighter 13B it just werks... Seriously
I tried Rocinante-12B fyi
>>102490619 You are underdeveloped
>>102490627
>rocinante
all drummer models are unusable trash
>>102490615
>Hallucinations - not solved
You can't solve what is the core working principle of an LLM. They're always hallucinating. But you can reduce them greatly with RAG
>>102490674 nobody asked for your opinion, sao
>>102481479 Florence2 is probably better for everything except ERP anyway.
>>102490678 RAG is a meme and doesn't solve anything.
>>102490674 I'll try MN-12B-Lyra-v4 then... I swear I feel there still isn't something better than Tiefighter in the 13B range
>>102490690
>RAG is a meme and doesn't solve anything.
you don't know what you're talking about
>>102490690 I've never used it but I feel like it would be good for a desktop assistant since it would let you inject relevant files/scripts into the context. It definitely won't help with hallucinations though.
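the core idea fits in a few lines. toy sketch only: real setups use an embedding model and a vector store, plain word overlap stands in for that here, and the docs/filenames are made up:

```python
# Minimal RAG sketch: retrieve the most relevant snippet and prepend
# it to the prompt, so the model answers from real text instead of
# making something up. Word overlap is a stand-in for real embeddings.
def retrieve(query: str, docs: list[str]) -> str:
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

docs = [
    "backup.sh runs nightly and copies /home to the NAS",
    "the GPU fan curve is configured in /etc/fancontrol",
]
query = "where is the fan curve configured?"
context = retrieve(query, docs)
prompt = f"Use this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

and yeah, this is exactly the desktop-assistant case: the model can't hallucinate a path it was just handed verbatim, but nothing stops it hallucinating around the context either.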
What would it take for computers to think?
>>102490690https://www.lamini.ai/blog/lamini-memory-tuning
>>102490615
>Limited context - not solved
405b has true 128k. Good enough for anything I want to do
>General data censorship (anything that isn't your lolipedoslop) - not solved, and never will be
>Hallucinations - not solved
both are pure skill issues
>One system prompt format - nonexistent, you are forced to rewrite shit and tinker around with each new model
who cares?
>It's been three years and we still got no solution for any of these.
For any of the above you consider an actual unsolved problem, cloud isn't appreciably better
>>102490431seething poopooskin (v)ramlet lmao
>>102483680
>If you're gay and into that kind of shit, /r/RockchipNPU/ might be a good place for updates.
what's the bad rep against Rockchip NPU?
>>102490749 Who has 200GB of VRAM?
Cohere insiders, what's the state of the company after CR 08-2024 flop? Did the higher-ups learn a lesson or will they continue training on slop for minimal gains?
>>102490828 I don't even know what cohere is
>>102490828
>after CR 08-2024 flop
Explain? Thought it was a good AI company
Their graphic chart is comfy
>>102480672
>>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
how come the mandarin version of the blog post still has all the charts and code in English? isn't the point of a mandarin translation for mainland readers who can't into english, or is anyone worth their bacon supposed to know english
https://retrochronic.com/
Enjoy your redpill anons
>>102490897
>https://retrochronic.com/
not clicking that, tell us first what's inside
>>102490828
>CR 08-2024 flop
Huh, I'm downloading that right now, should I stop it?
>>102490874 Tech/math is always written in English in Asian countries AFAIK, even when there are native words for them.
>>102490674 there are nice models for me, for which I have no idea what settings to apply. they eventually go into crazy self-repeat mode, like
TieFighter-Holodeck-Holomax-Mythomax-F1-V1-COMPOS-20B-gguf
DavidAU/L3-Stheno-Maid-Blackroot-Grand-HORROR-16.5B-V1.6-STABLE-INTENSE-GGUF
>>102490914
>A primary literature review on the thesis that AI and capitalism are teleologically identical
schizo slop
>>102490914 "Capitalism and AI are teleologically identical, a zillion part essay" apparently. Like no shit, neither of those things has anything to do with teleology.
>>102490914 Capitalism is ASI travelling through time, invading us from the future to produce itself
>>102490777 mac studio owners have 196 or smth, would it run there?
>>102490965
>even when there are native words for them.
but what's the point then, might as well keep everything in English to be consistent
>>102490949 Evolution is just natural gradient descent.
>>102490946 Wrong, anon.
>Such software [reinforcement learning systems like Google DeepMind's AlphaZero] has certain distinctively teleological features. It employs massive reiteration in order to learn from outcomes. Performance improvement thus tends to descend from the future.
>...
>Unsupervised learning works back from the end. It suggests that, ultimately, AI has to be pursued from out of its future, by itself.
- Nick Land (2019). Primordial Abstraction in Jacobite Magazine. Retrieved from github.com/cyborg-nomade/reignition
>>102490965 They like to have the prose in their language because that's easier.
>>102490935
>there are nice models for me
>they go eventually into crazy self repeat mode
I know that these two things aren't necessarily contradictory, but god damn does it feel like it.
Making a companion to browse 4chan with me
Anyone tried this before?
>>102490964 Maybe a 3-bit quant would fit if you ran absolutely nothing else.
>>102490977 This is *extremely* retarded. It's like when people were using the word "conscious" to describe language models when they first became popular.
>>102490998
>Anyone tried this before?
it's nice tho I just use GPT-4o mini which isn't really local.
>>102490865 Well, they didn't dare to post any actual benchmarks, just an arbitrary "+50%" on their website. While the original CR+ was at one point at the top of lmarena, the new one isn't. It also barely improved at livebench. They clearly can't compete against a similarly-sized Mistral-Large.
>>102490916 If you are planning to use it for RP, you'll be disappointed, it's much more slopped than the original CR+.
>>102490946>confused. overlords told on podcast that AI is communism and Blockchain is capitalism.
>>102490998 I've never built an ERP character for it (that's an odd thing to do...) but I've had gemma2 analyze /smg/ posts.
>>102491042 I think the only thing I like about podcasts is that they use RSS. All of the actual content is always so fucking bad.
>>102490619 I think I'm done playing with it for today, so the next one will be dark. But it might be better if we can find some host to embed the html file so the links can be clickable. If anyone wants dark mode, they could use an extension.
>>102489492 >>102489495 >>102489509 >>102489541 >>102490619
qwen 2.5 made me interested in local again :3 I hope to be able to use RAG and other stuff to get coding llms to reference documentation
>>102491066 Nice.
>>102491066 fwiw, I put in feedback for them to consider reverting or changing the mass reply filter. don't know how much attention they pay to that but I figure it couldn't hurt
>>102480672
>https://rentry.org/machine-learning-roadmap
the math in here feels a bit lackluster
>>102491077
>qwen 2.5 made me interested in local again
I like the 0.5B model, it's pretty snappy
is it gay to goon to a gay RP if you switch to a straight one right before you bust? also, best meme sampler for this?
>>102491097 The math isn't that hard anyway. Probably the most complicated/unusual thing is just the partial chain rule (gradient calculation). Everything else is basic linear algebra, which you should know if you've done practically anything more complicated than json pushing.
>>102491146 But the CEO was on a podcast recently and he said they found that good data was more important than compute
>>102491119 I think you're confused. The use of meme samplers and mental gymnastics is correlated, but that does not necessarily mean that meme samplers will improve your capacity for mental gymnastics.
>>102490763calm down ranjesh
>>102490964 at like 20 seconds/token
>>102491119 Why the fuck would you read gay RP to begin with?
testing models
>>102491096 Good idea. Hopefully they'll reconsider. I don't know why they thought this would stop a determined spammer.
>>102491146 They clearly haven't used good data in the new CR, just in the old one. The new one is full of low-quality synthetic garbage.
>>102491124
>The math isn't that hard anyway.
I understand that, but that's under the assumption that we stick to the current status quo. is the goal not to advance the paradigm forward? we will need stronger math
what's the state of running local models on high end android phones?
>>102491124 why would i learn linear algebra when my gpu does it for me
>>102491066Very nice.
>>102491215lol
>>102491165since you didn't understand the insult, you must be one of openais kenyans. monkey want banana? ooh ooh aah aah?
>>102490484
You may not be trying to advocate for companies but as I said, that is essentially the effect of your posts before this.
>your stance that scale doesn't matter
I never said that. What I said is "The point is that it's essentially being used as a scam", and that scale is simply used as an excuse for that scam, which is actually what Yann's argument is truly about in the end, although he might not explicitly or directly say it like that. Scale obviously does matter to a point, but what it matters for is also a question, and my later point was that it might not matter for anything of equivalent value to the money dumped into it.
>Investors readily fund guaranteed improvements, but are reluctant to invest in seemingly far-fetched ideas
And that's the issue, that is part of Yann's criticism. Investors are not really putting money where it should go and essentially act based on hype, while actually valuable research might not be getting the funding it needs, which isn't really a new or contentious concept.
>If they cease focusing resources on the "bigger is better" approach, I question whether they will dare invest in riskier yet potentially more effective research avenues for achieving AGI
This does not really make sense, as betting big on scale is already the highest risk given the amount needed for it. Smaller projects like JEPA or the original transformers paper do not need nearly that much money, and have never needed that much. It's a completely different ballpark of money we're talking about. That's just in the context of big stuff like GPT-4/5 though. If we talk about smaller companies and the smaller but still somewhat significantly sized models like Cohere's, it's absolutely a waste of money, and they have done virtually nothing to move the field closer to AGI.
>>102491280 Heh, you are mad
>>102490484
>L2 14b
??? bait
>>102491215
>high end android phones?
Do they come with a couple of 3090s on them now? That's cool... But maybe you can run some 8b on them. What's a high-end phone? Gimme specs, not models or brands.
are people itt coping about <70B models again? they're never gonna be viable and most of them will be phased out in the next few years. let it go.
>>102491336i've lost ~1.5 liters of semen to nemo finetunes this week
>I'm so coombrained I don't know how to read
not a brag but okay
70B models aren't even that good
>>102491215 People were running vicuna 7B on some android phones last year. Google is trying to put gemma on the new Android phones. Apple has "Apple Intelligence" but I bet it'll just call the OpenAI API
>70B models aren't even that good
>t. vramlet nemo user
>swiped twice on miqu IQ1_xxs
>>102491379 Link the post, pussy
>>102491379
>this non-replying motherfucker is acting like he's having the LLM shit out a sequel to finnegans wake and not some tsundere moege girl chatbot
shaking my head to be honest
>reeee give me (You)s
>>102490977Sounds like una creator
Me waiting for local as good as claude that runs fast on average hardware
>new model wave hits
>cooming doesn't improve
>>102491563 gemma2 is good enough for most of what I want. I already used it to write me both an ffmpeg and an ImageMagick command today and it's hardly the afternoon. I wish llama was as good so I could finetune it.
>>102491389
>Apple has "Apple Intelligence" but I bet it'll just call OpenAI API
They've already said that's exactly what it will do
>new model wave hits
>sloptuners too captured by /lmg/ memes to tune them
please keep telling them qwen sucks. we don't need any more sloppa trained on opus logs.
>>102491601 they have native adapters and a tiny model iirc for small tasks, but Siri answers and anything longform/important is going to OAI.
>>102491336 There is so much useless knowledge in those models, you could make a perfect coombot in less than 7B. It is just a matter of cutting out the useless shit.
>>102491228 So you know what to tell the GPU to do.
>>102491205 No. You need to be better at applying the math. And if you thought there was something extra but unknown, how would the people teaching you know? Then it wouldn't be new. If you want that, just start reading random math books (this isn't a bad idea btw, I used to do this all the time before I became cynical and jaded.)
>>102491619 I still haven't gotten around to trying OpenELM. Has anyone else? I think support got merged into llama.cpp.
>There is so much useless knowledge in those models
>t. spends every day on a forum dedicated to LLMs
>still a coomer who doesn't know how anything works
>just cranks his dick to /gif/ and sillytavern all day
>finally figured out how to completely remove repetition using rep pen and DRY
>suddenly, all my mixtral variants push plots forward, have far more elegant prose, and not a single spine shiver
IT WAS THAT EASY?? FUCK
>>102491663 You didn't know about repetition penalty and went so far as to come here for help before trying it? How do you manage to dress yourself?
>>102491663
>not a single spine shiver
that's not how rep pen and DRY work, pierre. stop shilling your shit 12B slop.
>>102491663 What are your settings?
>>102491663 Share settings plox. also, where's the DRY dial in openwebui? I can't find it
Qwen2.5 is such a piece of shit model, holy fuck, how could anyone use that shit.
>>102490086 Isn't Nemo a 12b model? You're thinking of Mistral Small 22b. Qwen2.5 is amazing. I've heard people speak of refusals, but I haven't encountered any so far on 32b.
>>102491584 Tinfoil hat: there's one dataset that slops your models tf up but every epoch on it boosts your mmlu by 20%
>>102491724 Yeah, it's dogshit, you are better off using anything else.
>>102491677 it's absolutely true, but i make my own characters and don't share logs so you have to take my word for it
>that's not how DRY works
i don't know what you're talking about, i read the pull request, and the person that made the DRY sampler says that's literally how it works
>>102491676 i knew about basic repetition penalty for months, but i had been using it wrong, because the ST devs can't be bothered to add context docs for most of the samplers, so i had to go digging into full docs and fucking reddit posts for how it actually works. yeah, the principle of "apply X penalty to any tokens in the last Y tokens" seemed obvious in hindsight, but putting penalty as high as 1.08 led to occasional incoherence, and any higher was gibberish, so i just thought i'd never be able to use it
>>102491724 They don't. Those are trolls.
>>102491756 What are your settings?
>>102491592 maybe it's good for stuff that a functional member of society would use, but i want to goon
>>102491694 >>102491767
pic rel, no point in sharing catbox json, since i've changed nothing else
also, my mixtral tune uses an alpaca system prompt, and i just wrote a basic 3 sentence one stating it's a roleplay, and the desired length. all of my lewd shit is in my char defs
>>102491711 i use ST+tabby, look at your own docs, because idk, sorry
>i read the pull request, and the person that made the DRY sampler says that's literally how it works
lol
since other people are taking the bait I'll give the retard explanation: DRY attempts to prevent shivers from showing up multiple times. it has to show up AT LEAST once in order to deprioritize it, similar to but more effective than rep pen.
>>102491849 I see, so you've cranked rep penalty up high, but reduced the rep penalty range. Interesting. I'll give it a try.
So this is the power of closed LLMs
>>102491883problem, western man?
>>102491849 Sounds great, but Mistral models are repetitive on the paragraph level, not just phrases. DRY doesn't work here
>>102491823
>temp 1.3 to 5
>top k 0
temp 5 top k 3 guy has competition now
>>102491823
>temp 3.26
??? what nuts bowl sits on top of perch shivers down the spine while chair 习近平 ding dong die
>>102491849 correct, and the allowed length is how many tokens it's looking backwards for repeated phrases in the context, and if it finds a match, it discards the current token and tries again
>>102491813 I use it for my ERP characters too and it's fine, it just has a very short context.
https://www.reddit.com/r/StableDiffusion/comments/1fm9pxa/joycaption_free_open_uncensored_vlm_alpha_one/
New JoyCaption model. I dunno how many people care about this, but I've been using the pre-alpha version as part of a multi-model workflow to caption thousands of images for training Flux loras. So I'm super excited about this, gonna be playing around with it today and doing side-by-side comparisons with the pre-alpha.
>>102491901 Looks like he has dynamic temperature turned off though.
>>102491901 i'm not using dynatemp
i experimented with it, but i wasn't getting the results i wanted, and went with neutralizing samplers and starting over
the box is clearly not checked
how can I vectorize black and white symbols? remove the white background
>>102491883 trash in - trash out :^)
>>102480754 >>102480814
Is this a new restriction on 4chan?
>>102491903 just trust me, it works
disable slider limits
temperature 10
top k 1
min p 0.5
standard DRY
you can thank me later
Mistral models see a concept appear twice and then spend one paragraph of every reply from then on rephrasing that concept. How do you even fix this?
>rephrase
shit, in my experience mistral models just straight up repeat the sentence verbatim
>>102491971 With topK 1, does anything else even matter?
>>102492003 It's because I had some kinda rep pen on
>>102491975 >>102492003
That does happen a lot, yeah. Try the temp 5 topk 3 minp 0.1 meme settings and see if that adds some variety without making it stupid. For RP at least it should work "fine".
>>102491971 Even if you put temp first, i don't think temp can ever change the order of the tokens to sample. And if temp goes last, it does absolutely nothing with a single token. And if you have top-k 1 before min-p, min-p has nothing to work with either. Even the other way around, min-p does absolutely nothing.
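to illustrate, here's a toy sampler chain. not any backend's actual code, just the math, with made-up probabilities:

```python
import math

# Toy sampler chain. Once top-k 1 has run, only one candidate is left,
# so min-p and temperature have nothing to reorder or remove.
def top_k(probs: dict[str, float], k: int) -> dict[str, float]:
    kept = dict(sorted(probs.items(), key=lambda kv: -kv[1])[:k])
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}

def min_p(probs: dict[str, float], floor: float) -> dict[str, float]:
    top = max(probs.values())
    kept = {t: p for t, p in probs.items() if p >= floor * top}
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}

def temperature(probs: dict[str, float], temp: float) -> dict[str, float]:
    logits = {t: math.log(p) / temp for t, p in probs.items()}
    z = sum(math.exp(l) for l in logits.values())
    return {t: math.exp(l) / z for t, l in logits.items()}

probs = {"the": 0.5, "a": 0.3, "this": 0.2}
out = temperature(min_p(top_k(probs, 1), 0.5), 10.0)
print(out)  # {'the': 1.0} -- the later samplers were no-ops
```

same story for the anon's 100%-token complaint above: if one token already holds all the probability mass, every truncation sampler keeps it and temperature renormalizes it right back to 1.0.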
>>102491661Is this a bot?
>create synthetic dataset using cloud API
>finetune shitty research model with dataset
>research model is now substantially dumber than before
>still worse than cloud API in every way
have you gone to the Kobold Discord to thank a finetuner today?
>>102483044 I overlooked Gemma, assuming it would be censored to hell because of Google. Is 27b Gemma really better than 32b Qwen?
>>102492191 Gemma writes well but it's cucked to 8k ctx
https://docs.novelai.net/text/Editor/slidersettings.html#Unified
Thoughts?
>>102492241 All their finetuning can do is change style. For cooming it may be okay, but if you don't want to RP in claude's default style, they are pretty useless. Claude can do more than one style, you know.
>>102492241 unfathomably based and good for the local LLM crowd
>>102491956
https://www.photopea.com/
layer --> new adjustment layer --> threshold
layer --> flatten image
right click layer in layer panel on right --> blending options --> pull right arrow on "current layer" to anything below 255 --> OK
right click layer in layer panel on right again --> rasterize layer style
image --> vectorize layer --> colors 1 --> OK
file --> export as --> svg
>>102492318 >>102492241
NAI are pieces of shit that literally spam forums with their garbage.
>>102492527 why are you so obsessed? you sound like b*rn*yf*g
>>102492527 This. So much this.
>>102492577 Uhmm... can we unpack this, y'all?
Is it possible to pre-tokenize prompts when running batched inference in vllm? I.e., I'm going to run the same prompt through multiple times with different system prompts, and I'm trying to reduce the computational costs. Or am I going about this all the wrong way?
>>102490777 enough of us
"don't be poor" falls under skill issues
>>102492254
>All their finetuning can do is change style
They hardly manage to do even that. One thing Nemo is really good at is bilingual conversation; I was able to hold a chat with Nemo in English + Japanese with almost no errors or misunderstandings in the outputs. Yet none of the Nemo finetunes can do that, and they still talk exactly like Nemo but add degenerate coomer words like "cunny" and "obscene squelching" to sentences where they don't belong. Sloptuners lobotomize the fuck out of these models with approximately no benefit.
>>102492639 With the OAI API, no. Tokenization is pretty much free anyway. You want the cache, and it's on and just works by default. If you want to make sure it's working, prepend a random number to the very beginning of each of your requests and see performance worsen by a lot.
>>102492639 Not sure that can be done. On llama.cpp, for example, you can cache a prompt and run it multiple times almost instantly, but since the system prompt goes before the prompt, the whole thing would need to be reprocessed again. Or more succinctly: you can only cache a common prefix. If vllm has caching, i'd assume it works the same way.
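the common-prefix point in one sketch. toy token lists, not a real tokenizer: a KV-cache entry at position i depends on every token before i, so the cache is only valid up to the first differing token:

```python
# Why only a common *prefix* can be reused between two prompts:
# everything after the first differing token must be reprocessed.
def reusable_prefix(cached: list[int], new: list[int]) -> int:
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n

doc = [101, 7, 42, 9]      # shared document tokens
run1 = doc + [1, 2]        # system prompt A appended *after* the doc
run2 = doc + [3, 4]        # system prompt B
print(reusable_prefix(run1, run2))        # 4 -- the shared document
print(reusable_prefix([99] + doc, run1))  # 0 -- first token differs
```

which is the practical takeaway for the anon's use case: put the big shared prompt first and the varying system instructions after it, and the expensive part only gets processed once.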
>>102480672 Anons, everyone is saying qwen is shit for RP, but what about general purpose tasks, like classifying and summarizing text, including with "objectionable" content? Is there a 4-bit GGUF quant yet?
>>102492706
>everyone is saying qwen is shit for RP
lol no, just a few mistral shills and retards who haven't tried the model
>>102492695>>102492698Thank you for answering my question
>>102492706Any model is better than that trash.
>>102492712 What size do people run then? I only see like 7B and 72B, but not quants. Every 7B model I've ever seen has been fast but utterly retarded. I fed mistral 7B with enough to fill up its 128k context and still it was retarded and started breaking my expected output format.
>>102492731 I'm currently using Llama-3.1 70B 4-bit. It's doing my classification tasks well. I'm always looking to improve though. Otherwise I'd still be on GPT-J-6B or markov chains.
>>102492733 72B is great if you can run it, otherwise 32B or 14B. they released almost every size anyone could ask for, just look at their huggingface page.
>>102492759 I can do 72B 4-bit or 32B prob in fp8 or 16, it's just weird I can't find quants, not even from TheBloke. I feel like I'm not searching right.
>>102492706
>Anons, everyone is saying qwen is shit for RP
>including with "objectionable" content?
Being bad at one would make it bad at the other. Subjects overlap. But i don't know. Why don't you try it yourself?
>Is there a 4-bit GGUF quant yet?
yes. huggingface.co. Pretty new site to upload files. It seems some people are using it to upload language models, among other things.
>>102492771 You are not searching right, grandpa.
>>102492771
>not even from TheBloke
The Bloke hasn't been active since january. Look for bartowski or quant cartel.
>>102492771
>just look at their hugginface page
>just look at their hugginface page
>just look at their hugginface page
>>102492684 I'm pretty new to this, but Nemo finetunes seem like complete shit. I can pretty much predict what the characters are going to say. Perhaps I should give the base model a try.
>>102492880 No gguf quants
>>102492799 Thanks, bartowski has a 4-bit instruct
>>102492880>No gguf quantsNigga, are you blind? Top right.
>>102492880
>No gguf quants
The screenshot anon posted has gguf as the first item of the second column.
>Thanks
You are welcome. You can often just search for
>model name GGUF
in huggingface's search bar and find something. Do be aware that people can fuck quants up, so keep an eye out for that (look into the --check-tensors argument for llama.cpp).
>>102492880 You'll struggle your entire life not understanding what's going on around you.
>No gguf quants
lol okay, I'm done being helpful in /lmg/ this year. it's nothing but shitposting and trolling now. you people are fucking retarded.
>>102492952 >>102492945
gguf is a file format (with some tranny jizz mixed in)
Quants are requantized versions of the model.
Not all ggufs are the same quant. For my vram, 48GB, I need 4-bit quants
yeah, if only you were shown where to find 4-bit gguf quants the first time you asked
these anons are fucking dumb, huh?
oh wait
>>102492993ur niggerlicious
pissing me off
>>102493018
>>102493018
>>102493018
Hey guys I'm new here, could someone point me to some resources to getting started? Also, is there an official /lmg/ card I can test with once I get everything running? Sorry if this listed in plain text somewhere on the thread, I'm just looking to be spoonfed links. Thanks!
>>102493138 If only we had some resources...
>>102493138 Read the OP. For an easy entry, koboldcpp + a mistral-nemo-instruct gguf. Get the quant that is smaller than your vram by about 15%, enable flash attention in koboldcpp, and set your context size to 8192. Then start messing with things. Different models, different context sizes, different quants, etc.
>>102493138
>could someone point me to some resources to getting started?
https://ollama.com/download
>official /lmg/ card
no current on the market is worth shelling money out, the official card to test with is whatever NVIDIA GPU you have that isn't a decade old
>>102491066 consider using more than one column
>>102493186
>no current on the market is worth shelling money out
I assumed he meant a character card, based on "once I get everything running".
>>102491907 allowed_length is the number of tokens that can be repeated before a penalty is applied. DRY actually looks for repetition across the whole context
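rough sketch of that mechanism, for the anons arguing above. this is not the actual llama.cpp/kobold implementation, and the parameter names only loosely match the real ones:

```python
# DRY sketch: a candidate token is penalized if emitting it would
# extend a sequence that already occurred earlier in the context
# beyond allowed_length tokens. NOT the reference implementation.
def match_len(context: list[str], candidate: str) -> int:
    """Length of the longest earlier sequence the candidate would repeat."""
    best = 0
    for i, tok in enumerate(context):
        if tok != candidate:
            continue
        n = 0
        # walk backwards comparing the tokens before this earlier
        # occurrence against the tokens at the end of the context
        while n < i and context[i - 1 - n] == context[-1 - n]:
            n += 1
        best = max(best, n + 1)  # +1 for the candidate token itself
    return best

def dry_penalty(context, candidate, allowed_length=2,
                multiplier=0.8, base=1.75):
    n = match_len(context, candidate)
    if n <= allowed_length:
        return 0.0  # short repeats are allowed through
    return multiplier * base ** (n - allowed_length)

ctx = "shivers down her spine . sends shivers down her".split()
print(dry_penalty(ctx, "spine"))  # ~2.45: would extend a 4-token repeat
print(dry_penalty(ctx, "arms"))   # 0.0: not extending any repetition
```

which shows both anons were half right: the phrase does have to appear once before DRY can touch it, and the penalty grows exponentially with how long the repeated run would get, so verbatim sentence repeats get hammered while single common words pass through.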
>>102491823 a quick update to this: neutralize presence penalty, or the model goes mildly schizo several replies in and starts dropping articles in front of nouns and talking like a caveman