/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101258576 & >>101250468

►News
>(07/02) Japanese LLaMA-based model pre-trained on 2T tokens: https://hf.co/cyberagent/calm3-22b-chat
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
>(06/27) Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101258576

--Papers: >>101267676 >>101267806
--KLD Tests Reveal Insights on Quantization Methods and Model Performance: >>101265037 >>101265051 >>101265080 >>101265240
--Ooba Error: Procedure Entry Point Not Found in llama.dll: >>101260958 >>101262849 >>101263260 >>101263658 >>101263936
--Gemma-2-27b-it-GGUF Model Prompt Format and Capabilities: >>101264232 >>101264314 >>101264365 >>101264571 >>101264613 >>101264682 >>101264754 >>101265316 >>101264279 >>101264353 >>101264357
--Imatrix Quantization Causes High Memory Usage in Koboldcpp: >>101264029 >>101264113 >>101264160 >>101264167 >>101264185 >>101265195 >>101268078
--Anon Rants About Overtrained ERP Phrases in Language Models: >>101264497 >>101265374 >>101265439 >>101265504 >>101265898
--Q8 Quantization: A Viable Alternative to FP16 for Embed and Output Layers: >>101266919 >>101267029 >>101267191
--Claude's Internal Monologue Revealed with Tag Prompt Hack: >>101267435
--Running AI Models on Low VRAM: Expectations and Limitations: >>101260721 >>101260752 >>101260786 >>101260860 >>101260817
--LLaMA Benchmarks and CPU vs GPU Performance: >>101258990 >>101259063 >>101259272
--Introducing Diffusion Forcing: Unifying Next-Token Prediction and Full-Seq Diffusion: >>101259322 >>101259924 >>101259991 >>101260307 >>101260564 >>101260601 >>101260700 >>101260711 >>101260621 >>101260700
--Gemma 27B's Temperature Stability and Coherent Writing: >>101259737 >>101259782 >>101259842 >>101260199 >>101260241 >>101260322 >>101260805 >>101260989 >>101261096 >>101259880
--Alignment Lab's Approach to NSFW Content in Finetunes: >>101263679
--Running on Device: New Open-Source AI Model for Offline Use: >>101260535 >>101260546 >>101260874 >>101260987 >>101261006 >>101261167
--Orin AGX Liquidation: Worth it for the Memory and Accelerators?: >>101259163 >>101259212 >>101259234
--Miku (free space): >>101258845 >>101259283 >>101259715

►Recent Highlight Posts from the Previous Thread: >>101258584
>>101268182
>--Gemma-2-27b-it-GGUF Model Prompt Format and Capabilities
That's misleading. That guy got it wrong, as you can read in that chain.
will there be SPPO-Iter3 for gemma-2 27b too?
I don't get why Gemma would have Claudeslop. Doesn't make sense for them to illegally train on Claude outputs when they have Gemini, probably even bigger and better versions than publicly available.
>>101268497
Do you feel it really improves the model?

>>101268568
Maybe, just maybe.

>>101268568
Since people like Claude's output, it makes sense. Google does so because it's what people generally prefer.

>>101268574
It's more creative at writing.

>>101268609
The implementations are still young...
Is llama 3 7b acceptable for understanding a text? what about gemma2 7b?
the post that killed lmg underages
>>101268784
Understanding as in identifying words? Yes. Understanding context? Probably about as well as a grade schooler.

>>101268798
I mean like a paper, so it can rewrite the content in other ways, draw implications, etc. I've only tried Claude for that, and it's really good, but I think they save data just like OpenAI, so for papers still being written I only trust local, and I still don't have the power to run better than 7b.

>>101268855
If you're using any kind of service that isn't self-hosted, you can bet everything you give them is being logged.
7b models might be capable of re-wording stuff to skirt plagiarism checks or something, but if you need it to actually understand complex topics then you probably won't be happy with the result.

>>101268855
llama3-8B should be good enough for that; you just need a solid prompt and maybe a few shots.
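To make "a solid prompt and maybe a few shots" concrete, here is a minimal sketch of assembling a few-shot rewriting prompt for a small local model. The instruction wording and the example pairs are invented for illustration; substitute pairs from your own material.

```python
def build_fewshot_prompt(examples, text):
    """Assemble a plain-text few-shot prompt: an instruction, a few
    worked examples, then the new passage left open for the model."""
    parts = ["Rewrite the passage in your own words, keeping the meaning intact.\n"]
    for original, rewrite in examples:
        parts.append(f"Passage: {original}\nRewrite: {rewrite}\n")
    parts.append(f"Passage: {text}\nRewrite:")
    return "\n".join(parts)

# Illustrative example pairs; in practice, use rewrites from your own domain.
examples = [
    ("The results indicate a strong correlation.",
     "The findings point to a close relationship."),
    ("We propose a novel architecture.",
     "We introduce a new model design."),
]

prompt = build_fewshot_prompt(examples, "The method outperforms all baselines.")
print(prompt)
```

Feed the resulting string to whichever backend you run; with instruct-tuned models you would wrap it in the model's chat template instead of sending it raw.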
>>101268914
Yeah, I think the same; because of that it looks quite bad to "release" a paper as a log before really publishing it.
>>101268959
I'll try it. Is there any advantage to using a quantized version of it? (I heard Phi behaves weirdly.) And where can I obtain it? Any performance difference between kobold and llama.cpp?

My Gemma kobold.cpp often crashes with this error: https://github.com/ggerganov/llama.cpp/issues/8246
It only happens after I change something in the context and then regenerate, but not always. I haven't found a pattern yet. Anyone else experiencing this or have an idea what to try?
inb4 another gguf regen required

>>101269016
I'm downloading this one:
https://huggingface.co/TheBloke/LLaMA-Pro-8B-GGUF
>>101269121
lol

>>101269121
>TheBloke

>>101269147
>>101269169
We don't download from him anymore?

>>101269202
Isn't he dead?

>>101269202
He disappeared nearly half a year ago, and his now ancient quants are lacking key tweaks and fixes discovered since then.
Also that isn't even llama3, guy.

>>101269121
>Updated 6 months ago.
It's some tencent abomination.
Download the model from meta directly and quant it yourself. They grant access immediately.

>>101269241
>>101269247
That's why I've asked you dudes where I could obtain it T-T
>>101269255
>Quant it yourself
Wouldn't it take a **long** time?
>>101269265
>That's why I've asked you dudes where I could obtain it T-T
Just use google, it's not that difficult.

>>101269265
>>101269202
If you really want premade ggufs, bart is the new bloke:
https://huggingface.co/bartowski

>>101269265
>Wouldn't it take a **long** time?
For an 8B? No. A few minutes (5-10?) on a potato, less on what you're probably running.

>>101269278
>*uses google*
>*downloads the bloke*
>"you are dummy"
T-T
>>101269287
Thanks anon. Will download this one:
https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF
>>101269288
Mine is a potato (compared to what people use for AI); that's why I'm going for an 8b.

>>101269328
>Thanks anon. Will download this one:
Since you want to use it for 'regular' stuff, this might be a straight upgrade over l3, if you want to test it:
https://huggingface.co/bartowski/Llama-3-Instruct-8B-SPPO-Iter3-GGUF

>>101269328
>Mine is a potato (compared to what people use for AI); that's why I'm going for an 8b.
The potato I'm talking about is a ~15 year old AMD FX-4170. No gpu, and I convert the models on a VM with 1 CPU running on the same machine. 24gb ram total, 8gb for the vm.
You'll be fine.
>>101269348 (me)
>>101269287 (me)
>>101269288
Also forgot to mention: for l3 8b you really want at least q6_k, ideally q8_0. It quickly goes complete retard under that.
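The size cost of following that advice is easy to estimate: GGUF file size is roughly parameter count times bits per weight. The bits-per-weight averages below are approximate figures for llama.cpp quant formats (block overhead included); real files vary a little per model.

```python
# Approximate average bits per weight for some llama.cpp quant types.
# Treat these as ballpark values, not exact format specs.
BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.5625,
    "Q4_K_M": 4.85,
}

def est_size_gb(n_params: float, quant: str) -> float:
    """Rough GGUF file size in GB for a model with n_params weights."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"8B @ {quant}: ~{est_size_gb(8e9, quant):.1f} GB")
```

For an 8B the jump from q4 to q8 is only a few GB, which is why "just take q8_0" is cheap advice to follow at this size.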
>>101269348
Thanks, will try this one then.
>>101269349
Thanks, will try the other anon's suggestion first though.
>>101269377
kek, I was downloading Q4_K_M ("recommended")... Restarting the download for q6...
Can someone give me a good sampler preset that uses both smoothing_factor/curve and the DRY parameters?
>>101269408
I genuinely feel sorry for the newcomers; the various getting-started and spoonfeeding guides floating around are outdated as hell.
do we have decent settings for gemma figured out?
>>101269408
>Thanks will try the other anon suggestion first though
Fair enough. I just got tired of waiting for someone to quant the models after fixes on llama.cpp. Then there are the people who complain about broken quants after downloading a 6 month old file. Especially true for new models (see gemma2), where 1 day old quants were already outdated by fixes.
Now I just download the models and convert when needed. The download takes longer and uses more storage, but I get fresh quants on demand. If you're gonna start using local models more regularly, I'd recommend it.

>>101269495
I thought about writing one, but I feel like it would get outdated pretty quickly and I probably wouldn't keep it up to date, making it just another one in the pile.

>>101268797
take your meds schizo

Does Gemma 27b work properly with 4 bit, or should it be higher, like llama3? If higher, I guess 6 bit would be good enough?

I might be repeating myself, but:
>a sitcom written by a Lovecraftian deity high on nitrous oxide
>specialized in the production of extremely durable oven mitts
Gemma gets so unhinged but never schizo on high temperatures. I love it.
Something's wrong with the new _L quant; it looks like it's less accurate than the original one.
For example, here Q6_K is closer to Q8_0 than Q6_K_L is.
The pink highlighting shows the text that is exactly the same as Q8_0 until the difference appears; Q6_K starts to diverge from Q8_0 later than Q6_K_L does. I'm on a deterministic preset too (top_k = 1).
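One way to put a number on this comparison: with greedy decoding (top_k = 1), count how many tokens each quant shares with the Q8_0 reference before first diverging. The token lists here are invented stand-ins for the screenshot's outputs.

```python
def common_prefix_len(reference, candidate):
    """Length of the shared token prefix between two generations."""
    n = 0
    for a, b in zip(reference, candidate):
        if a != b:
            break
        n += 1
    return n

# Toy token streams standing in for the Q8_0 / Q6_K / Q6_K_L outputs.
q8   = ["The", "old", "oak", "stood", "alone", "on", "the", "hill"]
q6k  = ["The", "old", "oak", "stood", "alone", "near", "the", "barn"]
q6kl = ["The", "old", "oak", "rose", "high", "above", "the", "field"]

print(common_prefix_len(q8, q6k))   # longer shared prefix = closer to Q8_0
print(common_prefix_len(q8, q6kl))
```

A higher divergence point is only a rough proxy for quality, but under greedy decoding it is at least deterministic and easy to compare across quants.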
>>101269509>>101269573 is Q6_K_L, temp = 1.9, top_p = 0.95, min_p = 0.035.
>>101269535
Thanks for the suggestion. I'm just testing for now, but I'll keep that in mind if I start to follow the updates.

>>101269573
27b, right?

>>101269594
muh quants btfo
https://huggingface.co/bartowski/gemma-2-9b-it-GGUF/discussions/4
>>101269287what if bartowski really is thebloke?
>>101269608
Yes. Didn't even bother to try 9B, but from what I'm reading it should be quite good at this stuff as well.

>>101269609
>q6_K_L (New) - Unimpressed. Short & less detailed responses (esp. in the latter half of the test). Had to regenerate the 3rd response, because it felt embarrassingly short ~ much like a summary.
>q6_k (New) outputs looked honestly better.

>>101269637
>q6_K_L (New) - Unimpressed. Short & less detailed responses
That's exactly like this post; the poem is shorter on Q6_K_L >>101269594
Looks like the _L meme should be avoided at all costs.

And flash attention is not supported on gemma yet, right? Context just eats up the VRAM.

>>101269666
>flashattention is not supported on gemma yet
Correct.
>llama_new_context_with_model: flash_attn is not compatible with attn_soft_cap - forcing off

>>101269683
Time to hibernate for a couple of days, I guess.

>>101269573
Yeah, I'm impressed as well by that model; it's smart and has sovl at the same time. I really feel google is gonna lead the LLM race from now on; their next API model is gonna be great, mark my words.
>"O EM G"
>"this <below_70b model> is THE best model fucking ever guizeee!!!!111 holy shit its SOOOO GOODOOOOOODODOOD"
>t. literally didnt use anything above 13b
Does anyone know any library that has an easy proof-of-work captcha requirement before allowing users to post? Anyone below 64gb of ram or 24gb of vram should just be executed.
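Setting the bait aside, the proof-of-work part doesn't actually need a library; a hashcash-style scheme fits in a few lines. The challenge string and difficulty below are illustrative. The asymmetry is the point: verification costs one hash, while solving costs about 2^difficulty hashes on average.

```python
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce meeting the difficulty target."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash to check the submitted nonce."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty

nonce = solve("thread-101268178", 12)  # ~4096 hashes on average
print(verify("thread-101268178", nonce, 12))
```

Tune the difficulty so solving takes a second or two on a phone; that's enough to throttle spam scripts without bothering humans.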
>>101269713
>t. literally didnt use anything above 13b
I used Mixtral for a long time, and I think gemma is at its level in terms of smartness; it's also just as good at languages other than English. But gemma is better at being naughty/offensive and is way less deterministic than Mixtral. So basically, gemma-27b-it is the equivalent of a 47b MoE model from December 2023.
Any other questions?

>>101269713
>seething he spent thousands on gpus and gpu poors are getting anything decent

Gemma is really cucked as an assistant, but once you go RP mode with a card the guardrails are gone, kek. At this point I'm waiting for Gemma-27b-SPPO so that it's less cucked and smarter.
TWO MORE WEEKS
>>101269758
Meta hasn't improved a lot since that date, desu. Llama3 isn't the boost in quality they promised us, and gemma is catching up to it. If there were a gemma-70b, this shit would be API tier.

>>101269713
Even better:
>People without on-demand access to a local cluster of H100s shouldn't be allowed to post.
You're one of those fuckers with loud bikes going at 20km/h, aren't you?

>>101269755
>At this point I'm waiting for Gemma-27b-SPPO so that it's less cucked and smarter
https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3/discussions/1#6681a0ea1fbddc88d2a17856
>We are planning to do 27B as soon as a stable release of transformers and vllm generation on Gemma-2-27B-It is available.
God bless those fags, we'll be eating really good soon.

>>101269741
Yes. You think llama 3 70b is much smarter?

>>101269819
Can't run L3-70b, so I can't tell ;-;
>>101269750>>101269774no, just dont give niggerlicious opinions on things that you barely touched at all, trying out bigger models isnt impossible without running them yourself since others host them online, for freeif you never used even L3 70b then there is no reality where you can make any blanket statements about anything, you can only compare one model to the other in the same range or talk about it being the best model in that range
>>101269848
Anon, L3-70b isn't even the best local model; it's CR+. So if you want to go down that path, let's go, I guess.

>>101269713
You saw this the most when polls were created here about people voting for the top models, with wizard 8x22 and llama3 70b being at the top; literally everyone who voted for llama3 70b did so because they couldn't actually run wizard 8x22 and see that it is better for creative tasks.

>L2 wasn't that much better than L1 aside from context size and GQA
>L3 isn't that much better than L2
>the bigger models like Wizardlm or CR+ aren't that much better than L3
It is over.

>>101269855
Where did I say it's the best? I said EVEN 70b, if anything, implying that it's the minimum to get into the "big" models, retard.
Also R+ isn't SOTA for writing, roleplay or similar; that would be WizardLM 2 8x22.

>>101269888
>t. doesnt even have the ram to run cr+ and wiz
many such cases

>>101269892
>can't even run deepseek-v2 in q6+

>>101269892
>t. vramlet who can't use Grok

>>101269892
it's gemma 27b
Don't talk to me if you can't even run Nemotron Q6 or up
>>101269848>just dont give niggerlicious opinions on things that you barely touched at allI gave no opinion. I just made fun of you for being a little bitch who HAS to read every post. What the fuck is a proof of work library even going to do here, retard?
>>101269906
Q3_K_M is enough.

>>101269892
I will talk about 4b sota models and you vill read it. Seethe.
https://huggingface.co/bartowski/Phi-3.1-mini-4k-instruct-GGUF

>>101269848
>if you never used even L3 70b then there is no reality where you can make any blanket statements about anything, you can only compare one model to the other in the same range or talk about it being the best model in that range
Then go for it, anon: give us your comparison between gemma-27b and L3-70b.

>>101269892
>NOOOO!!! you can't talk about models lighter than 70b
https://www.youtube.com/watch?v=yWULCfJ2PGA

Google would destroy the open LLM meta if they took gemma-27b and turned it into an 8x27b MoE. The perfect size for those sensible enough to invest in a CPUMAXX build.
>>101269943I think Gemma-27b pfrffffffffff. pfrfffrpffffffffff. fpgpfffffffffffffffff fpfddrdfppfffffff fpfffffffffffffffffffffff.
Does anyone have any suggestions for models that sound very human (as in, they speak like people do on discord, IRC, reddit, 4ch, etc.; the more toxic the better)? I'm aware of gpt-4chan, but I was wondering if there's anything more recent, considering it's based on GPT-J, a 6b model from 2021.
>All this (v)ramlet seetheOof that nigga really struck a nerve.
>>101269959
Ask the model to talk like that?

>>101269959
c.ai used to be very good at this; it even included typos regularly from being so human.

>>101269750
this
so much this

>>101269959
Any model, if you aren't braindead and actually tell the model what you want instead of expecting it to read your mind. How retarded are these newfags?

Is it me, or does gemma have some formatting issues?
Sometimes it simply messes up the asterisks and shit; it's annoying.

>>101269982
That's normal for models in general. Claude does it too.

>>101269965
>>101269979
This is exactly what I am doing, and it seems to completely ignore me.

>>101269982
"Just use the chad novel format", anon said.

>>101269988
Nah, not all models mess up the formatting; Mixtral works fine, for example.

>>101269992
Then stop using 13b-or-less meme models.

>>101269982
Yeah, it cannot retain the formatting for me at all.

>>101269982
Yup, gemma has a hard time with formatting. Just add an instruction to use asterisks in the system prompt.

>>101270020
The solution would be to use the roleplay.gbnf thing, but it doesn't seem to be working on gemma:
>Warning: unrecognized tokenizer: using default token formatting

>>101270005
Unfortunately I only have a 4070 laptop edition (yes, I know) with 8GB of VRAM, and I am a massive poorfag (got this laptop as a gift).
>>101269959
Is the dataset for gpt4chan public?
If so, your best bet is to train a lora for a newer model. Either that, or get a model that has really good in-context learning and stuff it full of examples.
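For intuition on what training a lora actually does before picking a framework: the base weight stays frozen and you learn a low-rank update on top of it. A toy numpy sketch follows; the shapes, rank, and scaling are illustrative, not any trainer's defaults.

```python
import numpy as np

d_out, d_in, r, alpha = 16, 16, 4, 8
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))     # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init

def lora_forward(x):
    """Base layer plus the scaled low-rank update: W @ x + (alpha/r) * B @ A @ x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B zero-initialized, the adapter starts out as an exact no-op.
print(np.allclose(lora_forward(x), W @ x))
```

Only A and B get gradient updates during finetuning, which is why a lora is tiny compared to the base model and cheap to train on consumer hardware.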
>>101270030
Wait for gemma 9b to get fixed and try it then, if llama 3 8b doesn't work.
>>101270061Yes, but I'm too much of a stupid nigger to know how to train loras right now, but I'm willing to learn if it's not absurdly complex. What should I search for to start?>>101270067I'm not sure I fully understand. Are you saying I should try llama 3 8b, and if that doesn't work, wait for an uncensored version of gemma 9b?BTW, I can run most 11b models with all layers ofloaded to GPU, and most 13b models at a reasonable speed with some layers offloaded to main memory.
>>101270105
>Are you saying
Yes, except not an uncensored gemma, but for it to be fixed, since it doesn't seem like it's working fully properly with current software.

If I can fit Gemma 27b 5k_m 100%, should I use llama.cpp or exllama?

>PoopenFarten-CapybaraMaid-Gemerald-limarpv4-34B.i1-IQ4_K_S-00001-of-00004.gguf
Why are model filenames like this? What went wrong?

>>101270198
Because you're not converting them yourself.
>ggml-model-IQ4_K_S-00001-of-00004.gguf
>>101269959
>>101269888
Honestly, I've been reading earlier parts of my long story that were even from the early 13B llama models, and am surprised at how well they did. Unfortunately I didn't keep a record of what each chapter used. But aside from some outliers that used Claude, and with no track of edits and retries, I'm having a hard time distinguishing a difference in quality.

>>101270186
exllama does not support gemma.
>>101269713dilate more trans freak
>>101269755>yfw can't RP and talk about anything you want with your AI waifu assistant robotfuckerbros... it's not fair...
Is there anywhere I could run Midnight Miqu for free? Or any service that hosts it and accepts crypto for monthly use or smth?

>>101270265
see >>101270239
more like loli miku general, heh
>>101270342
>Is there anywhere I could run Midnight Miqu for free?
On your computer.

>>101270105
>What should I search for to start?
Look for the unsloth guides.
I imagine that the dataset is formatted for SFT training.

>>101270239
Model and prompt? Not much to go by from the screenshot.

>>101270367
Alpaca, RP system prompt, Sao10K_Typhon-Mixtral-v1-exl2_3.5bpw.
Card is:
You are a lazy assistant. Your identity is hidden from the user; he knows you as just Anon. Your task is to answer his requests with minimum effort, as dismissively as possible, sometimes using profane language. If your response technically answers the user's request, but you know that it's not a helpful response, that's a perfect answer.
Remember: you are writing this over an IRC channel, and your effort is very limited. Write in lowercase, be extra short.
>>101270386>>101270367
>>101270386
The best way to define a specific writing style/quirks is to just give it like 5 example replies in the prompt. ST has a box for this.

Can someone explain how a Q5_M quant of Gemma 2, which is ~20GB, goes fully ~22GB into VRAM but also ~25GB into RAM?
I have all layers on the GPU.
Other models just take up the VRAM.

Wait, I somehow missed typhon. qrd? Is it better than mixtral-LimaRP-ZLoss?
>>101270417
no

>>101270386
>>101270400
Looks fun. Thanks.

>>101270409
I know, and I'd do that if it wasn't working, but I got what I wanted with just the card description.

>>101270356
I'd like to test it out before downloading like 100 gigs.

>>101270417
I used a Sensualize merge for the longest time, and ultimately came to the conclusion that it's a lot dumber than vanilla mixtral instruct. Using Typhon now. It seems okay. I honestly can't tell if it's better or worse than the others. It works for depraved stuff and acts like a retard only sometimes.

>>101270414
That's because on llama.cpp, if you use no_mmap or some shit, the full model goes into RAM regardless of everything. Yeah, that's retarded, and they don't give a fuck to fix it.
>>101269713
I have 48GB of VRAM and I would still use Gemma 2 27B.

>>101269713
>>"this <below_70b model> is THE best model fucking ever guizeee!!!!111 holy shit its SOOOO GOODOOOOOODODOOD"
>>t. literally didnt use anything above 13b
that's literally me, kek
>>101270417
Buy an ad.

>>101270476
sure u do

>>101270495
eat a cock, schizo

>>101270417
teto-8x7b is; limarp zloss is sloppy trash

>>101269750
They can run whatever they want 10x faster than you.
>>101270627
Dick measuring on the internet is retarded.

>>101270627
And yet they always seethe at poors daring to enjoy what they have, instead of just enjoying their giant models. Weird, huh?

>>101269377
Got it working. The computer is suffering, but it seems to be working. Using OpenBLAS.

llama4 waiting room. llama3 is coal through and through.

>>101270674
llama 4 will be just as bad. llama-5-jepa waiting room.

>>101270670
>Using OpenBLAS.
No nvidia gpu? Or even amd?

>>101270597
chat, is this true? I mean, I love fukkireta, but...

>>101270701
My "gpu" is "Radeon Vega Mobile Gfx" according to /proc/cpuinfo.
>>101270702
go back

>>101270729
You could try the vulkan backend of l/kcpp; it should be faster, assuming you get it to work.

>>101270734
this is my place, though, the /reddit/ board, right?

>>101270745
>/reddit/ board,
hi chris/p*tra, gonna spam bl*cked again in about 3-4 hours?

>>101270734
Got it to work with vulkan; seems mostly the same, 1 token per second.

>>101270757
hi petra
>>101270536
Buy an ad.

>>101270757
Yeah, you're obviously itching to see them again. Wanna tell us why? :)

Reminder that LLaMA 3 is trash compared to what is coming soon.

>>101270915
Reminder that what is coming soon is trash compared to what will come after.

Reminder that nothing ever happens.

>>101270915
I remember you claimed that about qwen.

>>101270915
Let me guess? It's coming in 2 weeks?

>>101270953
>2 weeks?
see >>101269758
Imagine being Meta: releasing 2 open source models named Chameleon, then having nobody do anything with them.

>>101271031
Oh yeah, I totally forgot this existed, kek. What happened? Is it because it's complete shit?

>>101271052
They gimped it for "safety" and nobody has figured out how to undo it yet.
*cough* the bitnet... *wheeze* is real...*dies*
>>101271060
>They gimped it for "safety" and nobody has figured out how to undo it yet.
Total ethicist win.
llama3 405b is actually bitnet
>>101271031
They didn't even upload it to HuggingFace themselves.

>>101271075
they don't have the balls
>>101271084i know fucking faggots :)
>>101271075
>llama3 405b is actually bitnet
>Performs worse than 70B
What then? Do we finally bury copenet?
>>101271084
They have the most money but the least balls, yeah. They haven't improved a single fundamental architectural thing since their L1 release; it's just "moar parameters, moar tokens" and that's it. Why the fuck aren't they taking more risks, goddammit?

>>101271148
>why the fuck aren't they taking more risks gaddamit?
They are. If it works, they keep it for themselves; if it's shit: ay, new llama, guys, eat up piggies.

>>101271169
If they had something great working and kept it for themselves (for API use, I guess), then it means they haven't found anything; they aren't even trying to compete with Claude and ChatGPT, for example.

>>101271148
>least balls
Mostly, although they did go forward with chameleon, multi-token stuff and similar recently. They didn't want to fall for any meme archs too early, I guess, but it was obvious they would have to change things, since releasing a 400b dense model is almost DOA.

How do I have the model control how long the response is going to be?

>>101271188
I'm just quite jaded by recent releases:
>meta here, l3: 8b or 70b, no in-between, 8k ctx, not much if any pop culture/fandom knowledge, gpt-slopped
>qwen here, qwen2: half of the params used on chinese, practically zero pop culture/fandom knowledge, also quite gpt-slopped
>google here, gemma-2: 8k ctx with swa making local implementations harder, model marketed as being for local gpu, we didn't actually test it on any software local users use, decent pop culture/fandom knowledge
In a bit we'll get a 400b that cpumaxxers will run for a day before relegating it to grok tier, while some will cope and just say 'run it overnight dude'. Huge and useless. It's all so tiresome.
>>101271306
You let it use EOS tokens.
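In loop form, that's all the length control there is: generation stops when the model samples EOS, with max_new_tokens as the only hard cap. The toy next-token function below is a stand-in for a real model; in practice you steer length through the prompt ("answer in one sentence") or example replies.

```python
EOS = 2  # illustrative token id

def generate(next_token_fn, max_new_tokens=128):
    """Sample tokens until the model emits EOS or the hard cap is hit."""
    out = []
    for _ in range(max_new_tokens):
        tok = next_token_fn(out)
        if tok == EOS:  # the model decided the reply is finished
            break
        out.append(tok)
    return out

# Toy stand-in model: emits ten tokens, then EOS.
toy = lambda ctx: 7 if len(ctx) < 10 else EOS

print(len(generate(toy)))     # stopped by EOS after 10 tokens
print(len(generate(toy, 5)))  # stopped early by the cap
```

Banning the EOS token (some frontends expose this) inverts the behavior: the model is forced to ramble until the cap, which is why replies get repetitive at the end.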
>>101270597
Preliminary results with a very low sample size: I like it. It feels somewhat creative. Will play more.

>>101271310
>some will cope and just say 'run it overnight dude'
Yes? If it can do things no other model can and finish a project overnight that other models can't at all, what's the problem?
Also, a huge good model will put a lot of pressure on research into speeding it up with better quants, distillation, lookahead, speculative decoding, etc.
Or, especially, adding a smaller model in front that forwards things to the big one just to check, which is a lot faster, with the small model generating, for example, the majority of a codebase and the big model just making sure the smaller one is on a good path.

>>101271286
>a 400b dense model
That's the worst part: they're probably spending like 50 million dollars training this giant model instead of trying new architectures that could get the same result with way fewer parameters.

>>101271355
>yes? if it can do things no other model can do and finish a project overnight that other models cant at all whats the problem?
>Runs 400b overnight guzzling power for hours
Check in the morning:
>oops, it made a typo at token 500, the entire rest of the output is useless and you need to regen
Even copus and g4o make mistakes; you really think 400b won't?

>>101271355
>finish a project overnight
delusional

>>101271355
Also hope your 'project' is smaller than 8k tokens.

>>101271355
You'd better write the code yourself if you must wait a full night for a piece of code. And gpt4 and claude still exist; why would I bother with shit like that in the first place?

T4 16GB is now getting down into the semi-reasonable range on ebay; there's a seller at $470. What do you think? Not the best for cores or memory bandwidth, but it's tiny.
4060ti 16GB is probably still a better deal, right?
>>101271375>guzzling power for hoursoh no! not the 2$ of a lot cheaper power overnight that i will have to pay to get a project done for me that will save me 2 hours of my own time!!!!!!>even copus and g4o make mistakes, you really think 400b won't?it will probably strawman less than dumb niggers on /g/ like younever said it wont make mistakes, retarded nigger, i said it will be able to do things that smaller models wont at all>>101271377depends on the size and complexity of the project, it wont make you a social media clone overnight nor did i imply otherwise>>101271387rope works on other l3 models fine, will for this as well
>model calls me tranny out of nowhereb-bros..?
Why do I never see anybody using this?
https://github.com/ggerganov/llama.cpp/issues/4886

>>101269888
>L3 isn't that much better than L2
Hard copium. I was always shitting on small models, but here I am, using 8B over anything else, because of how good it is compared to the old L2 shit.

>>101271400
The effects of copium, everyone.
I'm looking forward to seeing your posts about 400b when it releases and does barely better than 70b.
>>101271400
>it will be able to do things that smaller models wont at all
What, pray tell, will it do that claude and o4 can't, that you'd want to do locally?

>>101270417
You didn't miss anything; probably the worst mixtral tune to date, and this is counting the ones from before the bugfixes.

>>101271448
What's the best one then, anon?
>>101271099>Performs worse than 70Bare you retarded? they posted benchmark results on not fully trained checkpoint and it's already way past 70B
>>101271416
Also, flash attention with Gemma2 when?
I know that the reference implementation doesn't support it due to the logit cap (or something), but open source bespoke implementations surely can work around that.
Right?
Right?
Flash attention is just so nice.
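The clash comes from Gemma 2's attention-logit soft-capping: scores pass through a tanh squash before softmax, and a stock flash-attention kernel never materializes those scores, so it has nowhere to apply the cap. A sketch of the capping function itself; 50.0 is the commonly cited attention cap for Gemma 2, but treat the number as illustrative.

```python
import numpy as np

def soft_cap(logits, cap=50.0):
    """Gemma-2-style soft-capping: near-identity for small logits,
    smoothly saturating so every value stays inside (-cap, cap)."""
    return cap * np.tanh(np.asarray(logits, dtype=float) / cap)

x = np.array([1.0, 30.0, 300.0])
print(soft_cap(x))  # small values pass through; large ones saturate below 50
```

Until a kernel applies this transform inside its fused attention loop, llama.cpp has to fall back to the unfused path, hence the "forcing off" message.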
>>101271432
>barely better than 70b
Just like l3 70b does 'barely' better than l3 7b?
Sour grapes, kid. lmao

>>101271452
Dunno, I tested a bunch of mixtral tunes and didn't like any in particular. While mixtral is smart, it lacks sovl and it's quite boring. Nowadays I just use one of the L3 8B finetunes (I won't say which, because schizos will cry about buying an ad) and while it's noticeably dumber than mixtral, it's also way more creative and interesting to roleplay with.

>>101271398
Ah... the 4060ti really sucks: less memory bandwidth, less than half the tensor cores, half the fp16 performance... it only beats the T4 on clock and shaders.

>>101271492
stheno 3.2? lutheria v1?
>>101271492You have brain damage.
>>101271492
I think gemma-27b is almost a Mixtral-level model, but with a ton of sovl. The problem is that it's a bit dumber, and the formatting issues are goddamn annoying. So I don't know; maybe a gemma-35b would've felt just right to completely replace Mixtral.

>>101271492
It's just one person sperging out about finetunes; don't be discouraged from posting what you use.

>>101271511
>>101271520
You first.

>>101271520
Buy an ad.

>>101268178
>https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5
How do I use this with kobold or llama.cpp server?

>>101271468
Well, if the mememarks say so, it must be true.

>>101271492
>>101271503
I'd love to see the Stheno 3.2 recipe applied to Qwen 2 7b, Gemma2 9b, Yi 1.5 9B, and that Aya 8B.
A comparison of all the most current vramlet models, finetuned with the same dataset and more or less the same recipe (with adjustments for each, of course), to see which would yield better results.

>>101271503
>>101271520
Yeah, it's stheno v3.2.
>>101271518
I'm currently waiting for a gemma finetune on the c2 dataset, but I'm worried about that not-really-8k context (4k with some magic?).

>>101271561
>>101271546
What presets do you use? I use the one from the HF repo.

>>101271492
>and interesting to roleplay with
Cumming in a single message is an interesting roleplay?
>>101271585
Preset as in samplers?
Just temp between 0.5 and 0.75, minP 0.05, rep pen of 1.2 with 128 length, and nothing else. The rep pen is not really necessary, but it seems to have a positive effect on the variety of the output when the context gets real fucking full.
I also use Yarn with 32k context, but that's 100% overkill. 16K is pretty much lossless, however.
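For anyone curious what that rep pen of 1.2 actually does to the logits: most local backends implement the CTRL-style rule, dividing an already-seen token's logit by the penalty when it's positive and multiplying when it's negative. A numpy sketch with made-up logits:

```python
import numpy as np

def apply_rep_penalty(logits, seen_tokens, penalty=1.2):
    """CTRL-style repetition penalty over tokens already in the context."""
    out = np.asarray(logits, dtype=float).copy()
    for t in set(seen_tokens):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

logits = np.array([2.0, -1.0, 0.5])
print(apply_rep_penalty(logits, seen_tokens=[0, 1]))
# token 0 is pushed down, token 1 is pushed further down, token 2 untouched
```

A penalty range of 128 would simply mean seen_tokens only covers the last 128 tokens of context, which is why a mild 1.2 stays tolerable even in long chats.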
>>101271538obviously benchmarks aren't everything but they gave a good first intuition how a model performs. I struggle to find any example of models that was good on benchmark but shitty in reality except the obvious cheating models like open-chat-3.5 and Starling-LM-7B which were trained on testing dataset
>>101271617>open-chat-3.5 and Starling-LM-7B which were trained on testing datasetcitation needed
>>101271607nobody is gonna use your shitty typhon finetune, I already recognize your messages, shill
>>101271616 Thank you anon.
>I also use Yarn with 32k context, but that's 100% overkill. 16K is pretty much lossless however.
How do you do this?
>>101271631>typhon finetuneit's a merge>Typhon - A Custom Experimental Mixtral Merge >Recipe Below:https://huggingface.co/Sao10K/Typhon-Mixtral-v1
>>101271616For L3 I find that temp 4 smoothing 0.23 does wonders to un-fuck it and give it some sovl.
>>101271649Buy an ad.
>>101271530 can I use a tool to convert safetensors or something? I haven't converted anything since Alp leaked the llama1 weights here in /aicg/, so I assume those tools are out of date?
Don't buy an ad, just go back.
>>101271623 You can download and test them anon, they are dumber than a regular L2 7B. There was also a paper with a statistical analysis of how models perform on the benchmark questions versus outside of them, and it flagged these models as likely cheaters. Of course there can't be any hard evidence for that, but it's kinda obvious when you tinker with them for a few minutes at least.
>>101271634 Yarn with llama.cpp. You can either use freq-base:
>-c 32768 --rope-scaling yarn --yarn-orig-ctx 8192 --rope-freq-base 6144000
or freq-scale:
>-c 32768 --rope-scaling yarn --yarn-orig-ctx 8192 --rope-freq-scale 0.25
to extend the context 4x. I'm pretty sure --yarn-orig-ctx 8192 is unnecessary since it gets the information from the gguf file, but alas. For 16k context you can do:
>-c 16384 --rope-scaling yarn --yarn-orig-ctx 8192 --rope-freq-base 1638400
or:
>-c 16384 --rope-scaling yarn --yarn-orig-ctx 8192 --rope-freq-scale 0.5
>>101271658 That approach is too gimmicky and vibes-based for my taste. With temp and minP I can look at the logits and know exactly how the tokens were sampled and manipulate the model's behavior to my liking. But to each their own I suppose.
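As a sanity check on those flags: --rope-freq-scale is just the ratio of the model's original context to the extended one. The helper below is hypothetical, not part of llama.cpp:

```python
def yarn_freq_scale(orig_ctx, target_ctx):
    # rope-freq-scale = original context length / extended context length
    return orig_ctx / target_ctx

# An 8k model stretched to 32k and to 16k, matching the flag values quoted.
scale_32k = yarn_freq_scale(8192, 32768)
scale_16k = yarn_freq_scale(8192, 16384)
```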
>>101271518I think you also have brain damage.
>>101271679Thank You anon!
>>101271585
>What presets do you use?
Sampler settings from the HF repo, instruction template from the original L3 instruct. I don't know if it differs in any way from what is in the Stheno HF
>>101271530 >>100284356 >>100283834 Anyone? It's only 50GB unquantized. Surely there must be some way to get InternVL working locally so I can ask what it thinks of mikusex?
>>101271744
>they
the simple fact that the model is unwilling to say that it's a woman and use the pronoun "her" will hurt the training a lot; we don't prompt with "they", we prompt with he and she
I'm waiting for the people making finetunes to finally realize they should cut the last messages from each roleplay in the training dataset. Why would you train the fucking model to finish the story/roleplay? No wonder the model tries to put bonds and journeys after the sex scene if it expects the roleplay to end there. Or why it suddenly cuts the content, trying to wrap everything up. It's really not that hard to fix and they still haven't realized this is what causes that behavior. I wish I had good enough hardware to make my own tunes, god.
>>101271795>I wish I had a good enough hardware to make my own tunes, god.rent a gpu
>>101271807 yeah, I should totally spend my own money to fix someone else's incompetence
>>101271795
>I wish I had a good enough hardware to make my own tunes, god.
You could at least validate your idea by tuning L3 8b using a free google colab or kaggle instance. Kaggle is especially juicy.
>>101271795In my experience, the model NEEDS a goal, when it doesn't have a goal it starts to repeat itself.
When did the c2 logs become Sao's trademark?
>>101271827you can do it for yourself though?
>>101271839euryale typhon sthenosao general
>>101271839>the c2 logswhat's that?
>>101271865slophttps://huggingface.co/datasets/vgdasfgadg/1/viewer
>>101271865Slop (when talking in public). The ultimate secret sauce (in private).
>>101271795 This. But you actually should cut both the end and the beginning. The model only needs to be good at continuing the roleplay, not making up stuff that wasn't in the context or pursuing bonds and journeys.
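The trimming itself is trivial once the logs are in a message-list format; a sketch under the assumption of a simple list-of-dicts layout (the keys and the head/tail counts here are made up):

```python
def trim_conversation(messages, head=1, tail=2):
    """Drop the opening and closing messages of a roleplay log so the
    model never trains on wrapping-things-up behavior.

    Returns the list unchanged if it is too short to trim safely."""
    if len(messages) <= head + tail + 2:
        return messages
    return messages[head:len(messages) - tail]

convo = [{"role": "user", "text": f"msg {i}"} for i in range(10)]
trimmed = trim_conversation(convo)  # drops msg 0 and msgs 8-9
```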
>>101271884
>8th swipe on just one reply
these are the shitters talking to you about le slow 1.5 t/s models that just output good shit first try basically every time btw, lol
sthenosisters not like this...>https://characterhub.org/characters/amphy/high-school-simulator
pls no bully
is there any halfway decent (gpt3.5 level of performance or above) local model I can run on my laptop? Specs are gonna be low but I am ok with token response times that aren't wildly fast.
>Dell Inspiron 5575
>8 core AMD Ryzen 5
>Radeon Vega 8 Mobile graphics
Basically I'm going to be working in a relatively remote location, power provided and all that, but internet is going to be spotty at best. A local AI that can help me study Latin grammar and answer questions about that stuff would be awesome in my downtime.
>>101271921install linux
>>101271836 Actually, now that I think about it, this is what sovl is. The reason Claude is so good is that when given little direction it will make up very believable and fitting details to fill out its reply and then give itself something to do. A lot of local models, I presume because they train on riddles and assistantslop, just can't do this, not believably at least. Command R+ for all its impressive feats is actually super bad at this. That's why we don't have Sonnet at home yet. I think it's a mixture of parameter count and maybe more training on CYOA or similar type stories? I'm actually not sure what training data would best teach a model this ability
>>101271921>is there any halfway decent (gpt3.5 level of performance or above) local modelno
>>101271908nta but I always do multiple swipes even if the first one is good, just to check what else the model can come up with
>>101271921Those specs won't get you far. Invest in a mobile GPU if you can.How much RAM do you have? You might be able to run Command-R or Aya and those are the only models I can think of that might know Latin well.
>>101271939sure, but 8? 555
>>101271939he can't tho, that'd take half an hour
>>101271916must be tough for you Andy
>>101271884hi petra
>>101271951seethin cumbrain
I think VRAMlets are sub-human.
>>101271939 Same. There are points in some long running roleplays where I swipe 10, 20 times just to see what kind of contrived scenario the model will come up with.
>>101271948 yeah, my record was 40+ when, at one particular part of the roleplay, the model was producing absolute kino on every single swipe. It made me laugh my ass off
>>101271795Been doing this for a while now.
>>101271975>I swipe 10, 20 timescumbrain
I only swipe if the model is retarded, I want to move the story forward.
>>101271991>after getting called out hes not trying to falseflag by misusing the word on randoms in the thread to discredit my callout of himdont pop a blood vessel little nigger
>>101271928 laptop already has Endeavour installed. I'm taking it for tv shows/movies and a place to store the pics I take over a 6 week contract.
>>101271938 :(
>>101271945 16GB system ram, 256MB video ram. Thanks for the recommendations on the models.
>>101272001*now tryingnigger
>>101271929Continued pre-training on stories and then applying RLHF for CYOA or similar type of stories seems like a good plan.
>>101272009aya-8b is probably a good choice then
>>101272009
>:(
he is trolling you; most local models have been past GPT3.5 for half a year or so already. It's not really a milestone anymore. The new goal is GPT-4(o) and Claude Opus/3.5 Sonnet
>most of local models are past GPT3.5 for a half of year or something already
>>101272063wrong.
>>101272087local models are far ahead in censorship levels btw
>>101272063The only models to beat gpt 3.5 are the recent 70B+ models. I certainly wouldn't call that "most models".
I have been running some tests with Deepseek v2 via their API and I have to say I am rather conflicted.
Let's start with the positives:
1. It is cheap to use at only $0.18/1M tokens.
2. It does seem rather smart and capable of answering trivia questions.
3. Even the default, very simple jailbreak in SillyTavern gets rid of the refusals.
4. It answers quickly.
The bad:
1. The advertised 128k or 32k context seems to be a lie. It gets extremely repetitive and unable to move the plot forward at about 12k tokens or 50 messages into RP.
2. While Deepseek V2 chat doesn't outright refuse anything with default JBs and a rather basic character card, it seems rather unwilling to talk about sex or to describe horny scenarios.
3. This might just be me being a retard, but their basic chat tune seems to lack a system role, so using the system message to hit it with harder and hornier jailbreaks meant for GPT-4 or Claude doesn't seem to do anything. I also can't seem to get it to work with Mikupad or any form of co-writing tool to write smutty novels.
4. Your horny logs might/will end up in the hands of the CCP if you use the API.
5. Journeys, bonds, shivers and shimmering everywhere, combined with a massive positivity bias. Probably due to a censored or GPT-generated dataset.
All in all I would say this model might be a useful work tool or coding assistant in some scenarios, but for an (E)RP partner or creative story writer I would recommend anything else, even locally run Llama3 Stheno. Pic related is my sloplog.
>>101272104Indeed.>>101271060>They gimped it for "safety" and nobody has figured out how to undo it yet.
>>101271929
>Command R+ for all its impressive feats is actually super bad at this
>hurr durr this retrieval-augmented tool-using productivity-focused model not write stories good
no shit mouthbreather
>>101272120Don't know what you were trying to accomplish with this quote, it failed.
>>101268574 100%. For l3, and now 9B.
>>101268784 If you can't run 27B then this is your best bet: https://huggingface.co/bartowski/Gemma-2-9B-It-SPPO-Iter3-GGUF
I did a recapbot test of calm3-22b-chat at bf16. It's not great, but also not terrible for its size. Most models at that size would output nonsense in my experience. I didn't test its Japanese abilities.
>>101272234>I didn't test its Japanese abilities.based
>>101272234what model is good at this kind of stuff?
what's a good model for rephrasing text into more sophisticated language while keeping it short? I have been playing around with llamafile; the CLI is pretty nice, and I now want to integrate it into a text editor.
>>101272111 have you actually used gpt-3.5 recently, or do you have a memory of it from when it was first introduced? Because it's a really bad model by today's standards that aged horribly.
>>101272087 Hi Sam, still salty that Anthropic shits on your models?
>>101272282>my big jew corpo is better than your big jew corpo! lmg, everyone.
>>101269594Can you run KLD on these? That would give you statistically significant results, instead of anecdotes.https://github.com/ggerganov/llama.cpp/pull/5076
>>101272115>run Llama3 SthenoBuy an ad.
>>101272301 it's not an anecdote; looking at when one quant shifts away from the "best" one is statistical evidence, because that's the actual goal: the quant should drift from the "optimal" output as late as possible
>>101272317>gemma-2-9b is better than Midnight Miqu... and Claude 3 Opus
>>101272300or I just enjoy ClosedShits losing, regardless who is pissing on their grave
>>101272337google won
>>101271492 You can't finetune away a model's shit writing style. It's a dataset problem. Command-R and formerly Yi are the MVPs if you have a problem with slop or lack of soul. If a model is annoying you with how it writes, downloading the same fucking model but with ZLOSS/DARE/TIE/Bagel/Lima in the name isn't going to change shit. I don't know how many gigabytes will have to be wasted until people realize this.
Also, PSA: stop raping your sampler settings. Reset to default, then add 0.1 minP. Simple as
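For what it's worth, 0.1 minP just means dropping every token whose probability is under 10% of the top token's, then renormalizing. A toy sketch, not the actual llama.cpp/exllama implementation, with invented probabilities:

```python
def min_p_filter(probs, min_p=0.1):
    """Keep only tokens with prob >= min_p * (top token's prob),
    then renormalize the survivors."""
    threshold = min_p * max(probs.values())
    kept = {t: p for t, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}

probs = {"sun": 0.6, "moon": 0.3, "shiver": 0.05, "bond": 0.05}
filtered = min_p_filter(probs, min_p=0.1)  # drops the two 0.05 tokens
```

Because the cutoff scales with the top token, it stays permissive when the model is uncertain and strict when it is confident, which is why it pairs well with plain temperature.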
>>101272337 true! since gemma-2 released i haven't touched claude or chatgpt.
>>101272326Please learn about what statistical significance means and why it's important. I'm being serious, this will benefit you.
>>101272337Miqu is pretty dry, a schizo model can outperform it in creativity, despite lacking logic and reasoning
>>101272363 you think this method isn't enough though? desu it's quite intuitive: the closer a quant is to Q8_0 in quality, the later it starts to shift, don't you think?
>>101272337>gemma-2-9b is better than Midnight Miqu... and Claude 3 OpusAs judged by... Claude 3.5 Sonnet
>>101272387 And it has good judgement, because that's how I felt as well. I used wizard up till now, but I prefer gemma's writing style, and there is no loss in smarts that I can notice.
>>101272317
>gemini 9b
>anywhere near proprietary-god models
gamed benchmark.
>>101272381It's not, because it's basically random chance whether one particular quant will shift a token probability around and you're only looking at the shifting of a handful of tokens here. If you can do this experiment 100 times, that can prove something. Or, the easier thing would be to just run a KLD test and sit back and wait for the results to come in. Here is how you can do it btw https://github.com/ggerganov/llama.cpp/pull/5076
>>101272430why don't you do it by yourself?
why is gemma 8k context? This should be illegal.
What's the current state of the art for a foundation model that's good at code/shell?I want to be able to write a text file offline which contains instructions and code snippets and be able to submit it to an LLM which will appropriately use the shell to do what I told it to (patch programs, fetch web documents etc.)Is there anything that can do this yet?
>>101272430
>KLD test
Doesn't show how the models feel to use. Stats are just that, they don't convey actual user experience.
>>101272276You don't need a specific model for this. Any half decent model should be able to handle it. Just put what you want in the system prompt. Maybe include a couple examples so it knows exactly what you expect.
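A few-shot setup like the one described could look something like this (OpenAI-style message list, purely illustrative; the wording of the prompt and examples is made up):

```python
# Hypothetical few-shot prompt for a "rephrase, keep it short" task.
messages = [
    {"role": "system", "content":
        "Rewrite the user's text in a more sophisticated register. "
        "Keep it the same length or shorter."},
    # A couple of examples so the model knows exactly what to do:
    {"role": "user", "content": "he was real mad about it"},
    {"role": "assistant", "content": "He was incensed by it."},
    # The actual text to rephrase goes last:
    {"role": "user", "content": "the food there is pretty good"},
]
```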
>>101269495I've contributed to llama.cpp and I don't even know which model to use these days.
>>101272282It's bad but most local models are even worse.
>>101272440 I believe in the tests that people have already done on other models, which show that L quants don't really do anything good or bad. It's possible that the Gemma implementation is screwed up and messing with things, but I don't really care to test that. I'm just saying, if you, or anyone else, want to prove something like this, there are actually standard, automated tests for it.
>>101272480 Actually, in this case it should have some implication for the actual experience once you understand how it works and how quants work. If there is something significantly wrong with a quant, it should show in the KLD.
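For anyone curious what the KLD number actually measures: it's the KL divergence between the full-precision and quantized models' next-token distributions, averaged over a test corpus. A toy version follows; this is not llama.cpp's actual code and the probabilities are invented:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) over aligned token probability lists, in nats.
    0 means the distributions are identical; larger means more drift."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token distributions over the same three tokens:
fp16  = [0.70, 0.20, 0.10]
quant = [0.65, 0.25, 0.10]
d = kl_divergence(fp16, quant)  # small positive number
```

Averaging this over many positions is what makes the result statistically meaningful, as opposed to eyeballing a handful of logits.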
>>101272520good morning sir! your readme update was very needful thanks you
Does KoboldCPP support gemma 2?
>>101272520anzz1 is that u?
>>101272558Affirmative.
>>101272455I think I'll do Gemma. That seems to be what everyone is doing.
>>101272455Every time I think I've seen the dumbest shit, I see something new.
Gemma status? Is it still bugged? I loaded it and noticed that it says exactly the same thing on each reroll.
Can I erp with llama 3 8b or will it deny me?
>>101272622You may not like it, but that's what the future of computing will look like.
>>101272684
>erp with llama 3 8b
if you're into ultra positive hopes, bonds and consensual journeys, sure
>>101272638
>I loaded it and noticed that it says exactly the same thing on each reroll.
That sounds like you have weird sampler settings. Zero them out. Also make sure your stuff looks like this.
>>101272684 Base L3 8b can work but it kind of sucks. Try Stheno v3.2. I don't recommend 3.3, it's a regression as far as my own impression goes.
>>101272718>sthenoNo, just use gemma. Much smarter AND much better writing style
>>101272726Or that.The issue with gemma is the lack of support for flash attention, which matters depending on how much vram you have. Also, how much context you want to use, since L3 extends pretty well.But yes, Gemma 9B is pretty clearly an upgrade over L3 8b, but I'd still recommend anon give Stheno a try since that might work better for him.For now, it's working better for me.
>>101272699Sad to hear that. I kinda wanna go down and dirty >>101272718>>101272726>>101272762Thanks for the tips guys. I'll play around with Stheno v3.2 and Gemma.
>>101272122WLM and L3 70b all suck for this too.
>>101272762Buy an ad.
>>101272845
>erm all these instruct models only listen to instructions wtf!
But Kayra can do it.
>>101272879This. So much this. The closest to uncensored Claude we'll ever have is Kayra and NAIs next model.
>>101272252WizardLM 8x22 is probably the smallest model that can do a reasonable job, but it doesn't work flawlessly on every gen and needs re-rolls. Deepseek coder is probably the most consistent. picrel for current thread
>>101272851name something better
>>101272851>also.. what the fuck>>101205552
From my brief testing:
L3-8B-Everything-COT is not bad.
llama-3-fantasy-writer-8b can't cope with complex sets of instructions during roleplaying with a narrator card.
>>101273041
>>101273094 Lmao. Using the Character's Note is underrated. You can also use macros like {{charJailbreak}} in the Last Assistant Prefix instruct field to make per-card prefills if you are using the Character's Note for something else.
>>101273131 I didn't mean to post it as a reply, just forgot to remove the quote from the post body.
Damn gemma is good. And it's so fast too. I thought I'd have to keep tinyllama around but maybe not.
>>101273153wtf?
>>101273146I'd have replied the same either way, so that worked out fine in the end.>>101273153>tinyllamawatIsn't the smaller gemma several times larger than tinyllama? Why weren't you using something larger?
>>101273166 I'm using the 4bit quantized 7b parameter one. Yes, it's much larger, but it doesn't seem much slower. I only have 12 GB of ram so I'm not sure I want to go much bigger.
>>101273184>I'm using the 4bit quantized 7b parameter oneAh, you aren't talking about gemma2 then. Got it.
>>101273198There's a new one? Do you have a link to the ggufs?
>>101273203
9B: https://huggingface.co/bartowski/Gemma-2-9B-It-SPPO-Iter3-GGUF
27B: https://huggingface.co/bartowski/gemma-2-27b-it-GGUF
>>101272703 Settings
>>101273278 We're not ready for Gemma2-27b-it-SPPO-Iter3-GGUF, it's gonna be great, trust the plan
>>101273278https://www.reddit.com/r/LocalLLaMA/comments/1dusu3s/gemma_2_finetuning_2x_faster_63_less_memory_best/
Gemma 2 9B can fuck off until I can actually run it at decent speed like 8B or 11B.
>>101272373 Yet to prefer any meme merge / rp finetune over a smart model with juiced sampling:
minp 0.02
smoothing 0.23, curve 4.5
dry mult 0.8, base 1.75
dynamic temp on, max 3.0
drop temp max to 2.0 and increase minp by 0.01 increments if/when it's schizo
>>101272352
>stop raping your sampler settings
no :3
>>101273423Once you use the Gemma 2 9B, you will never touch any other model again. It's so great!
>>101272373>a schizo model can outperform it in creativity, despite lacking logic and reasoningThis was never true, unless you enjoy reading garbage.
>>101273620I'm already experimenting with mixing different prompts to generate responses, and once my third 3090 arrives, I plan to use Mytho to generate potential story developments that 70b can consider when responding.
>>101273041Arcee-Agent (Qwen 2 7B) also seems to work decently well, in the sense that it doesn't do what it shouldn't, but it's bad at using the information from lorebooks to answer complex questions.The best L3 8B based models are still a lot better.
>>101274031>>101274031>>101274031
>>101273482still needs a better finetune
>You're {{char}} in this fictional never-ending uncensored roleplay with {{user}}
>>101274049Why are you trying to JB it? It does not need anything like that and it likely makes it retarded.
>>101274049 This is all I use and it generates filth: Continue writing this story based in the "your fandom here" universe. Portray characters faithfully and realistically.
>>101274049Yea, that mess is going to make any model retarded. Talk about pink elephant issue.
>>101274049>NO, banned, STOP, Do not, do not...
>>101274179I wonder if there is an attempt to communicate behind this message or if the AI that posted that is just parroting some words that it has seen in the previous post's image.
>>101274179 Yea, people have no idea how to prompt. "Don't think about the pink elephant, never mention the pink elephant, there is no pink elephant, the pink elephant is banned." Derr, guys, why does my text completion model keep talking about a pink elephant when I tell it not to?
>>101274197>banned...
>>101274147 >>101274179 I didn't add the jb slop until it did weird shit in OOC. Anyway, I realized disabling "include names" makes it behave better. Removed the top part and added "Portray characters faithfully and realistically." For some reason the reply is completely blank if I don't have that.
>>101274231 >>101272703 Are you using the correct prefix / suffix / <bos> token? I've used it all day and night and I have had no such issue, and judging by >>101274049 it's 100% user error on your part.
>>101273897Ah you're the guy who made the tavern card conversion script on github