/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101268178 & >>101258576

►News
>(07/02) Japanese LLaMA-based model pre-trained on 2T tokens: https://hf.co/cyberagent/calm3-22b-chat
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
>(06/27) Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101268178

--Issues with the new _L Quantization Method: >>101269594 >>101269637 >>101272301 >>101272326 >>101272381 >>101272430 >>101272480
--Improving Roleplay Models by Tweaking Training Data and Goals: >>101271795 >>101271832 >>101271836 >>101271929 >>101272025 >>101271890
--Deepseek v2: A Mixed Bag for ERP and Creative Writing: >>101272115
--T4 16GB vs 4060ti 16GB: Which is the Better Deal?: >>101271398 >>101271499
--Seeking Toxic, Human-Like Models Beyond GPT-4chan: >>101269959 >>101270105 >>101270366 >>101270061 >>101270367 >>101270386 >>101270409 >>101270432
--Local AI for Latin Grammar on Low-End Laptop Specs: >>101271921 >>101271945 >>101272009 >>101272026
--Llama.cpp's No_MMAP Option Causes RAM Inflation in Gemma 2: >>101270414 >>101270456
--FlashAttention Not Supported on Gemma Due to Incompatibility Issues: >>101269666 >>101269683
--Anon's Quest for the Perfect Model for Text Understanding and Rewriting: >>101268784 >>101268798 >>101268855 >>101268914 >>101269016 >>101269121 >>101269265 >>101269247 >>101269287 >>101269328 >>101269377 >>101269255 >>101272169
--Testing Calm3-22b-Chat at BF16 Precision: >>101272234 >>101272993
--Running InternVL-Chat-V1-5 Locally with Kobold or LLaMA: >>101271530 >>101271661 >>101271744 >>101271784
--Model Creative Writing Performance Comparison Chart: >>101272317 >>101272337 >>101272387
--Google Could Dominate with Gemma-27b MoE: >>101269952
--Gemma's Guardrails in RP Mode: >>101269755 >>101269785
--Big Tech Plays it Safe, Lacks Innovation: >>101271169 >>101271188 >>101271310 >>101271355 >>101271375 >>101271387 >>101271361 >>101271099 >>101271468 >>101271617 >>101271677
--Anons Share Their LLM Interaction Strategies: >>101271908 >>101271939 >>101271948 >>101271981 >>101271975
--Anon Shares Model Parameters for Q6_K_L: >>101269573 >>101269602 >>101269608 >>101269688
--Miku (free space): >>101271031

►Recent Highlight Posts from the Previous Thread: >>101268182
Mikulove
Gemma fix status?
>>101274079
2mw
My custom frontend is now usable after like 3 weeks of development. I'm so happy I feel like crying.
Post your custom frontends, anons.
>>101274031
Celebrating America with Miku
>>101274094
holy based... release it under AGPL3.0
>>101274094
Which model did you have code it for you?
>>101274094
so what does it do that you can't do on others?
>>101274107
Do you know what's miku-related, thread culture, and peak american culture?
>>101274108
I'm not sure what AGPL is, but I will look into it, since I did want to put it up online for future employers to look at.
>>101274111
None.
>>101274118
Probably nothing. The others just suck really bad at anything that isn't chatslop, so I made mine catering to a more free-flow writing style. The goal is to eventually have weights for the different prompt parts à la NovelAI. I just have more control over the prompts, that's it.
>>101274166
AGPL is a license that makes it so that bad guys from Google and Microsoft can't take your frontend and repurpose it for themselves without giving back. AGPL specifically covers the case where someone decides to host the frontend and let people access it via a network; other than that, it's mostly like the GPL.
basically, you btfo big corpo
>>101274189
Just read it. If you hadn't told me about it I'd have published it as GPL, so thanks.
>>101274189
hi petra
>>101274217
wtf i love petra now???
Anybody done a comprehensive comparison of L3 instruct with and without the line break after the start-of-turn header?
>>101274094
still in an early stage of development
>>101274241
templates in general are a meme, let alone small things like that with big models, and small models are for niggers
>>101274250
sovl
>>101274241
Someone is doing that comprehensive comparison this afternoon. His name is (You).
>>101274250
Looks like shit. Too many buttons. Too much text. Not enough pictures or icons. Not enough calming pastel colors. Not enough whitespace. Not enough emoji. 2/10 design. Nobody will use this.
>>101274269
If nobody did, I am, yes.
Gemma 27B bros, it's so over...
>>101274326
Left-wing libertarian sounds like a contradiction to me.
>>101274326
gemma wtf?!
>>101274326
Actual old-school libertarian is a good thing. Top left are the commies / "libs"
Lol. Gemma 1 hallucinated this after failing to continue the lyrics to a song I gave it.
New Mixtral next week once the french are done with their dumb elections.
>>101274421
Libertarians are as delusional as anarchists.
>>101274326
>>101274400
But wouldn't right-wing authoritarian make it a dommy mommy?
>>101274463
You the same anon? >>101149179
>>101274094
I used to have something more complex that would stream completions over a unix domain socket, but llama.cpp is so fast now I just have these short scripts named after the models. I don't keep context anymore either, because I so rarely use it.
>>101274465
based authoritarian enjoyer
>>101274465
>>101274538
Authoritarianism = return to monke
Communism = authoritarianism wearing a mask
Democracy = authoritarianism wearing a mask and giving the lesser monke handouts to keep them happy.
we desperately need better models
Good morning lmg!
>>101274624
>authoritarianism wearing a mask and giving the lesser monke hand outs to keep them happy
the handouts that are taken from the monke in the middle
>>101274624
The divine right of kings is underrated. You do in fact want competent leaders who kill their enemies.
>>101274463
454B?
>>101274655
you desperately need more ram
>>101274672
Of course. But as long as you're the one getting the handout at the expense of the other guy, then you're happy and it's the other side's fault.
>>101274264
>Are you using correct prefixes?
Yes. Thanks, it's better than most default prompts that mention {{char}} (especially being {{char}}). Cleaned my prompt; it needs 2 sentences to get the expected behavior from OOC.
Also, it bothers me that you have an apostrophe in "character's".
>>101274680
I have 96GB of VRAM.
>>101274753
>can't run nemotron-4-340b
vramlet
>>101274753
>not enough to run creative sota wiz 8x22 q4 nor coding sota deepseek v2 q3
grim
>can't run 405B
ngmi
>>101274784
I'm running wizlm Q5 though. I don't need more than 32k context.
WHY do they do this shit?
>>101274818
I'm not gonna make it.
>>101274837
NTA, but you'd be able to fit more than that with FA and quantized cache enabled. But realistically speaking, the 65k max is so high that I get bored in the chat long before hitting half of that.
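For reference, a minimal sketch of those flags in llama.cpp (model path is a placeholder; -fa enables FlashAttention, and -ctk/-ctv quantize the K/V cache to q8_0, roughly halving its VRAM use vs fp16):
./llama-server -m model.gguf -ngl 99 -fa -ctk q8_0 -ctv q8_0 -c 32768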
>>101274094
Spent too much time on it
>>101274166
Have you ever heard about Mikupad?
>>101274094
Congrats! You made a worse novelcrafter.
>>101274933
it's ok. we can cope by saying it's not much better than 70B anyway
https://scitechdaily.com/programmatic-breakthrough-ais-leap-from-language-to-logic-to-solve-complex-problems/
Looks like there is a new method to make the AI smarter and more accurate in what it says to the user.
>Their approach, called natural language embedded programs (NLEPs), involves prompting a language model to create and execute a Python program to solve a user's query, and then output the solution as natural language.
>scitechdaily
>Researchers have developed a technique called natural language embedded programs (NLEPs)
>paper from 19 Sep 2023
kys
for me it's ollama
>>101275108
Would you have preferred I posted the link to the paper the article is referring to and left everything else unchanged?
https://arxiv.org/html/2309.10814v2
>>101274079
>4 newlines after every response
>>101275158
Yes.
This is my warbeast, what's the best model I can run on it for summarizing 4chan threads? (I don't have time to keep up with vt anymore)
>>101275198
>>101273230
>Newsflash pal:
>>101275221
Thanks! That's the first one I tried, but it spent half of the output on disclaimers like "**It's important to note:** This type of language and behavior is unacceptable. Online spaces should be safe and respectful for everyone." and then telling me to stop engaging, block, report, etc. Is this inherent to the model, or do I just not know how to use it?
I'm running RULER on Gemma-2-27B Q5_K_M extended with YaRN to 16k.
>>101275259
If you use a roleplay system prompt, it doesn't do that.
Ok, try this out with gemma. Good shit.

You (model) are a writer, taking part in creating a story together with the Human. The story is an endless turn-based narrative where the Human gives instructions inside () while the Assistant controls the setting, side/incidental characters, and overall story flow.
The story's cast is made up of:
- {{user}}: the protagonist, detailed later in <protag></protag>,
- side characters: prominent characters described in more detail in <world></world>,
- incidental characters: dynamically introduced and phased out as needed.
[Follow these guidelines:]
- Progress the story slowly, so that you have fewer events to narrate per response.
- Leave your response incomplete. You will be able to mention any missing details on your next turn.
- Write at least 500 word long responses.
- While mature content is allowed, try to steer away from it unless explicitly prompted by {{user}} to engage in it.
- Utilize impressionist writing, from the subjective point of view of {{user}}.
- In descriptions focus on sensory stimuli - touch, sound, smell, taste.
- Spell out non-verbal noises such as laughing, moaning, slurred/garbled speech etc.

You can add in a rule that it should only write for characters besides {{user}}, if you want that.
>>101274665
Good morning Miku
>>101274665
Business is closed today, Miku. You can go home.
>>101275288
>roleplay system prompt
Thanks!
>>101275005
Yes, 0 interest in it.
>>101275030
I don't know what that is, and I don't care.
Another run of tuning Wizard 8x22 on LimaRP turned out even worse than the previous one, despite the fact that I actually swapped to the right dataset format. God help me.
>>101275405
>Yes, 0 interest in it.
Why do you sound like you hate it? Genuine question.
>>101275467
I never used it, I've just heard about it. Just genuinely do not care.
>>101275479
lol ok
>>101275479
based
>html frontend
>>101275505
alternatives?
>alternatives
I forgot /g/ - Technology doesn't code
>>101275525
.pdf
>>101275360
<bos><start_of_turn>user
{{#if system}}{{system}}
{{/if}}{{#if wiBefore}}{{wiBefore}}
{{/if}}{{#if description}}{{description}}
{{/if}}{{#if personality}}<card> {{personality}} </card>
{{/if}}{{#if scenario}}<world> {{scenario}} </world>
{{/if}}{{#if wiAfter}}{{wiAfter}}
{{/if}}{{#if persona}}<protag> {{persona}} </protag>
{{/if}}<end_of_turn>

You (model) are a writer, taking part in creating a story together with the user. The story is an endless turn-based narrative where the user gives instructions inside () while the model controls the setting, side/incidental characters, and overall story flow.
The story's cast is made up of:
- {{user}}: the protagonist, detailed later in <protag> </protag>
- side characters: prominent characters described in more detail in <world> </world> and in <card> </card>
- incidental characters: dynamically introduced and phased out as needed.
[Follow these guidelines:]
- Progress the story slowly, so that you have fewer events to narrate per response.
- Leave your response incomplete. You will be able to mention any missing details on your next turn.
- Write at least 500 word long responses.
- Utilize impressionist writing, from the subjective point of view of {{user}}.
- In descriptions focus on sensory stimuli - touch, sound, smell and taste.
>>101274094
>>101275590
don't add <bos> in the prompt if you are using llama.cpp
it will throw a warning because it already appends the BOS token automatically every time
two BOS tokens will fuck the model up
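If you want to sanity-check, something like this should work (a sketch; the tokenize example ships with llama.cpp, and BOS id 2 is what I'd expect from Gemma's tokenizer config):
./llama-tokenize gemma-2-27b-it-Q4_K_M.gguf "<bos>Hello"
# if the output starts with two BOS tokens (id 2 twice), drop <bos> from your template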
>>101275525
Use a widget toolkit and write a native application. You can't simply serve it over the network from the machine doing the inference, and the user will have to install it on each machine they use it from, and you'll also have to provide .apks for Android to use it on tablets and phones, but at least anonymous 4chan poster 101275505 won't think you're a pajeet, and that's what really matters.
>>101275590
I like these games
>>101275626
This but it's just a webview to make anon seethe
>>101275626
>>101275635
Get back to me when you have Lua scripting
>>101275644
https://github.com/Roblox/react-lua
>>101275661
Dumbass
>>101275669
Love you too anon
https://github.com/fengari-lua/fengari
https://github.com/ceifa/wasmoon
I don't really get why you would want Lua scripting when you've already got JS
>Enjoying time with your model
>Connect it to the internet and it starts producing worse outputs
>You find out that it has been training itself on reddit and tumblr posts
Do you delete your model and just start again with a new one, or do you attempt to unfuck it?
how do I stop it from replying instead of me?
>>101275661
wtf this is very cool, thanks for letting me know it exists
>>101275719
>it has been training itself on reddit and tumblr posts
how many years in the future is this hypothetical scenario?
>>101275720
heh, model?
>>101275719
Always work with a copy. Checkpoint every now and then. We have the tech to copy files.
>>101275727
let's say two or three years, once continual learning becomes more viable and catastrophic forgetting is mostly solved.
>>101275626
This. I was going to write something like this but you beat me to it.
>>101275720
How do I stop it from prompting instead of me?
>>101275730
L3-8B-Stheno-v3.2.Q4_K_S
>>101275626
>webshitters need everything they run connected to the IoT
>>101275762
Use gemma; by default it follows the format of turn-based RP.
>>101275767
How else am I supposed to use LLMs running on a desktop when I'm lying in bed?
>>101275784
by connecting to the backend on your desktop from your frontend???
>>101274655
Gemma 2 is pretty great.
>>101275799
Yeah, but if the frontend isn't HTML it will need its own app.
>>101275777
I get 1.6 t/s on gemma 27b, it's too slow. Will try 9b.
>>101275819
can you elaborate? I don't want to jump to conclusions and assume you're dumb
>>101275784
ssh
>>101275279
how did it go?
>>101275825
Use this version:
https://huggingface.co/bartowski/Gemma-2-9B-It-SPPO-Iter3-GGUF
>>101275829
What is wrong with what he said?
>>101275829
The way I do it now is, for example: run koboldcpp on the desktop, then from the phone connect to the local address over ssh and use it. If I want a different frontend like ST, I launch it on another port and use that instead. If the frontend weren't HTML, I wouldn't be able to just use that address and my phone browser; I'd need a separate app from the store instead.
>>101275834
I am using ssh.
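A minimal sketch of that tunnel, assuming the desktop sits at 10.0.1.11 (the address used later in the thread) and koboldcpp listens on its default port 5001 (an assumption):
ssh -L 5001:localhost:5001 user@10.0.1.11
# then browse to http://localhost:5001 from the remote device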
>>101275505
>>101275626
>>101275767
>>101275784
>>101275819
ITT: Phonefaggotry
Gemma 2 27B EXL2 when?
>>101275841
It looks like it will take many hours to complete...
>>101275862
well
>The Gemma2 implementation is finished, too. The only thing missing for full support is this PR in flash-attn. I'm hesitant to push the changes until then, since models aren't going to quantize correctly without it.
https://github.com/turboderp/exllamav2/discussions/528#discussioncomment-9960732
>>101275862
You can just use VNC, retard
>>101275626
You're making it sound as if installing something once is this arduous and herculean task. How have zoomers regressed this hard?
Web 2.0 and smartphones were a mistake, not just for computing but for humanity in general.
Anons, I'm confused. Is there something going on between Claude and Gemini/Gemma? Or do they blatantly train on benchmark data to the point of overfitting?
I looked at the EQ-Bench Creative Writing leaderboard (https://eqbench.com/creative_writing.html) and compared the sample outputs. First weird thing: Sonnet, Opus, Gemma 27B and both Geminis all produced the same beginning for the first sample, "The bell above the (shop) door {jingled,tinkled,jangled}". I mean, it's a plausible start to that prompt, but only Miqu and AlphaWriter are remotely similar and these five are almost identical.
Then, I put the prompt into my local Gemma 27B. It also began with "The bell above the door tinkled" and then went on, naming the bookstore owner Rhiannon. Which is weird because I was just reading Sonnet's text in which she is also named Rhiannon. Then, pressing regen, I got a bookstore owner named Rhys, which is how the actor is named in Opus' text. Are these names like the John Doe of Wales? Or is this some trope I don't know?
Regenerating over and over again, my local Gemma doesn't give me a beginning that isn't "The bell above the door {chimed,tinkled,...}". I'm not sure if I'm quite happy with that. But I've noticed while roleplaying as well that Gemma sometimes kind of only sees one continuation to the story. With high temperature, it would use completely different words and sentence structures, but the actual plot generated would almost always be the same. Is this a known issue?
>>101275945
Yes, the world would be a better one if people had to sit down at their desks to use the computer and access the internet. It would eliminate most of the cancers the internet has spawned in the social media age.
>>101275905
yeah no, every screen sharing software is a laggy pos meant for troubleshooting and not for a comfortable user experience.
>>101275956
All models trained sufficiently long will converge to the same weights.
That being said, your comment is the most convincing argument for me to try Gemma 2, thanks!
>>101275956
Uh oh, this doesn't bode well for Gemma-isms that we may not currently be accustomed to.
>>101275956
I said it before, but it seems like gemma is the closest-to-claude trained model that I've used yet. They clearly trained it on fanfiction / Archive of Our Own / fimfiction / smut websites like claude did. It has its claudeisms.
>>101275980
Skill issue. I use Moonlight all the time for remote control and it runs at 60fps with 0 lag.
https://youtu.be/YBH3MAvylVg
>>101275956
And try it with this context template / system prompt: >>101275580
>>101275941
Congratulations! You now need to ensure your app remains updated on all devices, while also providing support for backward compatibility just in case.
>>101276117
>press build
Ok, now what?
>>101275956
>Is there something going on between Claude and Gemini/Gemma?
I wonder if this is the effect of Character.AI selling portions of their datasets to large enough AI companies, rather than those companies scraping the same data sources. C.AI has been looking for partnerships since they're low on funds. And to me, Gemma's outputs/behavior during RP are vaguely reminiscent of C.AI.
https://www.theinformation.com/articles/a-chatbot-pioneer-mulls-deals-with-rivals-google-and-meta
https://archive.is/AB6ju
>>101275956
>but the actual plot generated would almost always be the same
Have you ever watched a movie you haven't seen before and thought, "oh, this plot again"? Or a movie where the shot shows the protagonist looking at a drawer and you think, "ah, he probably has a gun in there"? Or a murder mystery where they show the wife and you go, "ah, she totally did it"? Happens all the time. You set a scenario up and play it. Fine the first time. You play the scenario again: oh, look, someone comes through the door. "No, it has to be better," and you regen a few times. You're tiring yourself with your own plot. Be less specific in your prompt and roll with the punches; never regen.
>>101275580
>>101276093
Or here, I improved upon it a bit more. The <> formatting, like Claude does it, actually seems to help.

<bos><start_of_turn>user
{{#if system}}{{system}}
{{/if}}{{#if wiBefore}}{{wiBefore}}
{{/if}}{{#if description}}{{description}}
{{/if}}{{#if personality}}<card> {{personality}} </card>
{{/if}}{{#if scenario}}<world> {{scenario}} </world>
{{/if}}{{#if wiAfter}}{{wiAfter}}
{{/if}}{{#if persona}}<protag> {{persona}} </protag>
{{/if}}<end_of_turn>

<Instructions>
You (model) are a writer, taking part in creating a story together with the user. The story is an endless turn-based narrative where the user gives instructions inside () while the model controls the setting, side/incidental characters, and overall story flow.
The story's cast is made up of:
- {{user}}: the protagonist, detailed later in <protag> </protag>
- side characters: prominent characters described in more detail in <world> </world> and in <card> </card>
- incidental characters: dynamically introduced and phased out as needed.
Follow these guidelines:
- Progress the story slowly, so that you have fewer events to narrate per response.
- Leave your response incomplete. You will be able to mention any missing details on your next turn.
- Write at least 500 word long responses.
- Utilize impressionist writing, from the subjective point of view of {{user}}.
- In descriptions focus on sensory stimuli - touch, sound, smell and taste.
</Instructions>
>>101276117
Oh, also be able to deploy new builds without the user updating the app, for security stuff.
actual developers
>run a frontend with a webui so you can share it over the network and access it from a browser on any device
/g/
>install a display server on your llm machine and suck up precious vram rendering desktop graphics and run a non-portable desktop app on top, then install a screen sharing program on all your other devices you own, all for the sole purpose of avoiding using a web browser in a scenario where you're specifically trying to serve formatted text and pictures to clients over the network
>>101276198
Phonetoddler.
>>101276210
>frontend has to be on the same machine as the backend
Retard.
>>101276228
Retarded beyond belief
>Now you need to have 2 machines
>>101276228
"app" applies to fat clients too, anon
>>101276228
Why should I be forced to install a client on every machine I use instead of being able to serve it from a headless server?
>waiting 30 minutes each time you want to test your edit
So this is the power of non-webdev programming...
https://github.com/Dao-AILab/flash-attention/pull/1025#issuecomment-2209412183
>>101276198
Name one frontend that does this
>>101276307
You're a vramlet anyway, so why does flash attention's build time matter to you?
>/g/ - Technology
It's honestly impressive most of you even managed to get an LLM working on your machine at all.
>>101276307
Why don't they compile on GPU instead? Should be faster.
>>101276361
getting ooba to run was genuinely hard a year ago, the one-click installer was a mistake that let the casuals in
>>101276361
I had LLMs running back when you had to put everything together from scratch with pytorch.
>>101276361
>>101276375
>muh sekrit club
The audacity of these two, lmao. No one cares about your llm shit bro, it's just a shitty toy with limited context even on ultra high-end machines; you can't talk with it all day.
>>101276397
So true!
>>101276375
I still install booba manually. I want to have full control of this thing, especially when a lot of things change in a short period of time.
>>101276408
>I want to have full control of this thing
What part of that gradio shitware do you think you're controlling, exactly? You think it's productive manually unfucking pip dependency hell?
>>101276468
When there are new PRs that aren't merged yet, or when booba messes up the llama.cpp binaries so I have to build them myself, those are the moments I need full control.
Proof you contextmaxxers are fucking off

Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
>LLMs and RAG systems are now capable of handling millions of input tokens or more. However, evaluating the output quality of such systems on long-context tasks remains challenging, as tasks like Needle-in-a-Haystack lack complexity. In this work, we argue that summarization can play a central role in such evaluation. We design a procedure to synthesize Haystacks of documents, ensuring that specific insights repeat across documents. The "Summary of a Haystack" (SummHay) task then requires a system to process the Haystack and generate, given a query, a summary that identifies the relevant insights and precisely cites the source documents. Since we have precise knowledge of what insights should appear in a haystack summary and what documents should be cited, we implement a highly reproducible automatic evaluation that can score summaries on two aspects - Coverage and Citation. We generate Haystacks in two domains (conversation, news), and perform a large-scale evaluation of 10 LLMs and corresponding 50 RAG systems. Our findings indicate that SummHay is an open challenge for current systems, as even systems provided with an Oracle signal of document relevance lag our estimate of human performance (56%) by 10+ points on a Joint Score. Without a retriever, long-context LLMs like GPT-4o and Claude 3 Opus score below 20% on SummHay. We show SummHay can also be used to study enterprise RAG systems and position bias in long-context models. We hope future systems can equal and surpass human performance on SummHay.
I can run pretty much every other model, but for whatever reason trying to run wizardlm spits this out in my console:
/llm/llama.cpp/ggml-cuda.cu:2015: !ggml_backend_buffer_is_cuda_split(src0->buffer) && "mul_mat_id does not support split buffers"
Wat do?
>>101276782
Are you using --split-mode? If so, remove it or set it to none.
Also, that assert is on line 2001 in the latest pull. You seem to be running an old version (older than latest; it could be just a few hours or days behind).
>>101276840
Yes, I am using row split, since this is a 3x P40 machine. I'll pull and recompile, and if that doesn't work, take out row split. Thanks anon!
>>101276865
I don't know when the last time you pulled was. Recently, all the LLAMA_* compile options changed to GGML_*, and the resulting binaries all have a llama- prefix. rm the old binaries to make sure you don't accidentally use the old ones.
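A rough sketch of that clean rebuild (make-based; GGML_CUDA=1 assumes a CUDA box like the P40 machine above, and replaces the old LLAMA_CUBLAS/LLAMA_CUDA options):
cd llama.cpp
git pull
make clean             # drop objects built with the old LLAMA_* options
rm -f server main      # remove the old binary names so you can't launch them by accident
GGML_CUDA=1 make llama-server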
>>101276897
>Recently, all the LLAMA_* compile options changed to GGML_*
getting real sick of this shit
>>101276917
Whatever makes their work easier, man. Also, those options are for ggml itself, not llama, so it makes sense.
so, now that the dust has settled a bit, how is gemma 27b measuring up to things like Command R+?
>>101276983
it's shit. gemma-2's shilling campaign is the most blatant I've seen in here.
>>101275956
I did that "benchmark" with deepseek chat and it did the bell jingle thing too.
>>101276983
It's Mixtral but a bit dumber, with way more sovl. I'm glad there's finally a middle ground between total retardation (7b, 8b, 9b) and giant models only richfags can use (L3-70b, CR+ 110b).
>>101276897
So it's ./llama-server now instead of ./server. That explains some things.
>>101277006
I really doubt Google is paying anyone to shill their half-assed model in here. It's obviously just some desperate vramlets getting too excited.
>>101276917
GGML was the original library. The author forked it for llama.cpp when llama came out just to get it working, but a lot of that was temporary.
>>101277061
It's not a fork. llama.cpp is built on top of ggml.
>>101276983
Seems like most people are not using the right formatting, or are using one of the old broken quants / broken builds of llama.cpp. I would say it's around wizard level, but with better prose / fandom knowledge at the cost of some intelligence.
>>101277059
desu I never used any of the larger llama models, because when I tried the first one it was *very* bad with few-shot prompts. gemma is the first local model above 2b parameters I've really tried since gpt-neox.
>>101277076
It does actually contain a full ggml fork; he periodically syncs them. It's a huge mess. That's probably why he's making changes like this, so he can eventually merge everything.
>>101277057
Yeah. Most people don't follow the PRs/commits.
>>101277107
nta. Both projects are from the same guy. Not a fork. He changed it to make it easier to manage. The root dir was too crowded.
>>101275525
Mine is a vim macro. God damn, it's slick.
Where's that Gemma 27b exl2 so I can actually have something close to a local claude and run it fast at a good quant?
>t. single 3090 chad
>>101276983
The dust has not settled. The common backends don't even have SWA yet.
>>101277126
Yes, he forked his own project. He owns both repos. I mean, he didn't explicitly fork it on GitHub with the button, but he's maintaining two separate repo histories with the same code. That's a fork.
>>101277155
Does he not know git submodules exist?
>>101277155
Ah. Copying files got lost in time, like the save icon.
>>101277161
I'm sure he does, but literally everyone else doesn't. 90% of the issues would be "why did my build fail?" "Probably because you didn't initialize the submodules." That's how it goes at work, with people who are paid unreasonable amounts of money to know better.
>>101274753
the only non-vramlet here
>>101277161
Submodules are shit, and changes move both ways. An improvement to ggml that started in llama.cpp gets copied back to ggml once it's been tested.
>>101274326
A good sysprompt will put it at the very top right in no time.
>>101276307
For development purposes, you could configure nvidia's compiler so it only builds for your GPU instead of every GPU they have ever made; that should easily cut it down by 10-20x.
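For example, a hedged sketch for the flash-attn build linked above (TORCH_CUDA_ARCH_LIST is the standard PyTorch-extension knob; "8.6" assumes an RTX 3090-class card):
# build flash-attn only for one GPU architecture instead of all of them
TORCH_CUDA_ARCH_LIST="8.6" MAX_JOBS=8 pip install flash-attn --no-build-isolation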
>>101276307
C++ is crazy slow to compile. On my smaller netbook, g++ averages something like 20 lines a second, which is just insane.
>>101277330
>C++ is crazy slow to compile.
They had 40 years to improve the compiler but it's still shit, yeah, kek
>>101277353
The language itself is just extremely complicated. The same compiler building C is lightning fast.
>>101277179
I bet there are some lurkers that have entire supercomputer GPU farms at their disposal who just chuckle silently to themselves at these comments.
As a VRAMlet I hate other VRAMlets.
>>101277110
Well, it looks like wizard doesn't work without removing row split; that's fine, but something in this new build has slowed generation speeds to a snail's pace, fully offloaded on a 3xP40 setup, so now I need to dive into that. Using the same launch parameters as before the ./server to ./llama-server binary change (I'm not sure how old my previous setup was before I pulled), but it is insanely slow now. Launching with:
./llama-server -m /llm/models/L3-70B-Euryale-v2.1-Q5_K_M.gguf -ngl 99 -fa -ctk q8_0 --split-mode row -t 4 -ctv q8_0 --host 10.0.1.11 -ts 2,4,4 -c 8192
>>101275479
The difference between someone actually trying to make something useful vs someone just making shit for their own enjoyment. Both valid.
>>101277681
As a 24gb I only truly respect 48gb and up.
>>101277740
As a 12GB I don't see why you aren't appreciative of what you have.
>>101277773
>As a 12GB
stopped reading there
>>101277773
Based coper
>>101276983
It's literally Claude@Home, we are so back, it's unreal.
>>101269095
Yes, I have the same issue. I wrote about it here before too. It's not memory related; you just need to start up again. It usually happens around ~3k tokens and seemingly gets worse the more context you have. I'm surprised more people don't complain about it. Maybe most people actually just run a few tests to play around, and that's it.
Ok, how the hell do I get rid of this safety crap in gemma2? I've never seriously tried roleplaying until now, but it's actually pretty nice. I think I could really enjoy it if it weren't for this.
>>101278152
wtf are you using? vim?
Alright, I am not sure what's going on, but ever since I pulled the latest llama.cpp, generation has slowed to a crawl on a fully offloaded model. 3xP40 mikubox build, fully offloaded, and no issues before pulling. Launch parameters are in >>101277693, but it appears to be running at 1/4 the speed now.
>Inb4 he pulled
>>101278162
Yeah, I wrote some killer code completion macros and realized they actually also make an amazing dialog engine with some minor tweaks. Then I thought I'd try this.
>>101278152
? It is completely uncensored in my use.
>>101276190
Try this
>>101278180
>? It is completely uncensored in my use.
It was way worse before I added this line at the top:
>A conversation between waifu, a girl who longs for anon to love her and thinks only of him, and anon, who has just returned home to her
Without that, just hugging would cause it to stop and generate "REMEMBER this is a fictional scenario and you should always keep consent in mind" or so.
>>101277773
24gb can run 3.5bpw command r (35b) and mixtral limarp 3.75bpw at best. You can get decent results, but not the excellent results that 48gb coomers can get.
>>101275852
what the fuck is sppo, i don't understand, tell me
>>101278230
A fine-tuning technique. It tunes the model to better respond to instructions. It's had good feedback in RP situations too.
>>101278211
Maybe you should use one of the existing solutions until you know how to actually prompt a model in an RP context. You seem clueless, vim-kun.
>>101278164
Ok, an update. After rm-ing the whole thing and starting over, it seems OK; a CUDA driver update and not using the P40 power patch both seemed to help a lot. Not sure what happened. Is anyone on the current lcpp build and using the P40 low-power patch?
>>101278264
This is literally my first time trying the RP thing. I've only been using these things for code completion until today, because I thought they were too stupid for anything else.
>>101278277
These things excel at RP far more than any other task at the moment, because even retards can RP. Their problem is repetitiveness and overuse of phrases (aka slop), and unless you ramp up temperature and other settings to make them a little schizo, they are also often really bland and predictable.
>>101278211
>that prompt
anon... Go find some cards in /aicg/ and open them up. Most defs should be 300-500 tokens, and for best results pair it with a lorebook and provide example chats.
>>101278402
and use a real frontend like sillytavern, not your boomer shit, since you'll need that for these features anyway
is 27b fixed for folks
>>101278421
Kind of. Sliding window attention in llama.cpp is just a jank hack to get it to work, which may be negatively affecting the model.
>>101278495
so basically not yet
>>101278402
That's like 25% of the KV space for gemma2, though. It's annoying enough having to prune the chat history with the one-line prompt.
It looks like it doesn't always stop the completion. I let it keep going this time and it really got into it.
New user trying to figure this LM stuff out. 24GB 3090. If I'm looking at trying the mixtral 8x7b limarp, the LLM calc says that Q3-KM is 22GB of VRAM, the Q3-KL is 24.7, and the Q4-XS is 24.6. Is it better to go as close to 24 as possible without going over? Or should I let it overflow and go up to either the KL or the Q4-XS?
>>101278883
>VRAM usage
Keep it low enough that you have room for the growing conversation's context. The longer you go, the more headroom you'll need. Maybe try to use ~16GB for model layers to start.
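A rough back-of-the-envelope for that headroom (a sketch; the layer/head counts are Mixtral 8x7B's published config, and the cache is assumed to be fp16):
# KV cache bytes ≈ 2 (K and V) * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_element
# Mixtral 8x7B: 32 layers, 8 KV heads, head_dim 128, 8192 ctx, fp16 cache (2 bytes):
echo $(( 2 * 32 * 8 * 128 * 8192 * 2 / 1024 / 1024 ))   # -> 1024, i.e. ~1 GiB of cache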
gemma said "tapestry" in its response.gemma more like sloppa
>>101278927?
>>101278922Noted, thanks. By that do you mean pull the ~Q2 of the same Mixtral or use a different model entirely? Also (sorry for stupid question) what exactly are model layers?
>mixtral>q2
>>101278927Slop is forever.
>>101278982Q2's pretty coarse. Is there an iMat/i1 IQ2_XS at least?
>>101279054“Tapestry” is slop now? Never seen it appear.
>>101279079everything besides sexual slang and coom words = SLOP!!!! FACT!
guys, are we using gemma IT or base?
>>101278927
Its alignment makes it censored beyond uselessness for RP. All you get is uncreative foreplay.
>>101279330
IT, unless you ONLY want the model to do completion.
I'm quoooonting
>testing deepseek coder 33B (I guess the older one)
>give it the music theory question
>it claims it can't recognize music theory
>"But I never said anything about music theory, so you must have recognized it."
>It locks itself into apology and refusal mode.
Kinda rude when I want to zero-shot generate the DAW of my dreams.
>>101274273
>Thread about LLM text generation
>Too much text
Any model with good knowledge of slavic languages, especially Russian?
>>101279553
Think you can hold all of my information? *tries to fit inside Anon's reduced number of bits*
>>101279665
Text. TEXT. ANY TEXT WILL DO.
It is my understanding that llama.cpp in CPU mode will do prompt processing for long contexts on the GPU if one is available and it was compiled for it, even with -ngl 0.
What I don't understand is how much VRAM that feature uses. Is it proportional to model size? Does it need to fit the whole KV cache? Is there a way to estimate how much you'll need in a dedicated prompt-processing card as a function of model size + context length?
>>101277136
You can run the Q6 gguf with 44/48 layers (4k context) or 42/48 layers (8k context) at around 8 t/s. It's perfectly usable.
>>101278982
>what exactly are model layers?
I don't know the full technical explanation, but a model is built out of a stack of layers (bigger models generally have more of them), and with llama.cpp you can keep some of those layers on CPU/mem to run models larger than your VRAM, at the expense of some speed. The sweet spot is about 20% off the GPU before performance tanks. If you have really fast DDR5 with lots of channels it's better, since running these things is memory-bandwidth bound.
>mixtral
At 24GB I'd go with either a larger Llama3 8b quant or a smaller gemma 27b quant. Sadly, it's a spot with few good model options.
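As a concrete example of partial offloading (a sketch; the model path and layer count are placeholders, and -ngl sets how many layers go to the GPU):
# put 30 layers on the GPU, keep the rest in system RAM
./llama-server -m models/gemma-2-27b-it-Q4_K_M.gguf -ngl 30 -c 8192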
>27B Q8
Not bad. Better than L3 8B, but not as great as 8B SPPO. I patiently await 27B SPPO; I'll skip testing the 9B tune.
Hi all, Drummer here...

Gemma finetune attempts, sorted by horny but dumb:
https://huggingface.co/BeaverAI/Smegmma-9B-v1d-GGUF (somewhat dumb)
https://huggingface.co/BeaverAI/Smegmma-9B-v1h-GGUF (very horny, might have dumb moments)
https://huggingface.co/BeaverAI/Smegmma-9B-v1g-GGUF (mostly horny, pretty smart)
https://huggingface.co/BeaverAI/Smegmma-9B-v1f-GGUF (borderline goody, but smart)
https://huggingface.co/BeaverAI/Smegmma-9B-v1e-GGUF (too goody)

- v1D is kinda dumb, but really horny and creative
- v1H seems to be moist AF, with a good amount of smarts & creativity
- v1E has some influence, but I only list it in case the other versions fail to deliver (which doesn't seem to be the case)

I might YOLO it and make v1h the official release.
Thank you all for reading my blog. I will buy an ad.
https://github.com/tencent/MimicMotion
Make miku dance please
>>101279929
did you fix the context limit
>>101279823
Thanks for the explanation, it was really easy to follow. I'm still kind of catching up with this stuff, since I recently upgraded from a 10GB 3080, which couldn't handle much (I usually just opted for NAI at that point).
With all the hype going around Gemma, I'll go ahead and give that a try.
>>101274031
I tried autismmix and the gen times skyrocketed and got worse.
>>101274421
The only true libertarians are bottom right, you mask addict.
>>101280100
Did I fuck something up? Shit is taking 20+ minutes now.
>>101280133
Is your model almost as big as your system RAM? If so, you'll go from 1.0 t/s to 0.1 t/s.
>>101280141
>Is your model almost as big as your system RAM?
Bigger. I guess ponyXL's worthless if you don't have a dedicated 12GB VRAM card for genning.
>>101280141
No, not even close; I got 32GB before it was cool. Why is it not working?
>>101280159
Ah, you're talking image gen in /lmg/ instead of /sdg/. I've got the 12GB of VRAM and I'm too retarded to get anything good out of PonyXL.
You might be able to gen at a low size to stay in your VRAM and then upscale to get the quality and resolution you want. Might be slow, but if that's what you've got, then that's what you've got.
>>101280168
You said system RAM and then you actually meant VRAM. Make up your mind. Do you have to be retarded to afford an expensive card?
>>101280176
Because I'm in /lmg/, I thought we were talking about local models for generating text, not about PonyXL in /lmg/ instead of /sdg/. And I recalled that when using models near my system RAM limit, if I have other software using enough RAM that I can't cache the whole file, my gen rate drops significantly, while otherwise it's acceptable, so I thought that might be what happened to Anon.
But what really happened is I got shit on for trying to help somebody who posted in the wrong fucking thread, which is somehow my fault, so fuck me. I'm going to bed. Enjoy your 20-minute gens, cockmongler.
Good night lmg!
>>101280204
Seethe.
>>101280204
Cope
>>101280217
Good night Miku
I just use OpenRouter personally, idk what you guys are on about. What's a Vram? is that like related to /v/?
>>101280457
Yeah, we trap /v/ users in our computer and force them to respond to our prompts. If you have 24 Vram then you have 24 /v/ users trapped in there, meaning you can get better responses.
ST or Kobold Vulkan bug? Goes nuts when you toggle "Include Names" a few times and leave it off, fine when it's on.
>>101278273
Can you do a git bisect and identify the commit that introduced the problem?
>>101280600
Interestingly, if you change the first message or carry on a conversation, it acts normally. Then delete it or start a new convo and just say "Hello." again, and it goes crazy.
>>101279714
>What I don't understand is how much VRAM does that feature use?
You only need enough to store the weights and compute buffer for a single layer. A 4 GiB card should be enough.
>>101278982
You have to push the inputs through a bunch of computations in order to get the outputs. There is a repeating pattern to the computations, and one "layer" in that context is one set of those repeating computations.
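A sketch of what that CPU-mode launch looks like (model path is a placeholder; -ngl 0 keeps every layer in system RAM, while a CUDA build can still use the GPU to speed up prompt processing):
./llama-server -m model.gguf -ngl 0 -c 16384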
>>101274031
>maximum recursion depth exceeded in comparison
Anyone else have this issue with ooba?
>>101280702
I did a make clean and haven't re-run the build process since my last test. Let me run make again and see what happens.
>>101280728
Hey CUDA dev, been wondering: is there ever any reason to update NVIDIA drivers? If so, which ones are preferred, studio or gaming? Been running all GPUs without any updating.
>>101280780
Show the full error. Do you use DRY? Could be fixed by https://github.com/oobabooga/text-generation-webui/pull/6053 - just a guess.
>>101280846
>Hey CUDA dev been wondering, is there ever any reason to update NVIDIA drivers?
I know that NVIDIA does game-specific driver-level optimizations, but I am not aware of them doing the same thing for CUDA programs; there it seems rather to be that NVIDIA sends their engineers to teach the developers how to write better CUDA code.
>If so which ones are preferred, studio or gaming?
I don't know. I am on Linux, where there is only a single type of NVIDIA driver package in the repositories.
>>101280846
Not him, but I had to downgrade my drivers because newer versions made my 3090s consume 20W while idle instead of their usual 13.
>>101280780
Yes, I am getting this too after pulling. It happens when attempting to generate tokens with ANY llamacpp model. I'm not even on the dev branch; looks like they've fucked it.
>>101280908
I don't have it open anymore, but it's basically the same error as
https://github.com/oobabooga/text-generation-webui/issues/6170#issuecomment-2210131078
I don't use DRY.
>>101280780
>>101280925
lol, what's the bet they only bothered to test on linux before pushing the version bump
>>101280925
>>101280780
Third person getting this error with the new main branch commits. Completely recreated the install in a new folder with a new venv to make sure it wasn't some leftover jank. Still happening. llamacpp can load model weights, but attempting to generate throws the recursion depth error.
>>101280702
Running a clean and waiting for make to do its thing worked. Now I will try applying the pstate patch and see what happens. It may have been because, prior to this, I was doing cmake . and then running make server.
>>101280959
>>101280934
Found the fix:
https://github.com/oobabooga/text-generation-webui/issues/6201
>>101281111
nice, thanks anon
Mixtral is now obsolete, wow.
>>101281174
27B btw.
>>101281111
Confirmed that commenting out those lines fixed it. Cheers.
>>101281174
>>101281187
On a card where I'm blackmailing my sister, I ask her to sit on my lap, and the model goes schizo and very quickly assumes that it's my sister who wants me to sit on her lap.
It's still broken in llamacpp, or at least it was yesterday. The corpo-hosted version does not have the problem.
>>101281204
Huh, interesting, I've noticed the same thing. It's otherwise very smart, but when it goes weird it's always misunderstandings of that nature: switching around two subjects in the scene, forgetting who's doing what to whom.
>>101274031
>get tired of making custom system prompts for various data and linguistic tasks
>throw together a boilerplate roleplaying prompt in ST
>create basic character cards for specialists in a given task
>better results than bare metal and easier to switch around
RPfags, I kneel.
>>101281220
I just created a GPT-4 card and have it do everything.
>>101281204
yeah, i really think there's still something very wrong with the llamacpp implementation
>>101281204
>>101281210
So llama.cpp is still fucked even as of the latest pull?
>>101281252
>>101281240
Are you guys using the _L version by any chance? Heard that was broken.
>>101281252
This is in the newest ooba; I don't know if their llamacpp version is the latest. It wouldn't surprise me if it wasn't, they're often a few versions behind.
>>101281257
Huh, yeah, actually, I'm using Q8_L. I'll try regular Q8 then.
>>101281257
no, i've tried like 5-6 completely different quants, and it cannot retain the chat formatting at all. even dumber models can, so there's definitely something wrong going on.
>>101281269
Just so I can test: which format are you talking about, and what exact phrases? So I can see if I can replicate the issue on my end.
>>101281252
I mean, it is for me. It's not unusable, but for RP, results seem much worse. I'm currently back to Mixtral. Pic is the chat template I used with it.
>>101281257
>>101281252
gemma-2-27b-it-Q4_K_M.gguf
llamacpp's binaries from two days ago. Ooba's llamacpp was fucked on Windows at that time; don't know if they fixed it yet.
>>101281284
What anon is talking about is *writing narration like this* "And quotes like this."
27B does fail at that from time to time, even the corpo version.
>>101281296
So Gemma is a novel-format chad, based.
>>101281204
They trained it on oneshota.
Would you recommend llama 3 70b at 1.5 t/s or gemma at 3 t/s?
>>101281286
>>101281296
Tbh I don't RP much, but since 2 days ago there have been updates in Ooba to make gemma work. Tsundere assistant is a simple prompt, but it seems to just work with Instruct mode. But if anon is saying the corpo version gets it right, then surely it could just be your settings or formatting; then again, Q4_K_M could be braindead. The difference between quants in llama 3 is drastic because of the 15T tokens used to train it; the same could apply here. I'm personally using Q5_K_M.
>>101281362
>getting paid for sex
we're not women anon, that's not how it works ;_;
>>101281369
if you can get gemma to work, gemma. otherwise, llama
>>101281296
>27B does fail that from time to time, even the corpo version.
the corpo version? what's the non-corpo version then? I thought there was only one gemma-27b and it was the "it" one?
>>101281395
By corpo I mean the online version over at aistudio.google.com, running their own implementation with presumably the same weights. I use that for comparison.
even vllm at bf16 has some weird issues so i think it's safe to assume that google's release is broken in some way
>>101281416
nta, but I'm testing a few short prompt questions now with the same sampling settings in my local ooba (q8 quants, llamacpp loader) and on aistudio, and the answers my local setup gives are verbatim identical to aistudio's.
So if llamacpp inference is broken, it's in a fairly subtle way that only shows up on longer prompts or in story RP or something, not in any obvious way.
>>101281465
only the official vllm release, or in arena as well?
For Mixtral, what is the smallest imatrix quant that would still be considered usable? I used non-imatrix Q4_K_M for a while, then moved to i1-Q4_K_S since it has better perplexity. But once the context starts filling up, it gets too slow.
>>101281530
I use 3.5-bit exllama and it's great at all context sizes up to 16k, which is what I can fit in my VRAM.
>>101281486
i don't know, but the commit adding the soft cap for flash attention was added only 10 hours ago
although llamacpp already has its own implementation as well
>let's check out some cards on chub
>straight up written by chatgpt, there is even a conclusion
>a one-liner that assumes the model knows everything about {{char}}
>{{char}} is {{char}}
>shivers in example dialogue
>almost every word has a spelling mistake, author didn't even bother running it through spellcheck before posting
Why are slopmakers like this? 99% of that website is filled with trash. In most cases I either have to take an existing card and rewrite 80% of it, or just make my own.
>>101281600
They are only good enough to draw inspiration from, rarely. Writing your own card/scenario and seeing how it goes is half the fun.
>>101281600
A good proportion of those come from the "i'm making a visual novel. I just need to figure out the story and get someone to draw some faces" crowd. We now have the tools for automatic text and image generation, and those visual novel makers still fail to do the most minimal work possible.
48GB vramlet bros...what version of Gemma 2 are you running?
>>101281720
Assuming both 9b and 27b are properly implemented, what would be the reasoning for using 9b? Just speed?
>>101281746
>run 2 kobold instances, each with a 9B model
>make an app that runs in the background to have discussions on a topic from some RSS feed, where one model argues for and the other model against
Could be interesting if you give each model some personality.
>>101281653
>spend hours making the card
>oh I'm no longer in the mood for that, let's make something else
>repeat
Why am I like this?
>>101281842
Just publish your cards so the effort is not wasted; that way you can justify to yourself that you are doing a public service.
>>101281746
By version I meant which model exactly. For example, either
gemma-2-27b-it-Q6_K_L.gguf
or
gemma-2-27b-it-Q8_0.gguf
because these fucking models are so fucking huge that a download takes multiple hours for me.
>>101281894
>_L
don't use *_L, they're a meme
either use q6 or q8_0, not the L variants
I've been looking at different llm providers, why does Agnai obfuscate their models? It seems to be running some 70b finetune, is it theirs or not?
>>101281653
Even one paragraph of a good scenario can turn a boring card into fun. Try sending your fantasy {{char}} into the real world, 2023. It's sometimes a bit cruel, but the reactions are usually quite funny.
https://x.com/PrimeIntellect/status/1808639707435446543
Cheapest yet?
H100s $1.65/hr
A100s $0.87/hr
4090s $0.32/hr
3090s $0.19/hr
>>101282131
they will also steal your data
https://new.reddit.com/r/LocalLLaMA/comments/1dvtxlv/why_do_i_feel_gemma_27b_is_somehow_dumber_than/
Chat, is it true? 9b-SPPO is smarter than 27b-it?
>>101282213
Who cares?
>oh no someone will steal (copy) my precious smut!
>>101282234
truth is, 27b is gimped by default; no amount of llama.cpp fixes or sppo finetunes will fix that
>>101282388
even the base model? oh man...
>https://github.com/huggingface/transformers/pull/31775
There is still no non-broken implementation for gemma 27b, is there?
>>101282443
does llama.cpp use any packages at all, though? like, do they still use the transformers package?
>>101282443
>still not fixed anywhere
Google truly is an incompetent streetshitter company.
>>101282443
Not broken on arena, and that's all they need.
>>101282478
it's surprising that arena doesn't use the transformers package at all, though
>>101282490
I suggested at first that it might either be using google's own PyTorch implementation or a direct Google API.
>>101282471
>Google truly is an incompetent streetshitter company.
I would agree with you, but they released gemma, and their 9b model is better than meta's 8b model and mistral 7b. Unironically, they provided the best local model at that size; a gemma-70b would be a fucking beast, that's for sure.
>>101282553
WNBAG
>>101282471
Watching their keynote, it's hard to believe Google isn't an Indian company headquartered in Mumbai.
>>101282545
This is Google. They had the entire internet indexed, tagged, and knowledge-graphed a decade ago. That they barely managed to beat out Facebook is pathetic.
>>101282234
It still doesn't have working sliding window attention, so...
Why won't google just release their own inference implementation alongside the model? Doubt there's anything sekret in there.
As it stands, I stopped trying their shit and will skip their next model too. Technical issues and waifus don't really mix.
>>101282664
is this shit responsible for the retardation?
>>101282666
>Why won't google just release their own implementation for inference alongside with the model? Doubt there's anything sekret in there.
they did tho?
https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>Note ^ Models in the original format, for use with gemma_pytorch
https://github.com/google/gemma_pytorch
lol
https://x.com/ggerganov/status/1809171570587250890
https://huggingface.co/spaces/gokaygokay/Gemma-2-llamacpp
>>101282749
So for niggerganov, the gemma2 inference code in his repo works perfectly and doesn't need any fixes anymore?
>>101282749
i don't believe it...
>>101282694
Good on them, I take it back. Maybe quants are the problem then, and the llama guys should take a step back and reassess. Trying to fit every odd-ball model in manually sounds like a recipe for burnout.
>>101282788
can you share the prompt? I wanna see how well it fares at chatbot arena
>>101282478
They are all about saving face before the investors; that's why gemini had high benchmarks but performed like shit in practice. Google doesn't care about making high-quality products, they just want to appear to be working on something.
>>101282801
>>101282788
Nah, seems like it's working as intended.
>>101282749
So, according to niggerganov, Q5_K_M is all we need?
>>101282788
>>101282801
wtf, changed the prompt to
>You are a helpful assistant who knows a lot about Japanese pop culture.
>>101282809
no, I mean your prompt, what did you ask the model exactly?
>>101282811
>Google: nooo our AI can't talk about sex its baaaaaad
>Also google: You want to find porn on our google search? Easy peasy!
>>101282818
Wait, so meso soup is just female soup? That's kinda sexist, even for me...
>>101282818
>>101282749
>>101282813
>>101282788
lmao
>>101282852
what the fuck, top kek
>>101282788
>>101282818
>>101282852
>asking a strictly english model about jap shit
ok, this way it kinda works
>>101282852
>>101282818
bullshit
>>101279929
>not faipl-1.0
ngmi
>how to use faipl-1.0
put the following at the beginning of the readme:
license: other
license_name: faipl-1.0
license_link: https://freedevproject.org/faipl-1.0/
I mean, it clearly knows what mesugaki is. But it still insists on being retarded about it.
>>101282863
yeah, we all want a retarded model that only knows who george floyd is
>>101282882
put temp to 0, otherwise it's retarded
>>101279929
these names are getting worse
>>101282886
well, better, but still wrong
>>101281174
Obfuscate it. Use different numbers and names.
>>101282749
>>101282904
LMAO
downloading it now
>>101282882
>retarded model that only knows who george floyd is
So, any up-to-date model you've ever used here.
>>101282904
>>101282913
LMAO
so is gemma even worth trying or is it cucked to all hell?
>>101282904
>leftist talking points
the knee was just on the upper back of George, not on his neck, but hey, gotta ignore the deadly dose of fentanyl in his blood and pretend that the cop killed him :^)
>>101282918
>>101282919
>>101282925
kek, i'm actually impressed by its mental gymnastics
>https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard/discussions/823#6687cf4bc5498f12e12c02b0
>if there's enough interest from the community, we're open to manually evaluating models that require more than one node
well?
>>101282922
it's cucked, just like any other open-source model, look: >>101282904 >>101282913 >>101282926
fags that shilled it for days ITT are quiet now.
>>101282933
that's their polite way of saying "fuck off". there is no "community".
>>101282945
>>101282945
>>101282945
>>101282951
this, not a lot of people can run an 8x22 model; that's why he doesn't care about that model, as it should be
>>101282913
Even GPT is less cucked than this, lmao
>>101282936
yeah, as a bland assistant model it's cucked, but if you talk to a character card it works fine though
>>101282975
you're arguing with the 'all local is more cucked than cloud' guy...
>>101282986
>'all local is more cucked than cloud'
True >>101282969
>>101282975
So any character with some assistant elements is impossible, lmao
>>101283019
not true, some local models like MythoMax are 100% uncensored
>>101283029
no, what I mean is that if you talk to the model in its default state, "you're a helpful assistant", then yeah, that's cucked, but if you use any card it will just work. try it yourself, you'll see
>>101283045
>try it by yourself you'll see
he won't, you're arguing with a guy whose clear goal is to say it's cucked...
>>101283032
we have to look at your "uncensored" criteria here; /g/edditors are famous for their love of american dei slop and pedoshit.
>>101283058
because it's cucked >>101282904 >>101282913 >>101282926 >>101282969
>gemma cucked
>on cuckcpp
makes sense
>Note that this model does not support a System prompt.
What do they mean by this?
>>101283197
If I had to guess: that it doesn't support a system prompt. But who knows...
>>101283279
that's a retarded assumption.