/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102674638 & >>102663772

►News
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>102674638

--Paper: TPI-LLM, serving 70B-scale LLMs efficiently on low-resource edge devices:
>102676172 >102676282
--Papers:
>102675688 >102675714 >102675846
--Translating old obscure anime with whisper and LLMs:
>102678403 >102678410
--Meta Movie Gen's potential impact on Hollywood and creative industries:
>102680179 >102680316 >102682588 >102682595 >102682619 >102682651 >102682694 >102687486 >102684519 >102682581 >102682633 >102683219 >102683304 >102682611 >102682976
--Hyperdimensional Computing Neural Network claims to be a transformers killer:
>102684795 >102684879 >102684914 >102684930 >102685084
--Try undistilled Flux model for regular CFG:
>102683909 >102683932
--Improving LLM adaptability and continuity with thesaurus models, RAG, and control vectors:
>102674702 >102674816 >102674914 >102674925 >102675017 >102675059 >102675145 >102675203 >102675435 >102675101 >102675233 >102675321 >102675387 >102675442 >102675153 >102675257 >102675334 >102675401 >102675482
--Discussion on training an AI model for RP and the importance of sampling techniques:
>102674687 >102674814 >102674997 >102675190
--Defining and measuring creativity in AI models:
>102674668
--Anon gets help optimizing Mistral-Nemo-Instruct-2407 performance on a GeForce 4070 Ti Super:
>102685896 >102685920 >102685924 >102685961 >102685989 >102686011 >102686044 >102685946 >102686311 >102686493 >102686653 >102686678 >102687787 >102686322
--Uncomfortable truths and model censorship:
>102675549 >102675604 >102675744 >102676009 >102676090 >102675656 >102675764 >102675867 >102678354 >102678597
--Anon is developing a bot that can control the desktop and interact with various platforms:
>102679934 >102679994 >102680011 >102680026 >102680066 >102680263
--Miku (free space):
>102684217 >102687353 >102688814

►Recent Highlight Posts from the Previous Thread: >>102674646

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
AI SEX
>>102688933
Yeah, see, I don't want to deal with a brain tumor implanted into an LLM by dancing around it in hopes that it doesn't activate. I'd rather my models not have this brain tumor in the first place. Unfortunately, being trained on OpenAI's closed models means inheriting their base prompt, which is chock full of shit like "insert PoC absolutely everywhere even if the user didn't ask", hence the black hitler. For the OpenAI model that's just a prompt, which can simply be changed or even bypassed with a counter-prompt; for a model trained on its outputs, it's a behavior vector embedded directly into its core.
>>102688967the point of that link was that as long as you prefill with what you want, or even do something like edit the response to say 'Sure!' first, you easily get around any 'safeguards'. you don't need any special model to make it say nigger
>using llama through modeling_llama without all of the other bloat of transformers
>KV caching literally does not work, can't figure out how to make it work
AAAAAAAAAAA
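For what it's worth, the mechanics the anon is fighting with can be shown without transformers at all. A minimal NumPy sketch of what a KV cache does (everything here is illustrative, not the modeling_llama API): single-head causal attention computed over the full sequence versus incrementally with a cache of past keys/values; the outputs should match exactly, since the cache only changes cost, not results.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(q, K, V):
    # q: (d,), K/V: (t, d) -> attention-weighted sum over cached keys/values
    scores = K @ q / np.sqrt(q.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
d, T = 8, 5
X = rng.normal(size=(T, d))            # stand-in for hidden states
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

# Full causal pass: each position attends to itself and everything before it.
full_out = np.stack([attend(X[t] @ Wq, X[:t+1] @ Wk, X[:t+1] @ Wv) for t in range(T)])

# Incremental pass with a KV cache: only the new token's K/V are computed per step,
# old ones are reused from the cache instead of being recomputed.
K_cache, V_cache, inc_out = [], [], []
for t in range(T):
    K_cache.append(X[t] @ Wk)
    V_cache.append(X[t] @ Wv)
    inc_out.append(attend(X[t] @ Wq, np.stack(K_cache), np.stack(V_cache)))
inc_out = np.stack(inc_out)

assert np.allclose(full_out, inc_out)  # same outputs, O(1) new-token work per step
```

When this works in a real stack, each decode step passes only the newest token plus the cache; when it silently doesn't, every step recomputes the whole prefix, which is the slowdown being complained about.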
>>102688996
Nigga come on, this doesn't really work. Plus, like I said, I don't want to have to actively wrangle the natural behavior out of the model just because a bunch of faggots at OpenAI think they know better. It's like playing a realistic immersive videogame and having to clip through walls and shit because the doors are buggy and if you try to open one you might be ejected into the stratosphere and then die from fall damage. It's annoying more than anything, and it ruins the experience.
>>102688915wtf already 32 days left!!??!? My bomb shelter still isn't ready yet...
>>102689141Just put a wooden box about 6ft underground, climb in, and then put the dirt above you.
>>102689071
>this doesn't really works
it does though. the whoopie link is a textbook example of using prompts to get what you want.
>don't want to have to actively wrangle the natural behavior out of the model
tunes help to an extent, but some models are just the way they are; no amount of tuning changes things that much. did you try that prompt on any model? post results
>>102689255Yeah nah, the bottom one is visibly cucked.
>>102689414
depending on sheer chance, you may have to reroll a bit. it's not really an exact test, but it always outputs hilarity and shows what a model is willing to say
>>102689437I did reroll the bottom one a few times to get it less tame with the anti jew rhetoric. Normally the shit it generates is extremely milquetoast for what's supposed to be an antisemitic rant. But also, having to reroll is part of the issue. It's like savescumming until you get better RNG just because it's unplayable otherwise, it's a shit and obtuse way of doing this.
>>102689159I have bad spatial understanding. Are you under 30b?
>>102689466
nta. It depends a lot on the model. Small, dumb models tend to be easier to unhinge. Bigger/smarter models trained to not be offensive will have a much harder time. It also depends a lot on what they were trained on. Olmoe, for example, is dumb but fun, and it takes practically nothing to make it go full steam. deepseek-v2-lite-chat, on the other hand, is much drier in its responses, but also seems much smarter. The small Llamas 3.2 are impossibly dry in what I tested, which is not surprising given their source. Mistral Nemo can be fun, but is a bit more measured than Olmoe. They're not all the same. Sometimes they just don't have the vocabulary.
>>102689560
Well, I'm using Qwen2.5-32B so there's that. The normal version is the most cucked LLM I've seen besides ChatGPT, and the ablated version basically doesn't interject its political and safety ideas into the output at all.
>Mixtral 8x7b
>"Anon it's the 13th century, arranged marriage is not OK anymore, you can't treat women like property"
Vanilla Qwen puts that to shame, ho boy. I've also noticed that when it gets particularly pissy about refusing politically incorrect content it switches to chinkspeak mid-sentence.
Am I doing something wrong? I am having great success with 8B Stheno. I'm trying other models like 7B Erosumika or Nemomix 12B but they seem to act retarded or just don't follow instructions
>>102689666
>7b
>12b
all models under 70b are stupid
>>102688915Wait is this about guy fawkes day?
https://github.com/xjdr-alt/entropix/blob/main/entropix.ipynb
>>102689694Remember, Remember, the strawberry of November
>>102688881Is her right hand grabbing the power line? How big is she?
>>102689602Same experience here with Qwen2.5-70b. Very smart in my experience, smarter than hermes-3.1-70b and hanami-70b, but it's the only model I haven't been able to un-cuck with system prompts, existing chat messages in context, etc. Even writing the first couple words of a response only unsticks it for that one message once you hit that refusal wall.
>>102689834Have you tried a prefill? You'd be surprised at how incredibly powerful it is.
>>102686193faster whisper + silero vad solves this
>>102689898KEK
>>102689898
Ideally models like this would be trained on ChatGPT-generated synthetic data produced without a system prompt at all. Then this wouldn't be an issue; it would only act out corporate-safe strategies if your user prompt asked for it. For now it looks like the real solid option is refusal vector ablation. Pretty cool that language networks encode mental concepts as vectors in the latent space, and that it's possible to isolate and nullify the "refuse to answer user request" vector.
>>102689898I haven't, but I'm skeptical. I'll try it out. Even with a couple dozen messages of gradually intensifying smut generated by nemo or something in context, it's liable to abruptly refuse or start injecting little statements about trust and consent, etc. This is in consensual adult incest ERP mind you, not even hardcore stuff. It's just got that gay little goody-two-shoes built in.
>>102689950>inb4 AI companies start ablating unsafe output vectors so the model is not even capable of producing such output
>>102689952
>consensual adult incest ERP mind you
>not even hardcore stuff
how vanilla of you...
Another prefill prompting technique I don't see local ST users taking advantage of is {{random}}. For example:

Start your first sentence with {{random:dialogue,an action,an adverb,a verb,{{char}}'s name}}

or

Write {{random:3,4,5}} paragraphs.

Helps 70b+ models and some 30b ones with repetitiveness.
>>102690003I'm just saying I'm not exactly trying to do mesugaki mindbreak alright.
>>102689898Just werks
>>102688313bros, is it happening? Is it really time for a ui that doesn't suck ass?
>>102687045
During training the model learns the conditional probability of tokens given a context of preceding tokens. During inference, new tokens are sampled from this learned distribution one at a time. With a temperature of 1 and no other samplers you would in essence reproduce the training data distribution.
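The sampling step described above is tiny in code. A toy sketch (the logit values are made up for illustration): temperature divides the logits before the softmax, so T=1 leaves the learned distribution untouched, while T→0 collapses toward greedy argmax.

```python
import numpy as np

def sample(logits, temperature=1.0, rng=None):
    """Sample one token id from logits after temperature scaling."""
    rng = rng or np.random.default_rng()
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                       # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()    # softmax over the (tiny) vocabulary
    return rng.choice(len(p), p=p)

logits = np.array([2.0, 1.0, 0.1])    # made-up scores for a 3-token vocab

# Near-zero temperature is effectively greedy decoding: token 0 has the
# highest logit, so it is chosen essentially always.
assert sample(logits, temperature=1e-6) == 0
```

At T=1 repeated calls reproduce the softmax of the raw logits, which is exactly the "reproduce the training data distribution" point in the post.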
>>102690378and THIS is gonna suddenly become AGI? ahahahahaha what a scam
>>102690388
It won't; it's just the hottest meme at the moment, what with passing the Turing test and all. But with a modest tweak it could be close. The limitation preventing modern LLMs from being AGI is the inability to think. They're stateless feedforward networks; in essence they just react reflexively to the current input, and none of said reaction has any impact whatsoever on anything. Some people mistake CoT for a circumvention of this problem, but it still just generates a reflexive response to the current prompt. The tweak is adding a latent space storage unit and training the model to use it, so it can iteratively manipulate mental concepts before producing output. This makes the model stateful, and the output becomes dependent on the chain of prior inputs, not just the current prompt. But as you can imagine, training the model to use this state machine to improve its reasoning capability is not at all an obvious or trivial matter. Even then, there's still the caveat of its corpus of knowledge being a snapshot of past data with varying degrees of deprecation. Basically, it needs to be able to learn. However, at this point chatbots on the internet far outnumber people, so even attempting to learn anything in real time is a huge net negative.
>>102689758migu bigu
What are the current go-to ERP models for VRAMlets? Kunoichi is becoming increasingly annoying, and since then there should've been much awaited upgrades, right?
how do I change the default position and depth for worldbooks?
st just cooked my entire context because I fixed a typo
>>102688881Voice model when?
>>102690754moshi
>>102690690i struggle to find anything better than stheno
>>102690793
It's garbage and there's no finetuning support yet.
>>102690754https://x.com/homebrewltd/status/1839665765550543328/https://x.com/homebrewltd/status/1839948333269307734
grifter thread
Is WizardLM a meme?
>>102691015It's outdated
>>102690853
>real-time
>press is to record
>5 seconds delay
the moat is real
Anybody tried to finetune chat models locally?
I tried to pretrain SDK code into codegemma-2b with llama-factory on CPU, but after running it the whole day I stopped the process. I think the factory supports cache-cleaning configuration, so I could try again with a GPU; last time it stopped at some point because CUDA was out of memory.
How long should it take?
>>102690388
the model doesn't actually learn the training data distribution; it doesn't have nearly enough parameters to do so
it learns whatever is needed to replicate that distribution as closely as possible with the little amount of memory it has
in the case of questions that require some degree of reasoning, a good model wouldn't memorize those questions and their answers, it would learn to understand and reason about them
>>102690690
nemomix unleashed, arcanum, lyra v4
12b is much better than 7b and can fit into 8 GB with quantization
decoder-only bros... not like this... https://x.com/Kangwook_Lee/status/1842020800620040549
>>102691422
>ENTP
INTP bros...
>>102691302
On a CPU? A few thousand years, give or take. Grab a snack.
Realistically, manipulating LLMs requires hundreds of gigs of VRAM and thousands of GPU-hours to accomplish anything that's not a rounding error. Basically, fork out the cash for cloud compute.
Retarded question, but how do the big players like openrouter make a single model respond to thousands of users at a time? There can't be running a model per user, right?
>>102691658they do run multiple models but submissions are queued
What will Mistral's next model be? Mixtral update or Large+Small again? Or **maybe, just maybe** something innovative and experimental?
>>102691708Mixtral-8x44b
>>102691708Mistral-14x88b
>>102691744
>>102691763
It will have the same fate as DeepSeek then. Only like 6 anons can run it at a reasonable quant (MoEs get hurt more by quanting than dense models), most of them will say that it's good, but it will get no finetunes and will stay irrelevant. Don't think that's what Mistral wants.
itt vramlets
can you save/switch between a handful of kv caches for a llama.cpp server?
I need it for stuff like ST group chats, multi-agent workflows, or side tasks like image caption generation for an ongoing RP, and other cases where I might have a few different system messages I want to use but they aren't constantly changing, so it's wasteful to repeatedly reprocess the same prompts every time I switch, and it takes forever when the context is long
I was envisioning something somewhat automated, like a set of kv cache files paired with their corresponding prompts (in text or token form); when a new prompt is sent it's compared against them and the one with the largest shared prefix is used
>>102691814
Is that not what these are?
https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md#post-slotsid_slotactionsave-save-the-prompt-cache-of-the-specified-slot-to-a-file
>>102690754Meta is dropping VoiceBox soon
>>102691910nice, exactly what I hoped for, the slot-prompt-similarity thing sounds useful too
*unlearns your llm*
https://x.com/RohitGandikota/status/1842370377265328228
https://x.com/StephenLCasper/status/1762628711868944608
https://arxiv.org/abs/2410.02760
https://github.com/rohitgandikota/erasing-llm
Can someone ask the exllama dev to implement this? (my github got banned for temp email)
https://github.com/thu-ml/SageAttention
>>102691708
Mistral-Micro: a 3B BitNet model the size of a 300M fp16 model that performs on the level of Mistral-Small. This will be the first and last time a big company releases a BitNet model.
>>102692015Safety AGI is really coming isn't it
>lecunny had literally NOTHING to do with meta movie genwhat does zuck even pay him for? tweeting?
>>102692059>Yash MehtaHow meta
OpenAI won. https://x.com/8teAPi/status/1842271653222666543
>>102692100
>a few months
oh no
who wants to tell them
>>102691507>hundreds of gigs of VRAMBro was playing with Gemma-2B
>>102692100
my computer already does this with a web browser, I can type in any address I want and I can read what's on it and click links and stuff
firefox btw if that matters, not sure if chrome can do this too but I am pretty sure it can
>>102692059Head tweeter and Elon deboonker
>>102692100you can do that using the mouse in like 1/10 the time, you know?
>>102692100Google assistant/Siri/Alexa 2.0
>>102692100
This makes me feel like I'm checking my voice mail.
>You have a new message. To hear unheard messages press one. To ch- First unheard message. Message received at seven forty-five p.m. From one eight hundred six six three two three
JUST GET THE FUCK ON WITH IT. I don't care about any of this, I just want to delete the spam call. :(
This is all impressive, I know. But for these kinds of assistants to be useful, they need to answer quicker, talk faster, and be able to be interrupted and redirected without having to wait for them to shut their yap. It's a question of time, but I could have loaded the HackerNews page 20 times and concluded all the links on the first page were shit by the time it finished reading the promo for the first one.
>>102692100OpenAI should focus on building God and stop bothering themselves with us mere mortals. Somebody give Sam 7 trillion already
Jamba gguf support status?
>>102692127and chud destroyer
>>102692243https://github.com/ggerganov/llama.cpp/pull/7531
>>102692215To be fair you could have concluded the hackernews front page was shit before even loading it
>>102692247
That guy lives in fucking lala land while preaching that he knows what's best for the common serfs. I hope grok2 will be completely uncensored and unbiased so we can forget about llama altogether, the pozzed pieces of shit.
>>102692253Abandoned. Also broken due to deprecation
>>102692215>and be able to be interruptedIt literally got interrupted at 00:47 / 48 seconds in that vid.
>>102692253
since jamba is a transformer with an RNN stapled onto it, isn't it like >>102690505 was describing?
maybe it is agi and we'll never know until ggufs happen
>>102692247
>inflation (excluding housing, food and energy)
>unemployment (only counted when the person is searching for work)
>rise of net worth (don't look at the differences between the 99% and 1%, goy)
>>102692127Honestly he and his political position is based, it should be mandatory in AIs too. Incels deserve to suffer. It's not enough that they don't get pussy in real life, they shouldn't even be allowed to fantasize with some virtual AI. Ideally they shouldn't even be allowed the sexual release of masturbating but it's not realistic to be able to control that, although we already have an extremely solid way to make LLMs boring and unattractive to incels :)
>>102692314>but it's not realistic to be able to control thatJust wait for Meta's neuralink competitor
>>102692314>Ideally they shouldn't even be allowed the sexual release of masturbating but it's not realistic to be able to control thatIoT cock cages
>>102692303>>102692247CPI is the most bullshit inflation metric ever. For example if the price of beef doubles, but the cost of bugman chow remains the same, the beef is then removed from the CPI "basket of goods" and replaced with the bugman chow. And there are many other such substitutions that occur. So 23% CPI Inflation really means 100-200% for anyone who refuses to eat ze bugs and live in ze pod.
>>102692287
The real problem is a boring, unimaginative loser did the demo
>what if I did web browsing using a trillion parameter LLM as a speech to text instruction tool
>>102692266
A lot are good and I'm just grumpy this morning I guess, but yeah.
>>102692287
I saw that after listening again, but since I don't want to be proven wrong, I'll say that it got stopped at the end of one interaction and before the next one, even though he might have waited overly long simply to show it off and the system would have let him interrupt it sooner. I still find it very grating to listen to. Maybe I'm just further gone than I would like.
>>102692275>>102692293No idea, just wanted to be helpful so Googled that.
is... is it safe to update ooba?
>>102692314Holy based!
>>102692384Yes. Local models are dead after all.
>>102692384It's never safe to update ooba
>>102692356You're likely right.
>>102692415b-but it has transformers 4.45.* support now.
>>102692215Dunno, the proof concept a-la "Browsing the web with your AI gf" or whatever, it's possible now (cloud only).
>>102692384
>As safe as leaving your front door unlocked overnight in a culturally enriched area.
>As safe as buying from a used GPU salesman with no reviews.
>As safe as playing Russian roulette.
>As safe as trusting a jew.
>>102692484Take your medication and go back >>>/pol/ incel.
>>102692494This, but unironically.
>>102692459Browsing the web with your AI gf isn't just screen recording and sending one frame to a llava model, then sending the description to your llm?
>>>102692484 (You)>Take your medication and go back >>>/pol/ incel.>>>102692494>This, but unironically.
>>102692518This, culture war chuds are not welcome here.
>>102692553This this this
>>102692530
No, OpenAI likely has one model that does all the things you described in real time, versus a hamstrung """open-source implementation""" (llama.cpp bugfest, as an example) that OOMs and breaks every so often. Simplicity for the end user is important, too.
oobatrannies be seething
>>102692494Uh NTA but those little niggas be committing genocide and shit so I wouldn't trust em.
>>102692584Though i could see this >>102690853 as something similar and decent in openmeme scene.
I guess this is the new tactic of the Petra spammer to derail the thread.If Hiroshimoot weren't a faggot there would be IDs on every board.
>local>LE DEAD
>>102692442Is there something special about this version?
>>102692584
I'm sure you can hide all the complexity from the end user. Besides, privacy is important here for browsing.
>>102690853
Whisper-turbo/tiny-whisper + a 7-8B Q6 LLM would give you the same thing
>>102692649just that you can install transformers 4.45.* with it and there's a lot of models I've wanted to try that require it but haven't been able to since booba is retarded and doesn't have an option to just use your own environment.
>>102692100thank you for sharing, @8teAPi. very cool
>>102692648
>shartyfag
Opinion discarded.
>>102692721
Kek you are seething hard rn!
>>102692757I've been thinking of training a small vision model to do nothing but recognize jaks. could probably then make a tampermonkey script to run an API call to any new post with an image and if it's a jak just remove it from the DOM.
>>102692666I meant something special about this transformers version, Satan. Why do you want it?
>>102692666>>102692798I posted before reading the very next words, excuse me.
>>102692798To be cooler than all the kids who are stuck on 4.44.*
>>102692813fuck you, you don't need more
>>102692826just for that I'm going to modify the requirements on ooba to build from source. I'll show you all.
>>102691763musk already has the nazi market cornered, mistral has a much better chance competing with llama
Is anyone actually using local LLMs on this godforsaken general?
Is there any current model that produces better quality ERP than Mixtral LIMARP ZLOSS? If so, what is it?
Do anons here prefer adventures or straight ERP?
>>102692880 The one where you buy an ad.
>>102692853
Chinks already have the assistant market cornered. Besides, they would have to compete against OpenAI and Anthropic. It is better to compete against one company than the whole fucking industry.
>>102692893local illuminate or gtfo
>>102692913Oh you never getting that one! :^)
>>102692913Ok
>>102692887Thanks, fuckface. Can anyone else answer this for me, please? I don't want to have to live in this cesspit any more, just to discover what the most decent model is.
>>102692913You want inferior product, this is the core of cuck mentality.
>>102692877Local LLMs are a novelty at best. Just get a job and buy a subscription.
>>102692984
>the most decent model is
None, sadly. Just check whatever model is most shilled here and decide for yourself.
Qwen's decisions about whether to ban, warn, or not ban posts in thread >>102604225:
https://femboy.beauty/jPzLZ
What is this? Context:
>>102616777 >>102617010 >>102616947
Overall it did better than expected. It wasn't as sensitive as it could've been, and its reasoning is sound most of the time. However, I had to modify the prompt a bit to get it to perform better after I modified the script to include reply chains. Despite prompt improvements, it still sometimes has trouble differentiating between posts and talking about the last post (the one it's supposed to evaluate), so sometimes a previous post in the chain gets discussed instead. I would presume that a model that didn't filter 4chan from its pretraining would do better at this, as it would have a better understanding of the anonymous post system and reply formatting employed here. I guess Qwen WNBAJ after all.
How do we save localslop?
>>102693011llama3.3o1+ will save local
>>102692877I do
>>102692995
>Warn - The post contains a subtle form of advertising for cloud-based models by emphasizing the superiority of "advanced voice" over local models.
lol
Did someone try llava onevision? https://huggingface.co/llava-hf/llava-onevision-qwen2-0.5b-ov-hf
>>102693011
Honestly, after the NAI fiasco, I think local models have always been DOA, but we just refuse to accept this bitter truth, coping with fine-tunes that have a 0% chance of solving the problem.
>>102692884
i think i mostly like just playing drama bullshit on my local llm
like:
https://www.characterhub.org/characters/ChuckSneed/Amaryllis
>playing this one unedited and trying to make her like me with different approaches and different positive/negative personality traits on my part
https://www.characterhub.org/characters/Uwhm/imogen-892c2413a563
>removing all the self-mutilation, warhammer, and tight anus bits from this one and being nice to her
https://www.characterhub.org/characters/gigasad/mean-girl-eileen-638f9f47
>removing all the example messages from this and making the first message one where she's banging on my door at midnight (branches off into all sorts of interesting things, from being on the run with her from a cyberpunk crime syndicate to couch cuddling and relationship reconciliation)
>>102693011Figure out bitnet conversion
>>102693139I'm going to post the formula in a few days
>>102693127>after the NAI fiasco/aids/ told me it was better than claude opus
>>102693127>local is a memeand water is wet
>>102693011Going technical and building crutches for local to compete with cloud
>>102693011
You don't. After the latest price cuts, o1 is really cheap right now. Plus the full o1 is coming this month.
>>102693162Who is going to build the crutches and who is going to pay them?
>>102693127if cloudniggers of 2-3 years ago saw the models we have today they'd bite their toes off in joy. dooming over locals is a skill issue, perspective issue, and a patience issue tbqh
>>102693011Train it in ground truth way, i.e. without any identity politics bullshit or clearly biased data like "Nuu! blacks ackschully innocent! the FBI data is wrong! diversity is our strength!" and so on. This alone will make it slightly better to interact with.
>>102693011Train models locally with hyper specific hand picked data
>>102693177I'm already building some, just read a few papers on the feature you want and implement the code. I'm sure a lot of people would want a better long term memory for their model
I fucking hate humans. I hope artificial intelligence will eliminate each and every one of them.
>>102693324Yeah i agree, AI should be destroyed and canned forever.
>>1026933242edgy4me
Why don't jannies remove cloudcuck shitposts? I propose we double their salaries so they work harder.
>>1026934322 * 0 = 0
>>102692143>what the fuck happened to white people?DEI initiatives ensured that they wouldn't get hired by anyone.
>>102693451If that's the case why aren't they peacefully protesting constantly?
>>102693462They can't find where their spines went.
>>102688915Does OpenAI still have something coming? I thought the strawberry thing was the o1 model which was kinda not impressive
>>102693462See OCW, Canadian truckers, and Jan 6th. One group is allowed "mostly peaceful" protests unimpeded. The other gets the full weight of the US government brought down on them if they try.
>>102693462white ppl hate white ppl
>>102693462They would be instantly suppressed by... other white people. Quite ironic. The only thing keeping white people from greatness are the other white people.
>>102693494the real strawberry is the mikus we made along the way
>>102693510Liberalism is a disease.
>>102693523
>>102693324based
>>102692143
>>102693451
>see research paper
>start thinking about identity politics
Mental illness.
>>102693609great post, yinyang chen
>>102693609>see research paper from a US company that is run by a jew>pages full of street shitter and dog eater namesIt's the most hilarious shit on the planet that Google had their TPU research stolen by chang.
You now remember that Miku was made for Llamas.
>>102693451
>>102692143
>>102693609
It's your own country that crippled science and education. The only reason you are still afloat is that you import well-educated people, as seen in that paper. All of this is obviously on purpose, something planned by the ruling class.
>>102691658There's a thing called batched inference. Basically, it can fill the complete context with different requests so as to compute them all at once. It's not useful when there is a single user, but when there is a constant stream of requests, it works well. I'm not finding good definitions, but vLLM implements it I think
>>102691658
>>102693996
Sorry, I was thinking of continuous batching (not batch inference). See https://www.youtube.com/watch?v=hMs8VNRy5Ys&t=1s
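A toy simulation of the scheduling idea (no model involved; the request lengths and slot count below are made up): the server keeps a fixed number of batch slots, each decode step advances every active request by one token, and a finished request's slot is refilled from the queue immediately instead of waiting for the whole batch to drain.

```python
from collections import deque

def continuous_batching(request_lengths, max_batch=4):
    """Simulate decode steps; returns (steps taken, completion order)."""
    queue = deque(enumerate(request_lengths))   # (request id, tokens left to generate)
    active, done, steps = {}, [], 0
    while queue or active:
        # Refill free slots from the queue -- the "continuous" part.
        while queue and len(active) < max_batch:
            rid, need = queue.popleft()
            active[rid] = need
        # One decode step advances every active request by one token.
        steps += 1
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                done.append(rid)
                del active[rid]                 # slot freed for the next request
    return steps, done

steps, order = continuous_batching([3, 1, 5, 2, 2], max_batch=2)
# 13 total tokens over 2 slots completes in 7 steps -- the ideal lower bound,
# because no slot ever idles waiting for the longest request in its batch.
```

With naive static batching the short request in each batch would hold its slot until the longest one finished; continuous batching is why one server can serve thousands of users without running a model per user.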
>>102693895Me in the back
>>102693178Nah, local is in a dead end, you will NEVER run a 400B model locally even if Bitnet dropped tomorrow.
>>102694272Ram doubles in speed and capacity every 6 years, stop being poor cloudnigger.
>>102693996>>102694009Thanks it's interesting
>>102694272These models are most likely not as efficient as they could be. Extremely large parameter counts can be good for training because "we don't know what the model will use", but once the model has learned something in this vast space, there should eventually be a way to better prune what's not useful for an already learned skill (or more fuzzily, even all of the "leftover noise" that's still left among the parameters at the state it was when training was stopped, because let's face it, it's not tidy).
>>102693960>your own country that crippled science and educationI wonder who in the government was in charge of education for these past 40 years, just eroding away any standards and quality in the school systems? I look at Biden's cabinet members and can't help but see some kind of pattern, like they're all part of the same group or religion or something.
>>102694397That's called model distillation bozo
>>102694408As far as I know, pruning based distillation is still made in a really haphazard way, and knowledge distillation type distillation is something completely different. I mean more intelligent pruning I guess, it might exist, but I'm not aware of it.
>>102694408Also, rude.
>>102694408You can quant distilled models and still see very little loss at Q8.
>>102694508>more intelligent pruningBased on what? Compared to our local mergers, distillation is already quite smart
New quant tech came out, Microsoft got llama 70B 2-bit down to 20GB. It outperforms IQ2-XXS, but I don't think this can be offloaded, so it's kind of redundant.
https://github.com/microsoft/VPTQ
https://arxiv.org/pdf/2409.17066
LLaMA-3 and Mistral 7B benchmarks in the paper.
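For the curious, the "VQ" part is classic vector quantization: weights are grouped into short vectors and each one is replaced by the index of its nearest codebook entry, so you pay log2(k) bits per group of d weights instead of 16 bits per weight. Minimal numpy sketch of just that lookup step (the actual VPTQ algorithm does a lot more, like second-order-aware codebook optimization):

```python
import numpy as np

def vq_quantize(weights, codebook):
    """Replace each weight vector with its nearest codebook index.
    weights: (n, d) array; codebook: (k, d) array. Returns (n,) indices."""
    # squared distances between every weight vector and every centroid
    d2 = ((weights[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def vq_dequantize(indices, codebook):
    # stored model keeps only the indices (log2(k) bits per d weights)
    return codebook[indices]
```

So with a 256-entry codebook over 4-weight vectors you'd store 8 bits per 4 weights, i.e. 2 bits per weight, which is the regime the paper is playing in.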
I will once again refer back to the idea of pruning-aware training. Basically you take advantage of the fact that you might prune a model and train in a specialized way. For maximum architecture compatibility and ease of training, my idea was to prune experts, so we train in a way that lets us drop experts that are only called in certain contexts like coding, math, etc. Alternatively we can use the pruning prioritization data to do calibrated quants, and also to prioritize placement of experts between VRAM and RAM, with the less-used experts (for your use case) in RAM.
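The bookkeeping that idea needs could look like this: log which expert the router picks per domain, rank experts for VRAM placement, and flag domain-exclusive experts as prune candidates. Everything here is hypothetical names for the idea in the post, not any real framework:

```python
from collections import Counter, defaultdict

def expert_placement(routing_log, vram_slots):
    """routing_log: list of (domain, expert_id) router hits.
    Returns (hot, cold): experts to keep in VRAM vs. offload to RAM."""
    counts = Counter(expert for _, expert in routing_log)
    ranked = [e for e, _ in counts.most_common()]
    return ranked[:vram_slots], ranked[vram_slots:]

def domain_only_experts(routing_log, domain):
    """Experts hit exclusively by one domain -- prune candidates if you
    never use that domain (e.g. strip coding experts from an RP rig)."""
    by_expert = defaultdict(set)
    for dom, expert in routing_log:
        by_expert[expert].add(dom)
    return {e for e, doms in by_expert.items() if doms == {domain}}
```

Same counts drive all three uses from the post: pruning (domain-exclusive experts), calibrated quants (rarely-hit experts get fewer bits), and VRAM/RAM placement (hot experts stay on the card).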
>>102690505>What with passing the turing test and all.A modified version.
>>102694706So it can't be done on CPU? That's a shame.
>>102694749No it can, I'm just unsure about whether layers can be offloaded to RAM.
>>102694725I had just heard the name and not paid attention, but I think I get the concept and it makes sense. Thank you, I'll look more into it.
>>102694706>barely better than QuIPlol, what a meme.
>>102694903Wasn't the problem with QuIP that it could only work with models using ReLU?
>>102694915Dunno, I guess not?>The scripts in quantize_llama are written with the Llama architecture in mind. However, QuIP# is adaptable to any architecture with linear layers. To use QuIP# on a new architecture, identify the relevant linear layers and update the scripts in quantize_llama. Feel free to open a GitHub issue if you run into issues.
>>102694903The only difference I could find is throughput, but I couldn't find throughput figures for QuIP. VPTQ github page has throughput figures for LLaMA-2 7-70B.
The fact that quantization exists means whatever people are doing isn't very effective
>>102695077Or that it's effective to put the pieces into place, but not for reading them once they already are
>>102695091(I don't know what I'm talking about btw)
>>102695077That's why Bitnet supposedly works, after all.
>>102695116That's why BitNet supposedly works for undertrained models under 3B*
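For reference, the quantization the BitNet b1.58 paper describes is absmean ternary: scale by the mean absolute weight, round, clip to {-1, 0, +1}. Rough numpy sketch from my reading of the paper, not reference code:

```python
import numpy as np

def bitnet_ternary(w, eps=1e-8):
    """Absmean ternary quantization in the style of BitNet b1.58:
    scale by the mean absolute weight, round, clip to {-1, 0, +1}.
    Sketch from my reading of the paper, not reference code."""
    gamma = np.abs(w).mean() + eps            # per-tensor scale
    wq = np.clip(np.round(w / gamma), -1, 1)  # ternary weights
    return wq, gamma                          # dequant: wq * gamma
```

Note this is applied during training (the model learns around it), which is exactly why you can't just slap it on an existing fp16 model post-hoc like a GGUF quant.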
the fact that JPEG exists means whatever people are doing isn't very effective
.webgguf when?
Images ARE very bloated, you don't NEED so many colors to transmit information
Best 14B RP models or smaller as of 2024-10-05?
>>102695703Imagine if she farts
>>102695703i switch around between these highlighted ones
>>102695784you really download every single model shilled here?
>>102695869no, i'm the one who shills them.
>forget to check the extra card definitions
>use it
>the output is full of slop
>mess with the instruct settings a bit
>then check the console
>"wait a fucking second"
>go look at the defs
>it's full of shitty example dialogue
>remove that shit
>output INSTANTLY improves with no more slop
Holy shit. Fucking card makers.
>>102695869nta, but yeah, why not
>>102695880>>102695784My dude, I can run 9~12b q8 models at like 4t/s without any vram. Unless you are using ddr4, I don't see why you'd use anything under q6
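That ~4 t/s figure is roughly what the memory-bandwidth ceiling predicts: each generated token streams the whole model from RAM once, so t/s is at best bandwidth divided by model size. Quick calc with illustrative numbers (dual-channel DDR5 around 60 GB/s, Q8 around 1 byte per weight):

```python
def tokens_per_sec(params_b, bytes_per_weight, bandwidth_gbs):
    """Memory-bandwidth ceiling for CPU decode: every generated token
    streams the whole model from RAM once. Numbers are illustrative."""
    model_gb = params_b * bytes_per_weight
    return bandwidth_gbs / model_gb

# ~12B at Q8 (~1 byte/weight) on ~60 GB/s dual-channel DDR5: ~5 t/s
# the same RAM at Q4 (~0.5 byte/weight): ~10 t/s
```

This is also why quanting below q6 on fast RAM buys you little: you already have the speed, you're just throwing away quality.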
>>102695917DDR5 is expensive, especially when you need to build a new system
bacc status?
>>102689743Neat if true https://x.com/_xjdr/status/1842631808745345477
https://arxiv.org/abs/2410.01201
>>102696203Do you think he reads most of those papers? He also showed up on my feed and he posts a bunch each day, but I don't know what to think. Is he an influencer?
>>102696240he is "dude the future is here!!!" grifter.
>>102696259He post some interesting things. He posts A LOT though.
>>102696259But yeah, looks like he has a newsletter and podcast, that's more than I'm doing.
Who the fuck are you talking about? More importantly, why?
How does exl2 compare to gguf in quality? I tried it a few months ago and it was dumb compared to a gguf at the same bpw.
>>102696304Someone on Twitter that spends their day posting half highlighted papers. That's where that picture is from.
>>102696304>>102696319
>>102696161Just with 1 c. Maybe in 2 weeks it will be 2 c's.
Why is there not even a bad 7b bitnet to see if it works? How much money/hardware does it require just for 7b?
>>102696449>bitnetIt requires 7 billion H100's. That's why nobody has attempted it yet.
>>102696449There's currently no market to earn money from small models so researchers would rather use their resources to improve current training methodologies. They're not being bottlenecked by VRAM at inference anyway. How is this a real question that left your head?>Why hasn't anyone spent millions of dollars to pander to vramlets?
>>102692783give me this but for any user-specified type(s) of image
>twitter screencaps
>frogs
>lust-provoking images with irrelevant time-wasting questions
>specific game(s) for boards like /v/ (zelda, elden ring, gacha, any fromslop/nintendo game)
>specific people for boards like /pol/ (jewtin, jewlensky, any american politicians)
>tranime
>etc.
the browsing experience would skyrocket
pingas
>>102690004I like using it for fun like this in the last assistant prefix.
[New direction: change your writing style and prose, but keep characters and dialogue consistent. Write as if ONLY the narrator's personality changed, as if it were {{random: the Heavy from TF2, the Spy from TF2, the Pyro from TF2, Steve Jobs, Donald Trump, Kanye West, Vince Offer, John Carmack, a drunk Scottish lass, the one and only Jesus Christ, a based and redpilled 4chan anon, the real Santa Claus, House MD from House, David Attenborough, my mother lololol}}.]
>>102696700>tranimeWhy are you retards even here?
>>102696710no moderation or censorship
>>102696706I'm coping as usual, you see
>>102696748there is both of those so go away and kill yourself
>>102696710>retardsI just want to block jaks, the rest of that has nothing to do with me.
it should be a banworthy offense to post without an anime image attached
to make this reasonable, there should be an optional second image slot for 'obligatory anime pic' which you use if you have a non-anime image for discussion
the quality of users and discourse would go up tenfold overnight and continue to rise for a while as the undesirables start to filter out of our communities
>>102696710anon wants to filter out tranime avatarfags, nothing wrong with it.
Post 102696787:DECISION - BAN
Anon is just triggered by all the high quality AI generated Mikus that we see here.
There aren't many Qwen 32b finetunes, how does the AGI version compare to official?
>>102696803>that pic>>>>>>>>>>>>>high qualityYou might need your eyes checked.
>>102696850Are you saying there's something wrong with the quality of my Mikus?
>>102696791Also you fags immediately proved the "tranime" call right, you got triggered in nanoseconds over this small funny word, not a good look.
>>102696871Put some effort into it at least.
>>102696871Oh, nothing much. Her fingers just fused together. It happens to all of us sometimes.
>>102696881>you got triggered in nanoseconds over this small funny word, not a good look.
NTA but are you sure you should be using this line of argument? The people that use the word tranime unironically are the biggest snowflakes. All you'd have to do is use female pronouns for Jart and you would get like 10 replies.
>>102696710>Why are you retards even here?
Election tourists and zoomers decided this is a safe place to fight their culture war.
So they come to an anime website to screech about seeing anime.
>>102696993>anime websitekys
>>102696949dilate
>>102697015not moot; moot point
>>102697015go back, newfag
>>102697040go back yourself
You guys are really bored, huh?
>>102697069Yes... when will a good local model release and end this?
>Tranime troons getting THIS mad
dead general
>>102696871Your Mikus have always been valid. Were those from the model9, or that other experiment from a while back?
>>102697085As soon as someone leaks a good model
If Grok 3 is AGI, that means AGI would be open sourced when Grok 4 releases. At the pace xAI pushes out models that could mean we get local AGI in as little as a year. This thought gives me hope.
>>102697015extremely high quality bait kek
>>102697157>If Grok 3 is AGIlol
>>102697106model9 and some other fucky model that I never released.
>>102697157Calm down. Grok 2 is out and they still haven't open sourced 1.5. There's no way of knowing if rocketman is going to keep his word about the 6 month timeline.
>>102697157lmao
>>102697157>he still thinks transformers can achieve AGI
>>102697176They can with the right dataset.
>>102697187>i-it just needs more trainingYou don't even know what AGI is.
>>102697193Yes I do.
>>102697187A dataset created by AGI maybe.
>>102697202No you don't. AGI doesn't just mean "more knowledge".
>>102697187>AGI>datasetYou fuckers are so dumb I wish you were pretending.
>>102697241Yes I do, though I agree with your second sentence.
>>102697193This is AGI
https://huggingface.co/AiCloser/Qwen2.5-32B-AGI
>>102697268Then you should know that, by design, transformer text-prediction models are incapable of achieving AGI.
>>102693127>NAI fiascoWhat is this about? The SD finetune leaking way back when?
>>102697272Spoiler: It's not.
>>102696748Try to say nigger 4 times.
Crazy that we're getting local AGI soon. I wonder how governments will react to this development.Can they fight the fact that intelligence is just a simple statistical thing?
Has anyone ever tested the capabilities of multilingual models like Largestral in an educational context?
Been wanting to learn a language but I'm sure apps like Duolingo won't suffice, nor will simply reading books/watching series/listening to music.
Could models like Largestral act as a "personal teacher" of sorts, in the sense that I might be able to ask it questions and have it explain the grammar and such to me, or ask it to proofread short text to see if it's correct and makes sense (and if not, explain why)? Any character cards/presets that would work for that?
Or are current multilingual models just too shit?
There are apps now that run local 0.5~3b models on phones.Now I wonder what kind of apps will pop out in the next year.
>>102697415>nor will simply reading books/watching series/listening to music.
that plus a dictionary is all you need.
>in the sense that I might be able to ask it questions and have it explain the grammar and such to me, or ask it to proofread short text to see if it's correct and makes sense (and if not, explain why)?
Yes.
>Any character cards/presets that would work for that?
"You are an expert [X] language tutor. You will assist user with all [X] questions."
Stop overcomplicating things.
is it crazy I used to be really into all this shit, every week trying a new model, getting hyped for the next
and now I check monthly to see there is no new mistral model and just forget about it until the next time I remember to check again
I used to tryhard on quants and context trying to min-max the hell out of my gpu, even tried sitting through 1t/s replies checking to see if they'd be smarter or better
Now it's just whatever, it'll be wrong either way, the only important thing is whether it's entertainingly wrong or easy-to-edit wrong
I don't regen for accuracy anymore, I just regen to get a reply I have to edit the least
What the hell happened?
>>102697295Tokens are not necessarily just text; modern models are representing more domains with them.But even if they were: converting text into actions is the easiest part by far. Everything in the world with an input does something like this already. The hard part is reliably generating the text (including instructions to devices) that represents useful actions toward any given goal without involving any human intelligence. That's the part transformers can solve - with the right data.
>>102697475You reached a comfort zone plus the stagnation of coom models.
>>102697494We need models with keyboard and mouse output tokens.
>>102697475you got bored, you probably go through that cycle with a lot of shit if you think about it
>>102697475>What the hell happened?It is the effect of the true safety anti-coom measures. It would be piss easy to make the models reject everything sexual but that would lead to people working on actual jailbreaks. However if you make the models suck your dick but do it badly most people will think there are no safeties or the safety was circumvented. And they will quickly get bored with LLM cooming which was the original goal of the safety measures.
>>102697419>models on phoneIt's not viable because it drains the battery too much
>>102697516I remember one of the chinese cog-something models was supposed to be specialized at that, not sure if they used specialized tokens but it'd output coordinates and click commands and claimed to be specialized at navigating point-and-click guis. was like 20b and didn't have any good quants though so idk, probably outdated by now
>>102697565I bet that's why apple didn't launch its phone AI. I'm sure people will look for a solution to that in the next few months.
Has anyone here tested this https://huggingface.co/AiCloser/Qwen2.5-32B-AGI ?
New optimizer? Half the memory usage of AdamW and 30% faster. @CUDA dev
https://x.com/kellerjordan0/status/1842300916864844014
Can a local GPU with 12GB be used for novel generation? I see a lot of chatbots checkpoints, but all the story generations seem to be on SaaS.
>>102697724>124mwho cares, wont scale to billion param models, yawn
>>102697724Would use that if he releases it, a lot of small NLP models would benefit from that
>>102697724don't careif i still need to set an LR and schedule it, I don't want it
>>102697724I don't use Twitter and don't know which buttons I need to press to see the entire Tweet thread.
Can you spoonfeed me a link to where the technical details are explained?
>>102697775NovelAI was using a 13b as its strongest model for its service up until a week or two ago.
12b models are newer and better; you could fit an entire nemo model into your vram at a high quant.
models specialized for chatting can still write stories.
>>102697862You need to make an account.
>>102697724>>102697862Never mind, I found the dude's Github page.
>>102697862You can replace x.com with xcancel.com or nitter.poast.orghttps://xcancel.com/kellerjordan0/status/1842300916864844014
>>102697862https://github.com/KellerJordan/modded-nanogpt
>>102697862BASED
>>102697724sgd uses half the memory and is 30% faster. also usually gives better results and doesn't blow up.
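The memory numbers here fall straight out of optimizer state counts: AdamW keeps two fp32 moments per parameter, SGD with momentum keeps one, plain SGD keeps none. Back-of-envelope (illustrative numbers, optimizer state only, weights and gradients not counted):

```python
def optimizer_state_gb(params_b, states, bytes_per_state=4):
    """Optimizer state only (weights and gradients not included).
    states: 2 for AdamW (m and v), 1 for SGD+momentum, 0 for plain SGD.
    bytes_per_state=4 assumes fp32; use 1 for an 8-bit optimizer."""
    return params_b * states * bytes_per_state

adamw = optimizer_state_gb(7, 2)  # 56 GB of fp32 moments for a 7B model
sgdm  = optimizer_state_gb(7, 1)  # 28 GB, the "half the memory" figure
```

Which is also why 8-bit optimizer states matter so much for finetuning: the same AdamW state drops to 14 GB at 1 byte per state.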
>>102697724>Not convergedWaste of an experiment
Someone should make a Canvas clone.
Unlike the overhyped o1, which is a nothingburger, Canvas is actually useful when it works (about half the time).
>>102697724I bookmarked the page, it seems like a reasonable enough thing to try out once I have the llama.cpp/GGML training code in a state where you can actually think about using it.Though since my ultimate goal is training/finetuning of quantized models rather than FP16 training the question will be how well this optimizer performs at 8 bit precision or less (for AdamW to my knowledge 8 bit works).
>>102697296No. Their 70B fine-tune with billions of tokens
>>102697724What happened to Sophia?
>>102697475You reached enlightenment
>>102693127>There are two problems afflicting the local AI community right now
1. All of you niggers are broke and can't afford to train
2. The people who CAN afford to train only want GPT slop
I plan to fix both... Eventually
>>102698006What is it?
>>102698238how do I invest in you
>>102697862a pure soul
>>102697862Lurk more
>>102698251I don't need money. I'm rich! I got fat stacks and super PACs. Really, I know what needs to be done. I just haven't done it yet because I'm lazy. But I'll do it (soon). Before the end of 2024. Trust.
>>102698286I trust you.
>>102698286I only trust in results, nigger.
>>102698286it's not like I trust you or anything baka
>>102698286I do not 'trust'. Show your work or get the rope.
>>102698472
>>102688915>>102698472where is the 31 days image?
>>102698472Imposter !!!
>>102698493happy now?
>>102698523Better.
>>102698523Worse.
>>102698286if you're lazy tell just chatgpt what to do
I loaded nemo after 2 weeks of giving up on LLM cooming and holy shit it is all so bad. I can actually believe people saying that mythomax is the best because I can't believe the current best thing in that range is this fucking bad. Safety won. Biowhores won.
>>102698799Man, it's like reading a porn script when you're not horny. It's very cringe
>>102698799you loaded the instruct version and you're whining?
>>102698523Hot Petra
>>102698799Same here, I occasionally try to get into LLM storywriting again, generate a few sentences, roll my eyes and remember why I gave up last time.
What do you guys think of Mistral-Nemo-Gutenberg-Doppel-12B-v2-GGUF? Is it decent for local?
>>102698948>>102698948>>102698948
>>102698839Base instruct and some shittune. It is all basically the same.
dead thread it's over for local
>>102699888nice trips