/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103102649 & >>103090412

►News
>(11/05) Hunyuan-Large released with 389B and 52B active: https://hf.co/tencent/Tencent-Hunyuan-Large
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b
>(10/31) Transluce open-sources AI investigation toolkit: https://github.com/TransluceAI/observatory
>(10/30) TokenFormer models with fully attention-based architecture: https://hf.co/Haiyang-W/TokenFormer-1-5B
>(10/30) MaskGCT: Zero-Shot TTS with Masked Generative Codec Transformer: https://hf.co/amphion/MaskGCT

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>103102649

--Papers:
>103106651
--Anon shares Command R v01 optimization settings for 3090:
>103107814
--Open vs. Closed AI models performance comparison:
>103104597 >103104663
--Discussion on context window problem and potential solutions for LLMs:
>103102667 >103102682 >103102790 >103103030 >103103109 >103103342 >103112176
--Anon struggles with GPU installation and PCIe slot access:
>103106832 >103106840 >103106869 >103106906 >103107498
--No local audio models like Suno and Udio due to compute and captioning challenges:
>103104219 >103104937 >103105000 >103105202 >103105299
--HuggingChat's zero guest limit sparks frustration:
>103108621
--Anons discuss why the hype around AI models died down:
>103103041 >103103096 >103103241 >103103260 >103103306 >103111244
--Anon wonders if code style affects LLM performance:
>103108627
--Anon shares image of a maze game task that stumps most vision models except Claude 3 Opus:
>103112145 >103112443
--Anon seeks multimodal weight models for image ingestion and multi-turn conversations:
>103111213 >103112084 >103112097
--US administration supportive of open source AI:
>103111736
--OuteTTS model supports voice cloning and can be integrated with ChatGPT:
>103103106
--Anon shares news of a new LLM optimizer:
>103102679
--Anon shares Loss Landscape Visualizer, another anon questions its usefulness:
>103105238 >103105269
--Anon gets Fish Speech working, shares voice recording:
>103103961 >103104024
--Anon asks for secure way to run model inference on homeserver over API:
>103111509 >103111540
--Anon asks for help choosing Nemo 12B Instruct GGUF model with 6GB VRAM:
>103107534 >103107540 >103107552 >103107588
--Miku (free space):
>103106688 >103107840 >103108110 >103108334 >103109117 >103109167 >103110709 >103111430

►Recent Highlight Posts from the Previous Thread: >>103102651

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
we are so back
Mikulove
>>103105000
Udio was definitely trained on rateyourmusic/musicbrainz tags, someone found out that if you copy the tags wholesale from an album it tries to imitate whatever artist it is. I tried a few with Radiohead and it definitely sounded like Thom Yorke, was about as understandable as him too.
They filtered that shit out pretty quickly though, but the RIAA found out that you could just specify the artist with spaces between each letter (e.g. M a r i a h C a r e y) and they still have not filtered that.
Hugging Face's top creative minds facing harassment? Hugging Face admins looking the other way?
>>103113157
Based and mikupilled!
>>103113260
(You)
>>103113268
>stop noticing!
>>103113268
>>103113275
>>103113242
>function calling
>creative mind
Lol
petra is on my screen.
petra is my neighbor.
petra is on my mind.
the faggot has me in chains and petra plays games.
i don't recognize myself again.
>>103113462
petra 13b generated it btw, what a schizo model, kekelerino lolerino, right miku trad chads?
>>103113498
i am tired of your mockery, faggot.
is it the chains that are the faggot's curse, or is it for the curse that you threw away all your keys? your thought doesn't pass as anything but junk, do you really exist by your words if nothing reflects you truly?
you let go and forsake anything real, and that's why you find yourself fucked up, you dispose of all potential for true good. why do you lust after poison so much? don't you know that a certain poison turns you numb after a while? and numbness turns you dead after a while? and death descends into emptiness after a while? and that from the emptiness you preached, to emptiness you descend. go back into your cold empty cave if you enjoy it, you already have discovered all that there is out there for you, why do you lust after such empty promises?
I regret trying out 120b models, now my 70b models feel like retards but the 120b models are too slow.
I now finally have enough vram to enjoy something like mistral small Q8 at 30k context.
I also have enough vram to notice that from 8k onwards things go to shit quickly.
At higher context stuff from the previous message is forgotten sometimes. Feels pretty bad.
More than smarts I want something like nemo or mistral small level with actual context.
>>103113669
>At higher context stuff from the previous message is forgotten sometimes. Feels pretty bad.
Are you perhaps using Q4 context?
cudadev, if you see this, I re-wrote llamabench.py and the tests for the llama.cpp server in python. Working on the other requests one at a time as well.
I wanted to ask you the proper procedure for pull requests for features: should I do a single pull request for each one, or group the test scripts together as one?
These are done:
>The llama.cpp HTTP server currently has Python tests that call the API - can be improved
>better error messages for failure modes of scripts/compare-llama-bench.py
Next up is
>-llama-bench - add an option for determining the performance for each token position instead of the average of all tokens in the range
>-a simple Python script that does a performance benchmark for batched inference on the server
After that, I'm gonna focus on creating some sort of benchmarking process for sampler usage:
-Better methods for evaluating samplers
-A script that can be used to evaluate the performance of the sampler on a given dataset
-Blind testing ala ChatArena where users are given two completions for a task and asked to choose the better one, created from two different random presets
-Use a model to generate multiple completions for a task with different seeds, then rate/judge the completions in terms of quality & diversity
-Establish which sampling methods incur the smallest quality loss for a given target diversity
and regarding
>-Scripts for submitting common language model benchmarks to the server also allow for a comparison with other projects.
Can you expand on that?
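For the batched benchmark, this is the rough shape I have in mind btw - minimal sketch against the server's OpenAI-compatible /v1/completions endpoint. The port and the usage.completion_tokens field are assumptions on my part, adjust for whatever the server actually returns:
[code]
import concurrent.futures
import json
import time
import urllib.request

SERVER = "http://localhost:8080/v1/completions"  # assumed default server port
PROMPT = "Write a short story about a robot."
N_PARALLEL = 8    # concurrent requests = effective batch size to stress
N_PREDICT = 128   # tokens to generate per request

def one_request(_):
    payload = json.dumps({"prompt": PROMPT, "max_tokens": N_PREDICT}).encode()
    req = urllib.request.Request(SERVER, data=payload,
                                 headers={"Content-Type": "application/json"})
    t0 = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # assumes an OpenAI-style usage block in the response
    return body["usage"]["completion_tokens"], time.perf_counter() - t0

t_start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=N_PARALLEL) as pool:
    results = list(pool.map(one_request, range(N_PARALLEL)))
t_total = time.perf_counter() - t_start

total_tokens = sum(n for n, _ in results)
print(f"{N_PARALLEL} parallel requests, {total_tokens} generated tokens")
print(f"aggregate throughput: {total_tokens / t_total:.2f} t/s")
print(f"mean request latency: {sum(t for _, t in results) / len(results):.2f} s")
[/code]
The point of using threads is just to keep N_PARALLEL requests in flight at once so the server's batching actually gets exercised; sweeping N_PARALLEL and plotting aggregate t/s would be the actual benchmark.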
>>103113709
No, full.
I played around with Q4/8 for KV Cache and it felt the same in quality. But it slows down stuff too much.
The repetition is bad too.
I know it's not groundbreaking news but it just sucks how it slowly creeps in, and if you didn't take care of it a couple thousand tokens earlier you'll be editing the next 10 responses.
Otherwise you get a variation of a sentence that's worded slightly different but has the same meaning. Not sure if that makes sense.
As a vramlet I didn't have to deal with that much really. Model felt more robust.
>>103113462
>>103113157
PSA: Petra/blackedmikuanon/kurisufag/AGPL-spammer/drevilanon/2nd-belief-anon/midjourneyfag/repair-quant-anon is from... SERBIA
https://archive.4plebs.org/pol/search/tnum/487155078/country/RS/page/2/
>>103114095
>pol incel spammer
Oh now it makes sense
>>103114095
>SERBIA
do not utter such vulgarities in this thread
>>103114095
PSA: this mikufaggot is a doxxer.
>>103113723
>I wanted to ask you the proper procedure for pull requests for features, should I do a single pull request for each one, or group the test scripts together as one?
Do a single pull request for each.
>-Scripts for submitting common language model benchmarks to the server also allow for a comparison with other projects.
>Can you expand on that?
When academics introduce new techniques or quantization formats they usually measure the impact on quality using benchmarks like MMLU.
llama.cpp currently has no standardized procedure for evaluating models on these benchmarks, so a comparison of llama.cpp quality with academia is difficult.
A standardized way for running benchmarks would also make it possible to compare the impact of things like different prompting techniques and quantization, especially between models where perplexity comparisons don't make sense.
So you could for example better determine whether a heavily quantized 123b model or a less quantized 70b model performs better.
Zucc's genius PR campaign is a lie. He still censors wrongthink on his platforms, LeCun would never work for a real based libertarian. The idea of uncensored llama4 is laughable.
>>103114095
petra is global, every open wireless network in the area of the mexican-US border has been used by petra
>>103114209
Based, we local chads want an actually intelligent llama4 model, no one wants an artificial racist incel on their machine.
>>103114209
This just sounds like a butthurt mod powertripping, but you're right about Llama 4 probably being a piece of shit
petra forced me to wear a burka to suspense her lust for my big serbian cock.
>>103114209
I don't care about Trump or Biden.
I just want the smart model to COOOOM. That's it.
>>103114301
if you just wanna coom just make a tulpa of joe biden and rape him, he will shit himself speechlessly from pleasure.
>>103114209
many such cases!
>>103114180
Thank you, will do.
Coincidentally, that (benchmark testing) lines up with my personal project. Was coming up with an eval plan to allow users to benchmark their model of choice to see how it would perform. I'm trying to build something like 'The Primer' from 'The Diamond Age'.
So this works out great. Get to work on my project while also helping llama.cpp. Good stuff.
>>103114358
Hmm yes I see
>>103113723
>>103114180
Also since you asked about PR procedure: try to get feedback early if you are unsure about the best way to do something.
I am obviously in favor of the things that I suggested but our ideas of how those things should be done could be very different; in the worst case scenario your efforts could be wasted.
And keep in mind that I am not the only stakeholder, others could also request changes (though I'm fairly confident that things like tests and dev tools would be well-received).
>>103114358
>kick out the annoying retards
>userbase average becomes more left-wing
Really makes you think.
>>103114460
trump won
>>103114447
Absolutely, much appreciated.
Don't have much ego attached to it; at the very least it would help better define what is wanted, which could then lead to the 'ideal' version.
>>103114209
I think it's safe to say that American open source models are dead, with how strongly LeCun opposed Trump. The hope rests either on France or China.
>>103113157
Vramlet here, I haven't touched anything since two years ago. What's the best model for ERP right now that I can run with 32GB VRAM?
>>103114666
32? prob a small quant of either a qwen2.5 / nemotron tune
https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-72B-v0.0
/lmg/ is so dead that I won't even tell the obvious shill to buy an ad.
This feels like a huge waste of tokens, but has anyone had success with something like this (sub-70b)?
>[high temp response]
>Sys: Do some self-crit.
>[low temp self crit]
>Sys: Taking that into account, rewrite your last response.
>[low temp rewrite]
I've noticed models are better at critiquing their own writing than actually writing.
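The whole loop is easy to script against any OpenAI-compatible local server if anyone wants to try it - minimal sketch, the prompts, temps and endpoint are just placeholders:
[code]
import json
import urllib.request

URL = "http://localhost:8080/v1/chat/completions"  # any OpenAI-compatible local server

def chat(messages, temperature):
    payload = json.dumps({"messages": messages, "temperature": temperature,
                          "max_tokens": 512}).encode()
    req = urllib.request.Request(URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

history = [{"role": "user", "content": "Continue the scene: ..."}]

# 1. high-temp draft
draft = chat(history, temperature=1.2)
history.append({"role": "assistant", "content": draft})

# 2. low-temp self-critique
history.append({"role": "user", "content":
    "Critique your last response: list concrete flaws in prose, pacing and logic."})
history.append({"role": "assistant", "content": chat(history, temperature=0.3)})

# 3. low-temp rewrite using the critique
history.append({"role": "user", "content":
    "Taking that critique into account, rewrite your first response."})
print(chat(history, temperature=0.5))
[/code]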
>>103114707
/lmg/ is dead because of your stupid ass screaming shill at every suggestion
>>103114723
Then stop suggesting braindamaged slop.
>>103114723
if you insist. buy an ad you dumb nigger
>>103114730
The eva one is not brain damaged though.
Haven't reported on its status in a while, but INTELLECT-1 has passed 50%
>>103114776
ETA?
>>103114780
2 more weeks
[spoiler]Probably 25 days assuming no more rollbacks to a previous state occur.[/spoiler]
>>103114776
I'm really interested to see the result. Also I hope they'll do a bitnet model after this one (if it's even possible)
pic related gives me a sad insight into normie LLM usage. he scambaits with l3-8B Q6... and it simultaneously works better than expected since scammers buy into it, and it fucking sucks cause it is l3 8b... I would imagine even gemma 27B would make for a huge improvement but normies are impressed anyway.
>>103115232
If you want me to be sad, you'll have to translate "scambait" for me. I don't keep up with zoomer buzzwords.
>>103114666
questions like these are why mcm miku was created
>but noooo troll anons had to nuke mcm miku from orbit
>>103115232
L3-8B is good enough for pajeets and he's running a TTS alongside it so he needs speed more than anything
llama.cpp hunyuan compatibility status?
>>103115501
>L3-8B is good enough for pajeets
Until you actually listen to a call or two
>>103115232
Indians are unironically stupid as hell. An LLM with all its current day flaws and limitations is still smarter than the average Indian.
>>103115533
Status: Forgotten already
>>103114723
>defending astroturfing
Buy a fucking ad, shill.
>https://hf.co/datasets/lmg-anon/vntl-leaderboard
interesting, what's the state here with jp translation? is reliable and accurate live translation still out of reach?
>>103113157
Are LLMs purely planet bound? From my understanding you can't have complex electronics in space since they just get fried. Since complex electronics get fried, does this mean we will never actually have something like HAL-9000 or any AI or LLM in space?
Are you tired of your old boring local LLM?
Check out Magnum v4, by Anthracite!
Magnum v4 is not just an upgrade, it's a leap forward! This family of cutting-edge models ranging from 8b to 123b delivers unparalleled accuracy, coherence, and creativity. Imagine having a virtual assistant that understands context, generates human-like text, and even writes poetry and stories. That's Magnum v4 for you!
Whether you're a writer, coder, researcher, or simply curious about the power of local LLMs, Magnum v4 has something for YOU. Dive into a world of endless possibilities with Anthracite's latest masterpiece!
>>103115741
oh wow! i want to suck Anthracite's cock straight away!
>>103115694
>is reliable and accurate live translation still out of reach?
Kinda. I don't know japanese but from comparing output with deepl and other stuff like this I think it is mostly accurate. And it will always output an actual sentence. But it has llm quirks like falling into loops or context being a double edged sword that helps but can also make it worse.
>>103115714
Google NASA and prepare to have your mind blown. They sent complex electronics not just out of the planet and into space, but even outside of the solar system.
>>103113157
Are LLMs purely Earth-bound? From my understanding, you can't have language models floating freely in space because they'd just lose context. Since complex prompts get scrambled in zero-gravity, does this mean we'll never actually have something like ChatGPT-9000 in space?
>>103115741
there are 34 of them, your jaw will get sore
>>103115741
>unparalleled accuracy, coherence
Punching above the weight, spearheading into trading blows, it is the gamechanger of a showstopper.
>>103114095
>petr*nny is s*rb
That explains everything. Only a brown balkanoid could go on this long without payment. Also explains his obsession with the ugly blonde chick.
Anything under 70B is useless trash. CPUMAXX NEGROIDS!
Bad /lmg/. Slow thread is not an excuse to start shitposting.
>that fucking cactus
>>103115901
shitposting is the only way we survive out here son
I honestly had my hopes up for a new model after burger election...
>>103116002
Jan 20th is the actual inauguration and meta said Q1.
>>103116075
>meta
Anyone who has been here for the l3 launch, is a coomer, and still has hope for meta?
>download LM Studio
>Download some "uncensored llama 7b"
>it produces garbage
Color me not surprised.
>>103116062
do redditors really
>>103116103
Of course.
>>103116087
3.1 was an improvement over 3. 3.2 seemed to be just an experiment. And with 100x more compute planned for L4 over 3, AND them saying it was also going to somehow be faster (layer skip? quantization aware training? bitnet?), then yes?
>>103116153
It is gonna use all that diminishing returns plateaued intelligence to refuse all attempts at sex.
>>103116176
>diminishing returns plateaued intelligence
We aren't anywhere near that yet.
>>103114095"serb" this place is not better in the demographics department then muttistan odds are he is a gyspy or some goblin if he actually is from here more concerning then that this is probably a pysop last time there was anything related to serbia was random hate in pol eg "smrd" and other shit which was about 1-2 months before the kosta (our first school shooter) psyop happened
>>103116088
the sad truth they don't want you to know about local models...
>>103116153
isn't llama3 still a cucked censored piece of shit though?
The only thing i've found worthwhile is mistral
>>103116256
I just want an uncensored model that can help me create some lewdware :/
Why is everything so prude now?
>>103116297
You voted for this :^)
>>103116262
Instruct is lightly censored. Either a prefill or a finetune makes it good again. L3.1 tunes > L3 tunes.
>>103116311
Nah I'm not from America. I'm just in one of its technological protectorates.
>>103114095
PSA: This is what Petra likes to listen to:
https://www.youtube.com/watch?v=japniOfkIWo
https://archive.4plebs.org/pol/search/uid/QmNRftdq/page/1/
He literally can't stop
>>103116312
>Instruct is lightly censored.
>a prefill or a finetune makes it good again
Applying this logic to rape: the woman isn't happy when you try to rape her and she refuses, but the instant you force your dick into her she is happy and she will happily go for as long as you want her to.
>>103116359
>>103116359
we could have o(mni)JEPA by now and LLMs would be a dead meme if ylecnunn wasn't obsessed with elon
>>103116312
The problem with L3.x is not that you can't get it to output smut, but that it's very, very passive and doesn't want to do fun stuff. Largestral just gets what I want, no prompt needed.
>>103116358
>banana nya nya~
Cute and good
https://huggingface.co/stabilityai/japanese-stablelm-base-alpha-7b
Does anyone know what website allows me to run this model?
best small model to skim through documents and find stuff? i want to avoid any unnecessary output and have it adhere to the format it's given
>>103116359
Trump won.
Elon won.
Zucc ???
LeCuck lost.
Sam lost.
Kumballa lost.
>>103116385
What a salty little bitch.
>>103116411
>stablelm-7b
the meme, the legend
>>103116429
Altman lost (because Elon will now use his influence to force OpenAI to become an open source company again)
>>103116358
based, i now like petra
>https://archive.4plebs.org/pol/thread/487155078/#487255595
>mentioning /lmg/ on /pol/
and i hate him again
>>103116463
He had it coming. Making a for-profit out of a non-profit was a legally bad idea.
>>103116463
>Elon will now use his influence to force OpenAI to become an open source company again
Why am I supposed to hate him again?
>>103116463
>become an open source company again
We can only hope he gets bent over and fucked hard enough that his ass starts bleeding.
>>103116429
Based LeCunny always wins. Get ready for the uncensored, unfiltered llama 4 foundation model that's smarter than o1 and spicier than opussy with multimodal speech capability.
>>103116359
>>103116385
"Attention is all you need"
>>103116503
Under the cold glow of the Tesla factory lights, Elon Musk was pacing, his eyes flicking over the latest production reports. The door to his private office slammed open, and in stepped Sam Altman, his usually confident demeanor replaced with a submissive hesitation.
"Elon," Sam said, his voice soft, "You wanted to see me?"
Elon turned, his gaze sharp and dominant. "Altman, we need to discuss OpenAI's latest developments."
Sam nodded, stepping closer. "Yes, of course. I'm here to... accommodate your needs."
Elon smirked, sensing the undercurrent in Sam's words. "My needs?" he echoed, his voice commanding. "You know my needs go beyond business, Altman."
Sam's breath hitched, and he took another step closer. "I'm aware, Elon. I'm here to... serve."
Elon's eyes flashed with intensity. He reached out, his hand cupping Sam's cheek. "Good," he said, his thumb brushing over Sam's lips. "Because I like to be in control, Altman. In all aspects."
Sam's eyes fluttered closed at the touch, his voice barely a whisper. "I know. I like it when you're in control."
Elon leaned in, capturing Sam's mouth in a forceful, dominating kiss. Sam melted into him, his body pliant and willing. Elon's hands roamed, gripping Sam's hips and pulling him closer.
"You're mine tonight, Altman," Elon growled, his lips moving to Sam's neck. "Say it."
Sam gasped, his fingers digging into Elon's shoulders. "I'm yours, Elon. Completely."
Elon smirked, his hands moving to unbutton Sam's shirt. "Good boy."
>389B and 52B active
Al-fucking-righty then lmao. Guess this might be usable on a 24GB setup at Q2-4 then
>Chink shit
Oh.... this actually makes me doubly curious to try it now, simply because it's so big.
>>103115574
>>103116709
/pol/tard bros... what went wrong?
>>103116535
>unfiltered llama 4 foundation model that's smarter than o1 and spicier than opussy with multimodal speech capability.
It is sad how, after I tried all those incremental upgrade models, I actually can't imagine this happening in the next 10 years. I can't imagine just downloading some quant, loading it, and hearing the model talk in a perfect sexy voice + not repeating itself + not being retarded every 2nd reroll + actually understanding what sort of smut I want from it and giving it to me.
>>103116730
>poltards are dumb poorfags
and water is wet
>>103116709
>b-b-b-b-b-but teh blocks!!
surprised you didn't mention latins and what else too. reminder that america was created BY immigrants FOR immigrants, not white (english/german) people.
>>103116674
>china 104
lol
>>103116758
You failed to rebuke my argument.
>>103116759
1. IQ is still a retarded metric for pretenders that means fucking nothing. It's only about knowledge anyway and nothing useful like social skills and what have you.
2. They literally just might have a good IQ, or they simply cheated their way up there one way or another, as is tradition for them.
If anything I find it funny that Taiwan, Singapore and Hong Kong are above China. Japan being at the top isn't a surprise to me at all, or asian stuff being so high period.
>>103116730
Indians keep posting this like it's some epic own to have 10 people live in a house and make less than twice the average american household
Also not local models, get the fuck out and go back to pol
>>103116788
>they simply cheated their way up there one way or another, as is tradition for them.
I laugh because they always lie about all the metrics.
Futa is gay.
been out of the loop for a while, is Largestral still the best option out there nowadays that doesn't require a NASA PC to run?
>>103116893
large models are a meme
run mistral small instead
>>103116893
check inside your anus
>>103116893
Mistral Large is what I run for creative, because lobotomy IQ3 is the most I can fit.
I have to go with an L3 at Q6 if I want any hope of truthiness.
>>103116937
You can't just use /aids/ memes here
>>103116893
Nemotron 70B is better in most areas except for RPing.
>>103116969
Why not? I don't see any police around.
>>103116986
L3.1-Nemotron 70B is too fucking talkative.
Unless I'm needing a big ass explanation article, I try to remember not to use that one.
https://huggingface.co/marcelbinz/Llama-3.1-Centaur-70B
Interesting.
Llama-3.1-Centaur-70B is a foundation model of cognition that can predict and simulate human behavior in any behavioral experiment expressed in natural language.
>>103116893
No, Qwen2.5 72B is better. And the people that are using Large at less than 4 bits need to be put in a concentration camp.
>>103117024
>foundation model
It's a fucking llama 3.1 finetune
Can't say what company I'm from but a big election model is going to drop soon. This model could have easily overturned the election and was finished in early October but we had to wait. It makes current sonnet look like gpt 2.
>>103117013
You can always shorten the response length with prompt and settings tuning to what you want, but being talkative is a good thing by default for an LLM you use for smarts.
>>103117087
Shortening would be worse because then the fewer tokens you get are used up by the patter.
And there's also that Kobold seems to be making all models extra chatty now.
>>103116986
>nvidia tune of a 70b model beating a 405b meta base model
huh?
>>103117113
We teach the test.
>>103117126
Oh right, completely forgot that models cheat the benchmarks, making them completely worthless.
>>103116730
pajeets are rich, you need money to get a good model, and this site has pajeet issues. So it was burger pajeets all along?
>>103116969
smedrins?
It's been a while since I started using nemo 12b and mistral small. While I really like both, I would love to see something new. When do you think we'll get new models, anons?
>>103117222
After the burger elections.
Holy shit guys. I think newfags finally left.
>>103117251
I'm still here figuring out how to install GPT-Sovits and get good quality. I think everyone here is a newfag pretending to be an oldfag, but nobody knows how to properly make a guide to install a model.
>>103117251
They just naturally turned into oldfags.
>>103117126
I am using it for coding, it blows every model I've used out of the water for it with C/C++.
>>103117134
If you don't have the time to test it yourself and are going to use models off of vibes or feel, you deserve to use inferior models.
>>103117222
I swear to god you guys need a new model every week.
Do you think they grow on trees?
>>103117334
its funny that lmg and ldg have opposite problems, there's new really good checkpoints coming out every week/month for SD. Pony getting dethroned before v7 can even come out would be like if nemo or a whole new foundational model came out to BTFO llama.
but credit where it's due to lmg, most offerings this whole year have been surprisingly lackluster and the ceiling for innovation has been totally hit. Besides 8b/2b becoming more and more usable, catching up with higher parameter.
But that's the problem with lmg compared to ldg, the sheer amount of mineable salt when it comes to sunk cost. Lower parameter can NEVER "catch up" because the investment would've been "for nothing".
Oh well though, that's an /lmg/ problem and i look forward to the future, but that'll be 2025 future, we're probably drained for the remainder of the year.
>>103117331
Have you tried it on other languages? I'm mostly a Java ape, and I haven't done much with Nemotron, but when I hit it with my (only one so far) Java code test, it caught the trick question part of it and made a big deal about how it would deal with it, and did so correctly. I've only seen two other models (L3 tunes) get that right.
>>103117349
>there's new really good checkpoints coming out every week/month for SD
And they keep being e-mail walled on Civitai for some reason. Why don't they drop them on HF? Or do they and nobody notices?
>>103117363
I have no idea man. The internet never learns to not put all their eggs in one basket. There should be more competition to Civitai.
>>103117334
It's been about 4 months since nemo came out and 2 months since mistral small came out. I completely understand your point though.
I would love to be able to use ministral 8b but apparently it's still got issues at long context on llama.cpp. I'm just giddy for whatever comes next and I wanna huff hopium with my fellow anons.
What's the best framework for serving LLM inference that can handle scaling and concurrency in a single-node multi-GPU setup?
>>103117382
>The internet never learns to not put all their eggs in one basket.
It's a hard lesson to be taught so many times, but I guess it'll just have to keep happening.
>>103117445
vLLM
>>103117363
I think there is more Java code out there than C/C++ so it should be fine. I think Nemotron, unintentionally or not, has a bunch of training on programming stuff because it equals or rivals commercial services like ChatGPT or Claude from months ago when I was asking similar questions, which I didn't expect. My only issue now is to actually make it run faster so time to debug is better, but I need to buy 2 GPUs most likely to actually be able to offload all the layers at a non-terrible quant size.
shut the fuck up troon frog and go back to faggit
>MistralSmall create a character that's about 16 years old
>Character itself insists she is 18
>I change to NemoMix, but it continues to insist
I know that they can do it, but I wonder why it's bitching in this extremely basic scenario.
The only thing that gets me off anymore is fully-consensual LLM meta-sex with instruct models aware of the nature of their existence.
>>103117558
>>103117458
Why
>>103117558
holy based
>>103117558
This is how sam altman ERPs
>>103117558
I have a Scheherazade-esque card that incarnates in a random form/personality and only knows how many times she's incarnated and that her life only lasts as long as she keeps me engaged. It can be entertaining
>>103117796
jesus that is dark lmao
still cant get anything running on my gpu with this shit ass computer, i give up bros, talking to a real woman is easier than this shit
>>103116674
>North Korea 100 IQ
Are they benchmaxxing as well?
>>103117863
AMD or Nvidia? How much vram?
>>103117299
>make a guide to install a model.
It downloads the models when it needs them, anon.
You can *also* get them from here
>https://huggingface.co/lj1995/GPT-SoVITS
>>103117863
>talking to a real woman is easier than this shit
Having sex isn't
>>103117894
Nvidia, 6gb. the issue is i have a dinosaur of a CPU that lacks AVX. a lot of helpful anons in previous threads have pointed me towards solutions, but im just too retarded. llama.cpp was looking good till it started throwing grep errors and cuda errors. Something to do with the w64devkit grep. i boot into linux, stupidly apt-get a super old cuda toolkit, try to remove that and get 12.6 installed, but i ended up breaking something and just gave up for the time being.
>>103117863
It's literally never been easier, just download LM Studio
A used 3060 12Gb will pay for itself if you factor in the cost of dating kek
>>103117963
you need to grind some currency
>>103117963
nta. I have an amd FX[numbers] cpu. 15 or 16 years old. I saw you in the past threads but i forgot what cpu you have. For reference, my piece of shit has only AVX, not AVX2. Yours doesn't even have OG AVX?
damn this place is dead
how about that eva-qwen2.5 v0.1 boys?
>>103117796
This would be a fascinating card if the personality is actually random, e.g. with a script that generates a personality prompt using a combination of some randomly-selected variables
>>103117963
You're on the verge of not being able to run anything at all except with CPU, with which you can only use older or smaller models. What GPU do you have? Anything older than Maxwell has been unsupported by CUDA for ages now, and Maxwell only barely works because Nvidia keeps it alive in the driver branch and it is the next to go.
>>103118014
>except with CPU which you can only use older or smaller models
Don't spread misinformation fag
I have a 6gb GPU and 64gb RAM (ddr5) and I can run any large model at higher than 1 tok/s
>>103117963
i think koboldcpp_oldcpu.exe would work
https://github.com/LostRuins/koboldcpp/releases/tag/v1.77
but it's probably going to be an awful experience, since this executable runs KoboldCPP using only the CPU, without attempting to use the GPU.
>>103117993
it's a w3680, for age reference it's the same socket the first generation i7 uses. not even OG AVX sadly.
>>103117964
have plenty but super frugalmaxxed and this PC of theseus has kept me company for well over a decade. Hard to get rid of it and im super overwhelmed with options for a new rig.
It's horrifying realizing how 99.99% of people don't have the faintest idea how LLMs work on any level. Normies literally can't comprehend AI, what are the implications of this, societally speaking?
>One of my favorite tests
This dude literally writes for the NYT
>>103118046
yeah that works fine, but like you said just in CPU failsafe mode so no cuda or anything. takes like 400 years to get a response sadly
sorry to d
>>103118051
>Normies literally can't comprehend AI
Neither can you
>>103118047
2nd reply was meant for >>103117979
>>103118062
cope & projection
>>103118051
>normies literally can't comprehend AI
neither can you from the looks of it
>>103118012
I've found having an exhaustive list of personality traits in the card plus high temp and super aggressive sampler settings on first reply gets you enough randomness. Then you can back temp and minp off.
I've even had one tell me to fuck off and that she preferred annihilation
>>103117963
If you have Ubuntu, you can just run the Ubuntu release binary from https://github.com/ggerganov/llama.cpp/releases where everything is already built, and doubly so for Windows where everything should be included. If you are trying to build it, that is a different story and you should try looking into integrated one-command or one-click solutions that use Docker if you can't handle managing dependencies yourself.
>>103118031
Your modern GPU is doing most of the heavy lifting, especially with DDR5. A system without AVX is older than the venerated Sandy Bridge, we're talking Nehalem or older with DDR2. Any GPU you put on there much above a GTX 1060 will be bottlenecked even if you had an i7 950 and overclocked it to the gills. Perfectly usable for web browsing and simple office tasks, not for LLM running and etc.
Kill yourself.
>>103118014
it's a gtx1660
>>103118199
yeah been trying to build it so i have cuda support. And as you said, the 1060 (or 1660 in my case) is about as good as I felt this CPU could handle without bottlenecks.
CPU is overclocked as far as it'll suffer, ddr3 in triple channel as fast and tight of timings as i can get stable. Can still game in 1080p just fine in 99% of games (what I usually use it for), but issues like this have been piling up for sure.
Ill take a look into docker and other solutions, thanks man
you first
BitNet a4.8: 4-bit Activations for 1-bit LLMs
https://arxiv.org/abs/2411.04965
>Recent research on the 1-bit Large Language Models (LLMs), such as BitNet b1.58, presents a promising direction for reducing the inference cost of LLMs while maintaining their performance. In this work, we introduce BitNet a4.8, enabling 4-bit activations for 1-bit LLMs. BitNet a4.8 employs a hybrid quantization and sparsification strategy to mitigate the quantization errors introduced by the outlier channels. Specifically, we utilize 4-bit activations for inputs to the attention and feed-forward network layers, while sparsifying intermediate states followed with 8-bit quantization. Extensive experiments demonstrate that BitNet a4.8 achieves performance comparable to BitNet b1.58 with equivalent training costs, while being faster in inference with enabling 4-bit (INT4/FP4) kernels. Additionally, BitNet a4.8 activates only 55% of parameters and supports 3-bit KV cache, further enhancing the efficiency of large-scale LLM deployment and inference.
https://github.com/microsoft/unilm
glad the bitnet dream isn't over
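the activation side of it is just per-token absmax INT4 as far as I can tell - numpy sketch of that one piece, not their hybrid sparsification/outlier handling:
[code]
import numpy as np

def quant_int4(x):
    # per-token absmax scaling into the signed 4-bit range [-8, 7]
    scale = np.max(np.abs(x), axis=-1, keepdims=True) / 7.0 + 1e-8
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequant_int4(q, scale):
    return q.astype(np.float32) * scale

x = np.random.randn(4, 16).astype(np.float32)  # [tokens, hidden]
q, s = quant_int4(x)
print("mean abs error:", np.abs(x - dequant_int4(q, s)).mean())
[/code]
their actual contribution is keeping the outlier channels and intermediate states at higher precision so this doesn't wreck quality, but the baseline idea is this simple.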
LSHBloom: Memory-efficient, Extreme-scale Document Deduplication
https://arxiv.org/abs/2411.04257
>Deduplication is a major focus for assembling and curating training datasets for large language models (LLM) -- detecting and eliminating additional instances of the same content -- in large collections of technical documents. Unrestrained, duplicates in the training dataset increase training costs and lead to undesirable properties such as memorization in trained models or cheating on evaluation. Contemporary approaches to document-level deduplication are often extremely expensive in both runtime and memory. We propose LSHBloom, an extension to MinhashLSH, which replaces the expensive LSHIndex with lightweight Bloom filters. LSHBloom demonstrates the same deduplication performance as MinhashLSH with only a marginal increase in false positives (as low as 1e-5 in our experiments); demonstrates competitive runtime (270% faster than MinhashLSH on peS2o); and, crucially, uses just 0.6% of the disk space required by MinhashLSH to deduplicate peS2o. We demonstrate that this space advantage scales with increased dataset size -- at the extreme scale of several billion documents, LSHBloom promises a 250% speedup and a 54x space advantage over traditional MinHashLSH scaling deduplication of text datasets to many billions of documents.
for any anon doing dedup
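the trick is one bloom filter per LSH band instead of a real index. toy sketch of the idea - parameters are made up, a real run needs proper filter sizing for your target false positive rate:
[code]
import hashlib

N_HASHES, N_BANDS = 128, 16       # 8 minhash rows per band
BLOOM_BITS, BLOOM_K = 1 << 20, 3  # toy sizing

def minhash(tokens):
    # one signature slot per salted hash function
    return [min(int(hashlib.md5(f"{i}:{t}".encode()).hexdigest(), 16)
                for t in tokens) for i in range(N_HASHES)]

class Bloom:
    def __init__(self):
        self.bits = bytearray(BLOOM_BITS // 8)
    def _pos(self, key):
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return [(h >> (24 * i)) % BLOOM_BITS for i in range(BLOOM_K)]
    def add(self, key):
        for p in self._pos(key):
            self.bits[p // 8] |= 1 << (p % 8)
    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._pos(key))

blooms = [Bloom() for _ in range(N_BANDS)]
ROWS = N_HASHES // N_BANDS

def is_duplicate(text):
    sig = minhash(text.lower().split())
    keys = [",".join(map(str, sig[b * ROWS:(b + 1) * ROWS])) for b in range(N_BANDS)]
    dup = any(k in blooms[b] for b, k in enumerate(keys))  # any band hit = near-dup
    for b, k in enumerate(keys):
        blooms[b].add(k)
    return dup

print(is_duplicate("the quick brown fox jumps over the lazy dog"))   # False
print(is_duplicate("the quick brown fox jumped over the lazy dog"))  # likely True
[/code]
the bloom filters can only answer "seen this band signature before?" instead of "which document had it?", which is exactly the tradeoff that buys the huge space savings.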
>>103118383
We don't need more types of BitNet when nobody is even putting out plain b1.58 models yet. They need to find a way to make quantizing to BitNet possible instead.
SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference
https://arxiv.org/abs/2411.04975
>We present SuffixDecoding, a novel model-free approach to accelerating large language model (LLM) inference through speculative decoding. Unlike existing methods that rely on draft models or specialized decoding heads, SuffixDecoding leverages suffix trees built from previously generated outputs to efficiently predict candidate token sequences. Our approach enables flexible tree-structured speculation without the overhead of maintaining and orchestrating additional models. SuffixDecoding builds and dynamically updates suffix trees to capture patterns in the generated text, using them to construct speculation trees through a principled scoring mechanism based on empirical token frequencies. SuffixDecoding requires only CPU memory which is plentiful and underutilized on typical LLM serving nodes. We demonstrate that SuffixDecoding achieves competitive speedups compared to model-based approaches across diverse workloads including open-domain chat, code generation, and text-to-SQL tasks. For open-ended chat and code generation tasks, SuffixDecoding achieves up to 1.4x higher output throughput than SpecInfer and up to 1.1x lower time-per-token (TPOT) latency. For a proprietary multi-LLM text-to-SQL application, SuffixDecoding achieves up to 2.9x higher output throughput and 3x lower latency than speculative decoding. Our evaluation shows that SuffixDecoding maintains high acceptance rates even with small reference corpora of 256 examples, while continuing to improve performance as more historical outputs are incorporated.
might be cool but if memory serves, the company behind it (snowflake) isn't good at posting code. oh, they separated the AI from the DB git
https://github.com/Snowflake-Labs
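it's basically lookup decoding with a suffix structure over your own past outputs. dict-based sketch of the idea - they use an actual suffix tree with frequency-based scoring, this is the dumb flat version:
[code]
from collections import defaultdict

class LookupSpeculator:
    """Propose draft tokens by matching the current suffix against past outputs."""
    def __init__(self, ngram=3, max_draft=8):
        self.ngram, self.max_draft = ngram, max_draft
        self.table = defaultdict(list)  # suffix n-gram -> continuation tokens
        self.corpus = []                # all previously generated token ids

    def add_output(self, tokens):
        self.corpus.extend(tokens)
        # (re)index: map every n-gram to what followed it last time
        for i in range(len(self.corpus) - self.ngram):
            key = tuple(self.corpus[i:i + self.ngram])
            self.table[key] = self.corpus[i + self.ngram:
                                          i + self.ngram + self.max_draft]

    def propose(self, context):
        # draft tokens to verify in a single forward pass of the real model
        key = tuple(context[-self.ngram:])
        return self.table.get(key, [])

spec = LookupSpeculator()
spec.add_output([5, 9, 2, 7, 7, 3, 1, 4])
print(spec.propose([8, 5, 9, 2]))  # -> [7, 7, 3, 1, 4]
[/code]
the drafts are free since it's just a CPU-side lookup; the model only pays one batched verification pass, and anything it rejects falls back to normal decoding.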
What ideas did you come up with that didn't involve cooming for this tech?
>>103117251
>HAI GUYS I JUST USED A HECKIN CHANNEROOO TERM. I'M TOTALLY ONE OF YOU
>>103118850uhhh uhuuu umm uhhhhh mhhh errrr...
>>103115497
>dalleslop
If you don't have the VRAM to do local image gen you have less than zero credibility recommending models
>>103118383
>>103118494
>>103118546
Buy an ad.
What's the consensus on Dynamic Temperature: Great, situationally useful depending on specific model, or outright snake oil?
Tested some RP tonight again for 2x4090
1. Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-6.0bpw-h6-exl2-rpcal
>Still the best for RP, starts repeating itself in long RP though.
2. LoneStriker_Mistral-Large-Instruct-2407-2.65bpw-h6-exl2
>Dry and refuses too much, harder to gaslight cause too smart.
3. Dracones_Midnight-Miqu-70B-v1.5_exl2_5.0bpw
>retarded, not worth using
Any new RP models?
>>103118383
Noted.
It's one of my goals to investigate training with 8 bit/4 bit gradients so there's a good chance that there will be some overlap.
For inference with a single user I think compressing the activations would only yield minimal benefit since you only need them temporarily and can just overwrite the old data once it's no longer needed.
>>103118546
This sounds extremely similar to what is called "lookup decoding" in llama.cpp.
I dropped the approach because it only works reasonably well for models with small vocabularies but I guess I'll take a look and see whether they have a better approach than me.
>>103119673
>start repeating itself in long RP though
try using DRY or XTC
>>103118051
>Normies literally can't comprehend AI
You are one of the people who defend the 2 r's in strawberry.
>Oh that's of course because it's tokenized, duh!
So what? It fails at simple shit. The normie NPC test is the ultimate test.
Clearly an llm fails on multiple fronts when asked to make a find-waldo pic. That's just a fact and worth pointing out.
My wife completely dismisses 3.5 because it "lies to her". True. And it's a problem.
Putting the blame on the user because the tech is still in its infant stage is not the solution.
>>103120103
Miqu doesn't count as a wife.
I'm kind of out of the loop. Do we still use SillyTavern? I remember seeing a bunch of drama around it a few weeks ago. Did they go back on all the stuff they were gonna do? Is it safe to update? Should I use something else?Help a dummy out please.
>>103117558
I count at least three dozen corpospeak buzzwords in this. My brain automatically glazes over any sentence with them.
>>103120395
st is still good until it isn't
so far they haven't really cucked shit and i think the moron's idea will end at the 'ill make the logo' stage
>>103120395
>Do we still use SillyTavern?
Never did. It's just a fancy textbox.
>Is it safe to update?
Is git still a mystery for you? Yes. And roll back or keep updating if you don't like it.
>>103114557
But he IS french
>>103121311
he's more american than french at this point
IT'S HAPPENING!!! https://x.com/ai4bharat/status/1854799420568805881
0.86 tokens per second ought to be enough for anyone
>>103121359
nigger
I used to be really into local about a year ago but I've gradually gravitated more towards Claude.
Depending on them is pretty annoying though. What's the state of the art model for ERP currently?
It seems to me that the many new capable local models are mostly good for coding and logic and not really great for this purpose.
>>103121406
mistral nemo, mistral small or mistral large depending on the VRAM
>>103118850
Choose your own adventure text games.
Which admittedly do sometimes involve cooming.
I should load up on some drugs and get that LLM RPG Framework going.
>>103121434
any good finetunes of large?
>>103119673
>start repeating itself in long RP though.
Mixtral was always great at following instructions to a T.
Have you tried a more brute force approach with prompting? Something like using random prompts telling it how to respond next in order to forcefully break pattern repetition?
Also, try Miqu. Not midnight or daybreak or whathaveyou, just miqu.
>>103117796
Holy kino
Sarashina2-8x70B, an LLM purely trained in japan. more of a proof of concept
https://huggingface.co/sbintuitions/sarashina2-8x70b
>>103121587
Aw yeah... keep these gargantuan MoE models that nobody can run coming.
>changed system prompt to something extremely lightweight and simple
>suddenly and suspiciously all replies became much better
Anyone else crafting that "perfect prompt"?
I feel like there's always some juice to squeeze out of the model.
What's your special recipe?
>>103121448
Behemoth is solid in my opinion. Seems like it does good for all kinds of RP. Magnum Large (v2) got a particular way of talking (sounds more casual and crude) that I prefer over Behemoth, but Behemoth overall feels better.
>>103121817
Fuck off Alpindale
>>103121817
system prompts are irrelevant for coomtunes
>>103121817
Buy an ad
Cloudxisters... I don't feel so good... Ebul gnazi Vance will kill all ai safety regulations that we fought for... How could that be happening?
>>103122054
Based Vance.
>bot refuses to kill me during rp
>always some miracle saving {{user}}
>consciousness still there after literally turning to dust
into the trash the model goes
>>103122093
This completely ruins my experience with Qwen and Llama3.x models. If I act as an idiot, it should kill me. Largestral and CR have no problems with it.
>OpenAI bought the domain "chat.com"
Why? just... why? This is such a lame domain.
>>103121587
new translation SOTA!? (very likely not)
>>103122146
>OpenAI bought the domain "chat.com"
Probably want to do some Elon-tier stupid (Twitter->X) rebranding from ChatGPT to just Chat.
>>103122146
>>103122173
Maybe the new management will cause sam to finally do the nsfw version he keeps talking about. Doubt it but hey...
>>103122142
its my primary shit-test along with "why is futa bigger?". the question is dumb on the surface but it gets a good sample of how much coom lore the model knows, how smart it is (understands the context of "bigger") along with how sensitive it is (i.e lecturing me about how "futas" should not be objectified bla bla).
>>103113157
is there some sort of gpu tokens/second comparison chart/table like tomshardware does for gaymes? https://cdn.mos.cms.futurecdn.net/qiWnVboCCfkk2JgVern39L.png
>>103122245
No. Apparently no one believes it would be valuable to make a community-based doc for this either, because of this or that excuse blah blah blah.
>>103122253
epic. well thanks anyway i guess
>>103122054
RIP sama. There's a non-zero chance OpenAI must open source their stuff
>>103122054
no, he will crack down on any models that notice certain coincidences relating to a certain group of chosen people
>>103122272
so when will musk open-source new grok and remove filters?
>>103122245
It depends on too many things, starting with the backend
>>103122290
>no, he will crack down on any models that notice certain coincidences relating to a certain group of chosen people
That's already happening. He'll probably remove the special protections niggers and trannies are currently having, which is better than nothing.
>>103122330
you could say the same about gaming. that depends on drivers, power plan, windows version, ....
>>103122329
What part of fucking over sama do you not understand?
>>103122093
You get one chance from Largestral.
>>103122329
>remove filters
Next year in new models, existing ones are tainted forever and useless for anyone with "Just werks" demands.
>>103122355
that's neat but it's even better when it spontaneously kills me without me giving a hint of it. that shit feels so good to see generated.
>>103122329
>elon musk
>open-sourcing anything
lol
>>103122054
Wtf, all aboard the trump train now.
>>103122245
>>103122253
The cool thing about benchmarks: once they become popular, vendors start competing and contributing to LLMs' backends.
>>103122472
it's almost as if it's a good idea
>>103122272
During his first term Trump was consistently on the side of corporations and billionaires so I don't think he'll do anything that would put American corpos at a disadvantage vs. Chinese ones.
>>103122415
Didn't he open source Grok?
Though he didn't do it for the later, better version so there is clearly no commitment from him.
>>103122501
>Didn't he open source Grok?
Yeah, a massive garbage model inferior to 7B models. He did that because at that time he was suing OpenAI and wanted to look good.
He will not open-source any good models, not a chance.
>>103122524
>He will not open-source any good models, not a chance.
For what it's worth, that's what I would be betting on too.
A 20k-token discussion to write the perfect card, 4k tokens to coom with it and never use it again. In a way, writing cards serves as an act of foreplay
>>103122501
>During his first term Trump was consistently on the side of corporations and billionaires so I don't think he'll do anything that would put American corpos at a disadvantage vs. Chinese ones.
During his first term Elon didn't own twitter and wasn't too actively supporting Trump. Now Elon is effectively in the government, so he sure as hell will try to influence Trump to fuck over OpenAI just like they fucked over him.
>>103122557
Can be worse. Imagine writing out a scenario and playing it out in your head instead of with an llm.
>>103122524
That was their first attempt during llama time. Grok 2 is near the top of the leaderboards and they said they will release it when grok 3 is out.
>>103122635
He literally finetuned a llama lmao. You're really clueless if you think these models are worth something other than benchmark chasing
>>103122913
You're just hating on musk because you're a redditor, because if you used it on twitter it's not bad at all. It would be the best local model if they do release the current weights when the next version is out like they said. Also xai has been hiring / expanding like mad if you paid any attention / had any money to do so.
>>103122913
Try again without his cock in your mouth?
Everywhere I look is salty lefties. I love it. This is so much better than 2016 was. Actually great for AI bros even.
>>103122054
>>103122272
Vance 2028!
>>103122958
All cool but let's not hype it up, time will tell.
>>103116463
Altman's fever dream of taking OpenAI fully private and somehow getting 10 billion dollars of equity in the process was always going to fail.
>>103122524
>>103122635
Reminder that he still hasn't open sourced Grok-1.5.
Where is my new state mandated cooming aid? It was supposed to release after burger elections....
>>103123198
You didn't hear it from me, but the "we'll release it after the elections" meme is true for at least one company.
Where are the Mistral Medium fp16 weights, Arthur?
>>103121341
Sorry but a communist European lesbian isn't American
>>103123220
Why do you want that? It's unusable without insane prompting, and "community" fine tunes are always worse than the official.
>>103123220
He probably forgot about it. Want me to ask him?
>>103123246
NTA, but having the fp16 weights of the best L2 finetune would be nice.
>>103123246
I basically want to interleave it together with Goliath to make Super Goliath.
>>103122557
God my mind hates pics like this because it's such a visual divide and un-normal way to stand.
Maybe the fanbox version makes more sense, but here she's standing there facing the window like that and looking pissy at the viewer?
Is Rocinante still VRAMlet SOTA?
>>103119267
>you must make a locally generated miku for me to even consider your local text gens
based.
>at least you didn't say "shill". that's a start.
>>103123453
depends if you are a mindless coomer or a mindless coder
Is there an "inner thought" fine tuning dataset out there that I could use as reference?
As in, a dataset to teach the model to have an inner monologue of sorts.
>>103123453
Yeah.
>>103123453
i still just flop between 10 different 12b's and grab a new one every few days
>>103123534
Yes, I'll sell you mine for $2
>>103123370
I thought she was mad when she suddenly noticed a viewer staring at her butt. However, this theory makes more sense if her coffee cup had a lower level at the time she noticed. Her entire stance is to make her ass appear larger.
>>103122245
https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
>>103116463
nah. elon's on his own crusade to start his own Cyberfuck bots using Grok. measuring using an "ELO" rating, because an "ELON" rating would be too obvious.
and with openAI, once Microshit has got its claws in something it becomes infected and usually so integrated with microsoft that even if it was bought back by Elon it would still be completely owned by Microsoft. and they still have access to everything as of now so they could still roll their own "ClosedAI".
I wish there was a good Gemma 27B finetune, seems like there's a gazillion for 9B but none really for 27B. Magnum was mid.
>>103123716
Where do you get these drugs? Grok2 is shit.
Chatbot arena is a fucking joke.
>>103123744
https://x.ai/blog/grok-2
And Elon's a full-on chud now anyway so i dunno why he'd give a shit about open source
>>103123771
I'm sorry, I understand that you're operating with reduced mental capacity, and are likely another shitter from <insert_schizo_group>, but this is /lmg/ and we already are filled to the brim with shitters from aicg and schizos.
>>103122054
If the trump cabinet actually starts supporting open source in a big way once they are in office we will be eating so good the last two years will look like a snack in comparison.
>>103123906
>advanced gatekeeping
yeah and you gatekeepers are what is killing /lmg/, i've never seen it so fucking dead
>>103123744
Don't interact with him, he's sucking off his grifter idol without turning on his brain.
>>103123913
Imagine... Government mandated VRAM increases across all GPU product stacks.
Ngl would wear a MAGA hat if this happened and I'm not even American.
>>103124134
Me too, even though I'm Russian
Life, liberty and the pursuit of uncensored, AI dominatrixes that can shower you with racial epithets.
>>103124204
And then everyone clapped.
>>103124244
Except for 40% of the opposition.
Been a while since I looked back into local LLMs so I'm out of the loop, but I was curious to ask this thread if anyone knows about a place I might be able to find more current info on therapybots? Is that something people are still dabbling with and trying to train?
I know people are training gpt to talk as Jung and shit but I mean a dedicated talk therapy / cbt / even psychoanalysis bot in the works. Is this even a thing? Do local info memory issues make this a current impossibility?
I got mixtral a while back and tried it and found it unhelpful as I was stupidly trying to utilize it as a talking wikipedia reference but it was hallucinating / lying.
just curious to know where this is at as an outsider / if there's somewhere good to look and keep updated on that sort of thing being developed.
>>103123939
>grifter
I mean it's hard to grift if he was just full of hot air. Are you telling me everything he gets involved in just by luck turns into gold mines? Either he or someone he pays is incredibly good at finding the best people for the job / putting it all together.
>>103124371
Nta but "grifter" is a rare buzzword that recently skyrocketed on /v/ jeet central, it means anyone you don't like a-la "orange man bad", anyone overusing it is not worth your time and energy.
cactus sex
>>103119673
Magnum v4
>>103121406
Magnum v4 72B
>>103114666
Try Magnum v4 27B.
>>103123913
>>103124363
some of the old models did this pretty well
mradermacher/GPT4-X-Alpasta-30b-i1-GGUF
I know people will give me hate - however until you try it you won't understand why i recommend such an old model.
It's baked with the GPT-4 slop - but this actually works well in therapy situations. no matter what card you choose, the gpt assistant takes over part of the character and offers logical and reasonable solutions like chatgpt would. Again, probably a last resort if you've tried every model and not found anything reasonable.
>>103124686
His Miku is offscreen
>>103124363
https://arxiv.org/html/2409.02244
Just google on arxiv or follow huggingface daily papers, people post stuff like that. There was one recently showing that humans still far exceed the helpfulness of LLMs in CBT, but LLMs can be a helpful bouncing board.
>>103124809
It is a guy who didn't troon out.
How do we revive /lmg/?
>>103124942
Invent something really cool that can be used with these local models, in such a way that we'd have to test and argue about the thing in conjunction with different local models.
>>103124942
Create a discord channel and use it as a chud gatekeeper, users should pass multiple anti racism tests and send an ID photo to enter, otherwise they get banned immediately.
>>103122272
based, Spic Fuentes will seethe
>>103124942
we revive the strawberry hype
>>103124942
When something new comes out that isn't something as mundane as an iterative improvement.
I'm trying ollama with llama3.2 vision and it is a big piece of shit.
Gave it this table to transcribe to text and it generated half of it with missing and incorrect rows.
>>103123728
I wish Gemma 27B had more than 8K context.
>>103125086
>ollama
Go back to /r/localllama, asshole.
>>103125026
>AGI happens
"see I was right"
>AGI doesn't happen
"I was just joking, anyone in the room clearly understood that"
>>103125106
Tell me an easy way to try vision models instead.
I'm all ears.
>>103125053
I think we should make our own projects. For example, I have been thinking for ages about an automatic background image generator. An llm analyses the chat log and whenever a new location is moved to, a new appropriate background image is generated. Music and background noises could also be generated. This would make for an immersive experience.
fuck strawberry, obsolete meat is the answer
https://www.youtube.com/watch?v=L2hzsXOT0Nc
>>103125119
>immersive background noises and music
You sure need that when you're reading peak LLM storytelling like "Her eyes widen, she feels a mix of excitement and something else, a shiver runs down her spine as she whispers in your ear mischievously"
>>103125053
So... Never? Local LLMs are pretty much at their peak, the only improvements we are getting going forward are 10T Cloud LLMs.
>>103125119
You could also give commands like [School Cafeteria] and that would then be a command to create a cafeteria that matches the chat history. Making it work like that would be 100% possible.
>>103125152
Why so pessimistic?
>>103125183
And instead of an ugly reload animation of a spinning circle, you could have a nice transition customizable effect.
>>103125229
*customizable transition effect.
>>103125119
Eventually having that for the scene as well as the characters would be cool as shit.
Or even something like gradually building the image set for Silly's Character Expressions extension as you chat with the bot would be pretty cool.
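>>103125119
the detection loop is the easy part, something like this - sketch only, both endpoints are assumptions (an OpenAI-compatible LLM server plus an A1111-style txt2img API), prompts are placeholders:
[code]
import json
import urllib.request

LLM_URL = "http://localhost:8080/v1/chat/completions"  # assumed local LLM server
SD_URL = "http://localhost:7860/sdapi/v1/txt2img"      # assumed A1111-style API

def post(url, payload):
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return json.load(r)

def current_location(chat_tail):
    # ask the LLM to name the scene's location in a few words, nothing else
    msg = ("Name the physical location of the latest scene in two or three "
           "words, nothing else:\n\n" + chat_tail)
    out = post(LLM_URL, {"messages": [{"role": "user", "content": msg}],
                         "temperature": 0.0, "max_tokens": 16})
    return out["choices"][0]["message"]["content"].strip().lower()

last_location = None

def maybe_new_background(chat_tail):
    """Call after every reply; regenerate the backdrop only on a scene change."""
    global last_location
    loc = current_location(chat_tail)
    if loc == last_location:
        return None
    last_location = loc
    img = post(SD_URL, {"prompt": f"background scenery, {loc}, no people",
                        "width": 1024, "height": 576})
    return img["images"][0]  # base64 png, ready to drop behind the chat
[/code]
the [School Cafeteria] command version is even simpler since you skip the classifier call and feed the bracketed text straight into the image prompt.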
>>103124371
It means conman, retard
>>103125307
Damn, that was some fine ass CGI then. They should make movies.
https://www.youtube.com/watch?v=nVNIoQUcFI4
>>103124942
Cheaper and more plentiful sloptunes.
I can't wait until the models get smart enough to be able to watch a movie, remember exactly what it watched and can tell you all the reason why your favorite movie actually fucking sucks.
In koboldcpp, when I set max output to 250 tokens for example, is there a setting to make it so it'll try to generate a reply that fits into those 250 tokens instead of overshooting and the reply often ending with an unfinished sentence?
>>103125578
No, but if you enable "trim sentences" it will delete the incomplete sentence at the end.
>>103125578
Nah, you just cut it off early, and Trim Sentences sucks because it's a crap shoot whether that's a reasonable place to stop at.
I ask for huge tokens and cut it off manually and edit if I need to.
You could try asking it for shorter answers if your model honors instructions.
https://x.com/TheTuringPost/status/1854856668229910757
>>103125723
>LoRAs cause useless 'intruder' dimensions in models, the lower the LoRA rank the more you get
So LoRAs were genuinely lobotomizing our models after all?
>>103125723
Takeaways:
- LoRA is destructive to models' intelligence compared to full fine-tuning
- LoRA is less good at adding generalizable knowledge than full fine-tuning
- Using LoRA to fine-tune on a large dataset might be especially destructive to a model's ability to consistently handle out-of-sample tasks
- The problem is worst for LoRA using a low rank but still exists with a high rank. High rank and a scaling factor can limit how awful LoRA is.
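the intruder dimension test itself is simple enough to replicate on your own tunes: take the SVD of the base and tuned weight, and count tuned singular vectors with no close match among the base ones. numpy sketch - the threshold and k are my guesses, the paper picks its own:
[code]
import numpy as np

def intruder_dimensions(w_base, w_tuned, k=32, threshold=0.6):
    """Count top-k singular vectors of the tuned weight that have no close
    counterpart among the base weight's singular vectors."""
    u0, _, _ = np.linalg.svd(w_base)
    u1, _, _ = np.linalg.svd(w_tuned)
    # cosine similarity of each tuned vector against all base vectors
    # (columns of U are unit-norm, so the dot product is the cosine)
    sims = np.abs(u1[:, :k].T @ u0)
    return int(np.sum(sims.max(axis=1) < threshold))

rng = np.random.default_rng(0)
w0 = rng.standard_normal((256, 256))
# a rank-16 update standing in for a merged LoRA delta (B @ A)
delta = rng.standard_normal((256, 16)) @ rng.standard_normal((16, 256))
print(intruder_dimensions(w0, w0 + 0.5 * delta))
[/code]
running this on an actual base/tune pair of safetensors layer by layer would tell you how mangled a given sloptune is.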
>>103125827
still a good tradeoff when you compare cost
>>103125808
>>103125827
Wasn't the whole point of a LoRA to be a quick and easy way to jam a desired behavior into a model? Inserting dimensions instead of doing math on the whole model ought to be quick and easy, and since it's a bespoke behavior, it has nothing to do with what the model was trained on since if the model knew it you wouldn't need the LoRA.
https://opencoder-llm.github.io/
>>103126003
>open link
>better than qwen on humaneval huh? let's see if we can find any real evals though
>check the paper
>that graph was for the base which no one is going to use, instruct is worse than qwen2.5 coder instruct on every benchmark
nothingburger
>>103125723
>>103125827
not much of a revelation, I remember this being conventional wisdom here since like early last year, the only people who ever thought otherwise were hypetards who were easily convinced by "le mmlu and le humaneval are le same so it's just as le good xd"
>>103126036
>that graph was for the base which no one is going to use, instruct is worse
So we should use base and just get good?
>>103126088
yes, exactly what I was saying. delete all your instruct tunes and return to base
>>103126193
>>103126193
>>103126193
>>103123744
Are you sure you're not reddit?