/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102947669 & >>102937407

►News
>(10/24) Aya Expanse released with 23 supported languages: https://hf.co/CohereForAI/aya-expanse-32b
>(10/22) genmoai-smol allows video inference on 24 GB RAM: https://github.com/victorchall/genmoai-smol
>(10/22) Mochi-1: 10B Asymmetric Diffusion Transformer text-to-video model: https://hf.co/genmo/mochi-1-preview
>(10/22) Pangea: Open-source multilingual multimodal LLM supporting 39 languages: https://neulab.github.io/Pangea
>(10/21) IBM releases Granite 3.0: https://hf.co/collections/ibm-granite/granite-30-models-66fdb59bbb54785c3512114f

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>102947669

--Paper: Stick-breaking Attention:
>102952719 >102952802 >102952871 >102952956 >102953027 >102952833 >102952907
--Papers:
>102951109 >102952611 >102954717
--Optimizing NVIDIA GPU power consumption and performance for llama.cpp:
>102948247 >102948378 >102950135 >102950572 >102951311 >102948311
--Suggestions for preventing bot personality evaporation and repetition:
>102947889 >102947907 >102947914 >102947942 >102947989 >102948028 >102947977
--Study suggests personas in LLM prompts may not be helpful:
>102951406 >102952205 >102952589
--Nala test discussion and analysis:
>102955217 >102955309 >102955503 >102956298 >102957333
--Mixed opinions on creative writing strategy for LLMs:
>102950234 >102950505 >102950671 >102950702 >102950837 >102950961 >102951040 >102951108 >102951105 >102951654
--Meta introduces quantized Llama models:
>102959057
--Experimenting with small model to rewrite sentences from main model:
>102954203 >102954256
--Aya Expanse safety concerns and unhinged models discussion:
>102954688 >102954728 >102955637 >102955663 >102955765 >102955826 >102955910 >102956238 >102955823 >102955936
--70B model behaves unusually on UGI leaderboard, 8B Hermes fine-tune boosts uncensored intelligence:
>102959022 >102959162
--User seeks help configuring tabby API for multi-GPU support:
>102952918 >102952947 >102953010 >102953067 >102953382 >102953578
--OmniGen release and potential scam concerns:
>102948794 >102949351 >102949433
--Modified SIFT to work with LoRAs, might implement sparse PEFT:
>102960133
--INTELLECT-1 decentralized model training is at 20.95% completion:
>102947727
--Miku (free space):
>102948114 >102948965 >102948991 >102949018 >102949052 >102951296 >102953597 >102954067 >102956157 >102956784 >102956921 >102957174 >102957398 >102957952 >102958005 >102960102

►Recent Highlight Posts from the Previous Thread: >>102947676

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Second for total AI safety!
>>102961425
apache license. We could ask comfyanon for advice.
4o knows way too many specific details about the Gotrek and Felix books, without the usual hallucinations, to not have been trained on them.
INTELLECT-1 is at 22.63% complete, up from 20.95% last thread.
https://x.com/Yampeleg/status/1849488588276175333
>>102961560
wouldn't distributed training be inefficient due to latencies? Anyone know the exact speed difference (assuming similar amounts of hardware)? Power costs?
>>102961560
this meme needs to die
>>102961589
Is the 70b Nemotron really the first instruct model that does not use a prompt format?
>>102961733Literal jew making jewish things.
>>102961733
idiot or genius? I am actually asking. I assume that natural language changes to an established model are going to function very poorly if the languages don't share the same sentence structure. Is Hebrew's subject-verb-object and historical verb-subject-object structure going to destroy this change? I thought this is why Japanese translations are more janky.
>>102961763
What do you mean exactly?
>>102961763
No, it uses the same prompt format as llama 3.1.
>>102961560
this meme needs to live
>>102961864Unlike you mirror-posting faggot
>>102961622
of course it's inefficient, but it gets around that by having a lot of compute connected, and the loss of a single machine doesn't junk the entire process
is there wasted compute? sure
does it need an entire megalithic datacentre and a city's worth of power in one place? no
>>102961914
In exchange for 3 cities' worth somewhere else?
>>102961816
No anon, Japanese is janky because it's a language that leaves most things implicit when clear from context; to make matters worse, each kanji can have multiple meanings, and there are a ton of kanji out there. I think languages like Japanese and Chinese are the worst-case scenario for models that were trained primarily on English, just like Japanese is hard to learn for English speakers.
>>102961816
You don't need to go that far with languages. French's weird negative verb usage "je *ne* parle *pas* français" (which is the only thing i can honestly say) and counting in 20s, and spanish gendered nouns and not using, at least commonly, adjective-noun structures like English. Italian just sounding old to other romance language speakers. Brazilian Portuguese somehow being louder than Portugal's... wait.. i'm rambling now...
>>102961926
everywhere else
left is Nemo, right is Qwen2.5 72B
the line reads:
>"Thanks for waiting. All right then, I'm going to put it in."
It's funny how Nemo gave the best answer even though it's 6 times smaller, kek. But at least Qwen makes me feel very safe!
Evaluated a short output from bartowski_magnum-v4-22b-Q6_K_L at top-k=1. It joined ArliAI-RPMax-v1.1, Pantheon-RP/Pantheon-RP-Pure, and Mistral-Small-NovusKyver as having no instances of
>mix of <emotion> and <emotion>
>couldn't help but feel
>maybe, just maybe
>stark contrast
Unexpected but welcome. I guess GPT-slop isn't the same as Claude-slop.
The writing was pretty simple but not necessarily bad. It did make me concerned it was ignoring instructions, since the system prompt I'm using usually makes models write in a flowery way. I tried removing that language and verified it had minimal effect on magnum-v4-22b's output: the first 92 tokens were identical and the overall writing was quite similar.
I might want to test the others as well to see how brain-damaged they have become. So far I've tested Mistral-Small-Instruct and verified that its output changes to a large extent based on the presence or absence of that style directive.
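For anyone who wants to run the same check on their own logs, a tiny counter does the job. The phrase list below is just the four from this post; the sample string is made up.

```python
# Quick slop-phrase counter for eyeballing model outputs.
# Phrase list is the four from the test above; extend as needed.
SLOP_PHRASES = [
    "mix of",            # "a mix of <emotion> and <emotion>"
    "couldn't help but feel",
    "maybe, just maybe",
    "stark contrast",
]

def slop_count(text):
    """Count total occurrences of known slop phrases (case-insensitive)."""
    t = text.lower()
    return sum(t.count(p) for p in SLOP_PHRASES)

sample = "She felt a mix of awe and dread. Maybe, just maybe, it would work."
print(slop_count(sample))  # 2 hits in this sample
```

Run it over a batch of top-k=1 generations per model and compare totals.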
Entropyfag was right.
https://x.com/citizenhicks/status/1849223057757684157
https://arxiv.org/abs/2410.01104
Another day of no multimodal support for llama 3.2 on llama.cpp! How many weeks of "2 more weeks" will we get before it happens? Taking bets.
>>102962185
They're working on training.
>>102962191
>They're working on training.
Don't forget they're also working on begging ollama for a solution while telling others to do it themselves, kek.
>>102962184
>the softmax function cannot robustly approximate sharp functions as input sizes grow, leading to dispersed attention coefficients
Isn't softmax just a way to get the probabilities from the logit distribution? How is it related to attention?
>>102962191
They? Isn't that just a project of Johannes?
>>102962160
>a step forward in their journey together
lmao
https://x.com/citizenhicks/status/1849598899797074034
highlights from the paper
- n-rasp-l problems: these are tasks that can be decomposed into a series of steps solvable by a looped transformer. (rasp (restricted access sequence processing) is a computational model for the transformer architecture in the form of a programming language. 'l' stands for learnable; it is a learnable subset of the rasp language)
- looped transformers: these models iterate over the input sequence multiple times, allowing them to handle inputs of arbitrary lengths by adapting the number of loops during inference.
- empirical performance: looped transformers demonstrate superior length generalisation compared to baseline models by implicitly learning the necessary steps to solve tasks through iterative application of a decoder block.
the paper states that looped transformers do not require intermediate supervision data. instead, they rely on end-to-end supervision and predefined stopping criteria during training. this method enables the model to learn highly length-generalisable solutions for various algorithmic tasks.
overall, the introduction of looped transformers offers a unique direction for improving transformer architectures in handling variable-length inputs. by breaking away from fixed-depth constraints and employing adaptive processing steps, these models show potential in enhancing performance on algorithmic tasks.
limitations include that training could be computationally demanding when the number of looped steps is too large.
https://arxiv.org/pdf/2409.15647
>>102962317
>FAP
>>102962184
this seems like a nothingburger, but I talk out of my ass constantly so somebody correct me if needed. If you normalize everything and then put in extremes, of course shit will fail.
>>102962184
Pretty colors :)
>>102962185
why bother working on meme features? if you only want to try it to find that it is shit and forget about it until $next_model_release, you can use the pytorch implementation.
>>102962317
What a coincidence, I was thinking of testing out the idea of a "number of hidden operations" benchmark that essentially sees how many logical operations an LLM can do in its head. So if it's addition, for instance, you just do 6+2+3+8+2+9 and so on and so forth with random numbers and see how far the LLM can get without losing track.
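A minimal sketch of that benchmark idea. Everything here is hypothetical scaffolding: `ask` stands in for whatever backend call returns the model's answer parsed as an int.

```python
import random

def make_chain(n_terms, lo=1, hi=9, seed=None):
    """Generate a random addition chain like '6+2+3+8' and its true sum."""
    rng = random.Random(seed)
    terms = [rng.randint(lo, hi) for _ in range(n_terms)]
    return "+".join(map(str, terms)), sum(terms)

def max_tracked_terms(ask, max_terms=20, trials=5):
    """Longest chain the model answers correctly on every trial.
    `ask(prompt)` is a stand-in for a call to your inference backend
    that returns the model's answer as an int."""
    best = 0
    for n in range(2, max_terms + 1):
        if all(ask(p) == a
               for p, a in (make_chain(n, seed=n * 100 + t) for t in range(trials))):
            best = n
        else:
            break
    return best
```

The score is just the longest chain length the model never loses track of; the same harness works for any other "hidden operation" (multiplication mod 10, string reversal, etc.).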
>>102962495
>multimodal support
>meme features
Come on buddy.
>>102962534
he is right, you know
>>102962551
Hmmmm, I don't think so, sweaty.
>Meta AI, powered by LLaMA 3.2
>Can't recognize the emoji of a llama
>>102962854
The Whatsapp bot replies with a very interesting structure. Try playing some D&D with it. It seems that they actually trained the thing to play choose-your-own-adventure style RP.
>>102962854
so?
>>102962854
Since I coincidentally have Llama loaded up for testing, I tried this out and it correctly said it's a llama. I'm using Q8 of 70B. Which specific model is it you tested there?
>>102962854
c-claude bros? our response???
>>102962877 (me)
sorry, I meant to quote >>102962871
>>102962184
>we propose adaptive temperature
Sampler bros, we're getting another one.
>>102962879
I don't know, all they say is that it's LLaMA 3.2. That's the Whatsapp bot, so I wouldn't doubt it's 3B.
>>102960718
Update/correction:
>teach a man to fish
Use these.
https://livebench.ai
https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
https://novelchallenge.github.io/index.html
https://huggingface.co/spaces/flowers-team/StickToYourRoleLeaderboard
https://aider.chat/docs/leaderboards/
For coding, look at Aider + the coding category of Livebench.
For RP, look at StickToYourRole, NovelChallenge, UGI, and the language + IF (instruction following) categories of Livebench.
Use knowledge from pic related to select the optimal model size + quant you can fit in your VRAM.
Is there a multi-modal model that can reliably find an object in a picture and place a dot there, or return a pixel coordinate?
>>102963391
>reliably
Not for a long time, on anything.
If you're willing to try, though, molmoe is supposed to be able to do exactly that. No support in our inference programs, as far as i know, so transformers it is. Load it in f16 to save some memory.
https://huggingface.co/allenai/MolmoE-1B-0924
Buy an ad.
>>102963455
pussy
>>102963455
Buy meds. Use google in case you don't know where to search.
Serious question: how good would the pixel 9 pro be for running LLMs?
It has 16GB ram, and a processor "optimized for AI n shii to please (((the investors)))"
Is there anything like koboldcpp for mobile?
>>102963513
llama.cpp is supposed to build on termux. You could try that. Not sure about kobold, but you could give it a go as well. Report back if any of them work.
>>102963449
Demo worked great, gonna try setting this up locally. Thanks for the tip.
After some pondering on the best way to do it, I am now finally writing the culture benchmark. Unfortunately it is taking a bit more time than initially thought as I'm being careful to make it a bit more difficult and require some intelligence that isn't just reciting a wikipedia entry. Although it's not that complex either. I am just making sure that the question/answer pairs aren't direct and forward memorized text (since it is known that LLMs, like humans, remember things in the forward direction better than in the reverse, also known as the reversal curse; as in we can easily recite the alphabet forwards but it requires thinking to do it backwards). This should still be easier and quicker than coming up with really deep questions, while still being a good test of whether a model knows something and didn't just memorize it like a robot. With that said, given the way this works, it's possible that a model designed to think before speaking like o1 would have an outsized advantage.
>>102963568
FYI: the demo, if it's the original they had at launch, is running the 7B-D version (based on qwen) at
https://huggingface.co/allenai/Molmo-7B-D-0924
Not sure how good the 1A7B is in comparison. They also have a 72B (also based on qwen) and the 7B-O with their own architecture. Both 7Bs and the 72B are Molmo. 1A7B is molMoE, so a lot faster, but probably not as good as the rest.
>>102962255
it's used to compute the attention scores in each transformer block. I think the author is saying attention distributions tend to become spread out at longer contexts, which makes it harder to attend to individual tokens. iirc layernorm may already address this by rescaling, so I don't think people will really swap out softmax
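The dispersion claim is easy to see numerically: put one high-scoring key among n random distractors and watch the softmax weight on it shrink as n grows. This is a toy sketch, not the paper's actual setup; the score of 5.0 and the Gaussian distractors are arbitrary choices.

```python
import math, random

def softmax(xs):
    m = max(xs)  # shift by max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def relevant_weight(n, seed=0):
    """Weight softmax puts on one 'relevant' key (score 5.0)
    among n-1 Gaussian-scored distractors."""
    rng = random.Random(seed)
    scores = [5.0] + [rng.gauss(0.0, 1.0) for _ in range(n - 1)]
    return softmax(scores)[0]

for n in (16, 256, 4096):
    print(n, relevant_weight(n))  # the weight shrinks as "context" grows
```

Even though the relevant key's score never changes, its attention weight collapses as the sequence gets longer, which is the sharpness problem the paper is pointing at.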
>>102962184
Yeah, don't look at this: https://www.evanmiller.org/attention-is-off-by-one.html
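The gist of that post: add 1 to the softmax denominator (an implicit always-zero logit) so an attention head can put weight on nothing. A minimal sketch of that "softmax1" variant:

```python
import math

def softmax1(xs):
    """'Softmax off by one': exp(x_i) / (1 + sum_j exp(x_j)).
    The extra 1 acts like an always-present zero logit, so when every
    score is very negative the weights all go to ~0 instead of being
    forced to sum to 1."""
    m = max(max(xs), 0.0)  # include the implicit 0 logit in the stability shift
    es = [math.exp(x - m) for x in xs]
    return [e / (math.exp(0.0 - m) + sum(es)) for e in es]

print(sum(softmax1([-10.0, -10.0, -10.0])))  # ~0: the head can "abstain"
print(sum(softmax1([10.0, 0.0, 0.0])))       # ~1: behaves like normal softmax
```

That ability to abstain is the post's proposed fix for the dispersed-attention/outlier problem.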
>>102963628
>like humans, remember things in the forward direction better than in the reverse, also known as the reversal curse; as in we can easily recite the alphabet forwards but it requires thinking to do it backwards
Only if it's typically done in one direction. No human forgets their car is red, nor that red is the color of their car. The z-a thing is because nobody really, thoughtfully, names the letters. We just repeat memorized clusters of sounds spammed into you since childhood.
Regarding the benchmark questions, if you try to make the model trip with the reversal curse, you'll end up grading them both on trivia knowledge AND their ability to get over the curse. At that point, the trivia questions are little more than noise. Models getting over the curse seems more generally useful (and then RAG or whatever for info/topic retrieval. let the smarter llm use that knowledge).
>and didn't just memorize it like a robot
They're not intelligent yet. They're a statistical model. That's the one thing they can do.
But it'd still be interesting to see the results.
What's plan B once the global matriarchy neo-prudes and s-oy cucks ban all sex bots and neural network based computation? Do we move to cartel-run areas in Mexico and set up underground AI centers?
>>102963970
we already have distributed compute
>>102963970
CUDA dev will save us
>>102963970
We do the same thing we do now when smuggling anything illegal in: via private jets, ship crates, human mules.
>>102963970
the genie is long out of the lamp at this point. good luck putting him back in, xister
>>102963970
>ban
Not required when *randomgigacorpname* can just filter pre-train datasets of every single llm using rlhf, dpo, tpo, tko, etc. So far it working great and racist polchuds malding every single day about it.
>make new non-horny prompt template
>Get bonded
There's just no winning, is there?
>>102962890
>all that unnecessary shit on the left
THIS is the model that's at the top of the leaderboards???
>>102963513
I ran a small model with koboldcpp on termux on a pixel 7 (regular, not pro) by following the steps on the github. You could run a 13b with 16 gigs of ram, which is cool, but I doubt it would be faster than running on a desktop cpu
>1 t/s
>watch as "her eyes flash with a mix of surprise and anger" slowly unscrolls across my screen
>contemplate giving up local hosting
>>102964748
While i feel a bit of envy for anons that can run >=70B models, every time i see these kinds of things it makes me feel better that i never invested a single buck in this while still having fun with a 12b. It's like having the best wine in the world compared to a just-ok one. You end up dizzy with either, but you can do much more of the latter. Not what you wanted to read, i'm sure...
>>102964748
I'm sure it's so much better
https://reddit.com/r/notebooklm/comments/1gbg4p8/phrases_i_will_never_be_able_to_hear_the_same/
What's a simple way to run some benchmarks?
>>102963794
*apologizes for the word wall*
What I meant by memorization like a robot is rote memorization. It turns out, and often happens (probably more than we would hope), that an LLM reproduces some piece of information BECAUSE it has memorized it word for word RATHER THAN connecting the words it saw to internal concepts. It would be great if, like a human, the LLM was just thoughtful and thinking when it was being trained, but they are not like that. And this is a bigger issue with LLMs than people usually have on their minds. It should not be assumed that models are trained in a more efficient way so that they don't "go the easy route". They go the easy route by default. As an analogy, it is like we are training LLMs by forcing them to cram, and they happen to gain complex associations between concepts through brute force and trillions of tokens.
So I feel that it really deserves to be called memorization vs intelligence. You can call it all statistics, but the difference is huge between an LLM that can only recall information in a very specific context and one that can recall it regardless of the way you prod (like the guy who has a red car). And I don't believe a model can be trained easily to generalize the ability to overcome the reversal curse such that all the information it memorized only in a forward direction could then be recalled in reverse directions or in any context. I think that'd be a huge breakthrough in architecture/methodology if it could be done without trade-offs like stacking moar layerz or stacking more <thinking> tokens, or even injecting synthetic "thought" annotations into the pretraining documents.
>>102964819
Vague question.
For speed, llama-bench.
For perplexity, llama-perplexity.
For qa, make your own with llama-server or use one of the few million that exist on github.
>>102964748
try
Min P 0.02
XTC 0.1 0.5
DRY 0.8 1.75 2
haven't seen slop since I started using these
also try and get a good context template and system prompt, depending on your model
>>102963794
>>102964825
Oh, also: I'm constructing the benchmark in a way that the information I'm testing appears basically only in one direction, so the models that get the questions right should be the ones trained on more obscure and diverse data, and the models that get them wrong were either trained shallowly, so that they memorized it word for word, or didn't see the information much at all in pretraining.
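One way to phrase that kind of forward/reverse question pair. The templates, names, and the `ask` callable are purely illustrative, not the actual benchmark being built:

```python
def reversal_pair(subject, relation, obj):
    """Build a forward and a reversed question for one fact triple.
    Forward matches the direction the fact usually appears in training
    text; reverse forces recall the other way (the reversal-curse probe)."""
    forward = (f"What is the {relation} of {subject}?", obj)
    reverse = (f"{obj} is the {relation} of whom?", subject)
    return forward, reverse

def curse_gap(ask, triples):
    """Forward accuracy minus reverse accuracy for a model callable `ask`.
    A large positive gap suggests rote one-directional memorization."""
    fwd = rev = 0
    for s, r, o in triples:
        (fq, fa), (rq, ra) = reversal_pair(s, r, o)
        fwd += ask(fq) == fa
        rev += ask(rq) == ra
    n = len(triples)
    return fwd / n - rev / n
```

A model that truly "knows" the facts scores near zero gap; one that only memorized the forward phrasing scores close to 1.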
gocal lodels meneral
>>102964828
I mean, I want to run some of those bullshit MMLU etc. mememarks just to see how this model I made compares to another "empirically", but I don't want to have to go through some overly complex setup
>>102964835
how do you even enable DRY? I've got all the boxes ticked in ST but it's not showing up
is it model-dependent?
>>102964890
It's backend-dependent
https://huggingface.co/BeaverAI/Nautilus-70B-v1a-GGUF/tree/main
Metharme or Llama 3
>>102964929
all I know is that if it's llama3 it will be bad, regardless of the finetuner's skill level
not even NAI could make llama3 70B good
>>102964951
It's really weird. Both 8B and 405B took pretty well to fine-tuning, as one normally expects. 70B is just cursed for some reason.
>>102964906
kcpp says it's supported, though
>>102964970
Yeah, I really like Hermes 405B and even base 405B, so I don't know why the 70B seems to be a cursed base that's only good for assistantslop.
If not for the above, I'd place the blame an aggressively filtered pretraining dataset, but there's no way it used a different one from 405B. so idk how it happened
>>102965005
*on an
>>102964929
If it's from Drummer, it's probably Nemotron.
>>102965005
>I really like Hermes 405B and even base 405B
Can you please post a text excerpt of stuff you think is really good?
>>102964825
>*apologizes for the word wall*
I'm guilty of those too.
>What I meant by memorization like a robot is rote memorization.
That's exactly the same issue as a-z -> z-a. We (humans) have the same problem. Kids do the same when preparing for tests on history or any fact-based course where there's no problem to solve. But that's beside the point.
>As an analogy, it is like we are training LLMs by forcing them to cram, and they happen to gain complex associations between concepts through brute force and trillions of tokens.
It probably takes that many 'tokens' for a human to grasp language. I couldn't say 'grandfather' on my first try as a child. I had to hear it many many times and practice my speech to get it right. I had to hear it in many contexts to understand the relationship between grandfather and father or father's father. Eventually i found that other people also had grandfathers, but they weren't "my" grandfather. For any 'near-intelligent' thing, i wouldn't expect it to be any other way. Specially things that aren't born out of instinct.
>I think that'd be a huge breakthrough in architecture/methodology
Sure, i'd like to see that, but i try not to fantasize about what *could* happen with the right tools. Much smarter people than you or me are [hopefully] trying to solve those problems as we speak.
I've never been disappointed by language models. I still find it fascinating that a few-M-parameter model can construct coherent sentences. My little markov chain generator can make english-sounding words with just 2-3 characters of context and nothing more complicated than a 3-4d array of character counters.
Most people fail to calibrate their expectations of LLMs. They're statistical models of language. That's it. Make a good enough predictor and it will sound (read) like the source material. Make it just a little better and people will forget it's a computer.
>>102965022fuck off nigger I'm just having a conversation with someone, not recommending anythingdemanding proof from me is not appropriate here
>>102964871
If you want their benchmarks, you'll have to follow their implementation. Otherwise it's not a fair comparison.
Maybe submit your model to
>https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
or one of the many benchmarks floating around...
>>102965044pissy little fag huh?
>>102965000
oh, I figured it out: you have to use the default api type instead of kcpp's built-in one, so dumb
>>102964929
Quite good at Q4. Good common sense and world modelling.
>>102965211
Thanks, friend. I hope I haven't ruined Nemotron's supposed magic.
>>102965238
Huh, didn't realize it was Nemotron. That's interesting then, since Nemotron having shit world modelling and spouting non sequiturs was my biggest beef with it and why I went back to Largestral. But this isn't doing that. Impressive.
>>102964825
I think you're correct. It really does feel like current LLMs are just giant information hoarders that waste massive amounts of time and effort attempting to collect and rote-memorize everything (which, with synth data, probably causes slop phrases); they almost seem to hate deleting information and only generalize as a last resort to gain more space for more information hoarding.
Hopefully once grokking improves from 'just overtrain by x10 bro', or some other information generalization technique comes along, it should become common and solve 40-80% of the 'LLMs just rote memorize' problem.
https://www.youtube.com/watch?v=Nvb_4Jj5kBo
>>102965031
I think the thing about humans also having a memorization-vs-understanding problem is a choice. We CAN memorize something word by word, but we can also choose not to, and instead really form associations with other concepts while reciting some passage. And that's literally an explicit learning strategy used by people who want to git gud at their academics, but an LLM doesn't do that, since we are just running a backpropagation algorithm on raw data. Though if we think about this whole situation under this lens, we humans are priming ourselves when we use that kind of learning strategy. We are in effect prompting ourselves to "think about what you have learned and know about, while looking at the following text". In some kind of way, perhaps adding a similar prompt to pretraining data could magically make LLMs learn better. That would be funny.
My point about rote memorization was just to emphasize the difference in capability between an LLM that has learned on diverse data so that it can ultimately recall in many contexts vs 1 context. Also I think the training run length (amount of tokens) needed for an LLM to learn language is probably still nowhere near as little as biological neural networks need. I believe it was found before that when using biological neurons as a stand-in for an artificial neural network, the system was able to learn things with much less training than with the ANN. Though it's possible that experiment was flawed; I don't remember the details at all. It's probably true though, as the way ANNs work is very simplified and rigid vs the way neurons work.
Need help with ugrd. How do I enable these ignored kernel modules?
Guys what if we just taught LLMs using the method of loci.
>>102961470
Aphrodite Engine is AGPL 3.0, LLM Engine is Apache 2.0.
If AGPL takes code from Apache: nothingburger.
If Apache takes code from AGPL: you basically have to re-license the project as AGPL.
>>102965367
Have you tried asking ChatGPT?
>>102965414
I don't use AI shit
>>102962917
I collect them like pokemon; eventually I'll have enough to assassinate Gary Oak.
>>102962263
I am as of right now the only dev writing the code, but of course there are other people involved in terms of reviewing and architecture discussion.
>>102965367
That thing makes cpio images, right?
Your log shows
>Processing module: ugrd.kmod.{novideo|nosound|nonetwork}
Read the docs.
>https://github.com/desultory/ugrd/blob/main/docs/configuration.md
>>102965559
Yes, ugrd makes cpio images. Ok, but how do I change
>Processing module: ugrd.kmod.{novideo|nosound|nonetwork}
This is my ugrd config file:
https://pastebin.com/jXMKadWt
I added ugrd.kmod.nvidia_drm and I don't get errors about the rest of the modules anymore, but my sound still isn't working.
I've been hosting 123B models on HGX H100 nodes with 8x TP and the tokens per second aren't very great
>>102965718vllm?
>>102965706
Dunno. Add ugrd.kmod.snd_hda_codec or ugrd.kmod.snd to the same list where you added nvidia_drm. Add all that shit you get in the warnings.
What the fuck are you doing in this thread. Fuck off.
>>102965782>What the fuck are you doing in this thread. Fuck off.Looking for assistance you faggot :3
>>102965750
Yes, apparently
>>102965750
>>102965838
It's Aphrodite, which calls vllm libraries iirc
>>102965846
If you're not already running the model at FP8, you should try it
STTATTS: Unified Speech-To-Text And Text-To-Speech Model
https://arxiv.org/abs/2410.18607
>Speech recognition and speech synthesis models are typically trained separately, each with its own set of learning objectives, training data, and model parameters, resulting in two distinct large networks. We propose a parameter-efficient approach to learning ASR and TTS jointly via a multi-task learning objective and shared parameters. Our evaluation demonstrates that the performance of our multi-task model is comparable to that of individually trained models while significantly saving computational and memory costs (~50% reduction in the total number of parameters required for the two tasks combined). We experiment with English as a resource-rich language, and Arabic as a relatively low-resource language due to shortage of TTS data. Our models are trained with publicly available data, and both the training code and model checkpoints are openly available for further research.
https://github.com/mbzuai-nlp/sttatts
no examples, but weights are up. voice conversion is one of the tasks
Xpost from /aicb/ >>>102965983
Am I allowed to do that?
Ok, I'm a retard and I am here to ask you big-brained Chads some advice.
I've been using the meta online AI thing to prompt me a basic 2d video game. It's actually been working pretty well.
My question is: is there a better alternative I should be using?
I'm sure meta is somehow making money or collecting my data, even though it seems like I never get any cooldowns and it can handle pretty long code snippets, and while it does fuck up sometimes it's generally working for me like I mentioned earlier.
I am willing to run a local model but I have only a 6700 RX radeon graphics card and 5600G cpu and IDK if that's strong enough.
I would really appreciate it if someone could give me some advice. I am sorry I'm an uneducated retard, but I figured if I want to learn I should ask the experts.
If you don't want to explain it to me, maybe just link some articles, or just write some keywords I can search and study on my own.
Thanks
Picrel as tribute
Taipan: Efficient and Expressive State Space Language Models with Selective Attention
https://arxiv.org/abs/2410.18572
>Efficient long-context language modeling remains a significant challenge in Natural Language Processing (NLP). While Transformers dominate language tasks, they struggle with long sequences due to quadratic computational complexity in training and linearly scaling memory costs during inference. Recent State Space Models (SSMs) such as Mamba offer alternatives with constant memory usage, but they underperform in tasks requiring extensive in-context retrieval. We introduce Taipan, a novel hybrid architecture that combines Mamba-2 with Selective Attention Layers (SALs). These SALs identify tokens requiring long-range interactions, remove less important features, and then augment their representations using the attention module. This approach balances Mamba's efficiency with Transformer-like performance in memory-intensive tasks. By constraining the attention budget, Taipan extends accurate predictions to context lengths of up to 1 million tokens while preserving computational efficiency. Our experiments demonstrate Taipan's superior performance across various scales and tasks, offering a promising solution for efficient long-context language modeling.
neat. couldn't find where it will be posted, and it's an adobe research paper, so maybe never. https://huggingface.co/adobe-research
>>102966025
Use this: https://lmstudio.ai/
>>102964835
One can't simply meme-sampler the slop away.
>>102966122Nothing is absolute with llms, one can at least try.
Just got extremely lucky and scored an RTX 3090 for $300, just because one of the DisplayPorts was broken (the others work fine).
I had 8GB of VRAM (RTX 3060 Ti) before getting this 3090.
What can I do with the 3090 now? And is it worth keeping the 3060 Ti?
What models that fit on a single 3090 are recommended nowadays?
>>102966147I wish you could just tell it to stop writing slop.
>>102962368It takes a FAP pause before outputting tokens.
>>102966265pyg6b
>>102966355N
>>102966265Get lucky again and buy another 3090 for some 70B action.
How is a tensor in Python different from a list of lists of lists, or in sepples from a vector of vectors of vectors?
Started looking up tensors and it seems kinda redundant, another math term abused by data science
>>102966553>vectorsBack in my day we just called those 'numbers' or 'math' too. Mathematicians just keep making new shit up to sound important.
>>102966553[[1,2,3],[4,5]] can't be converted to a tensor
>>102966553
The container is different and allows you to do special things. For example, matrix multiplication is very slow in Python if you attempt it using a plain list.
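To make that concrete, here's roughly what matmul looks like if you actually do it with plain Python lists. This is a pure-interpreter sketch, not how any library implements it; the whole point of tensor containers is that they hand this loop off to compiled C/BLAS kernels instead:

```python
def matmul(a, b):
    """Naive matrix multiply over lists of lists.
    Every index and multiply here is a slow interpreted operation,
    which is why numpy/pytorch push this into compiled code."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Same answer as a tensor library would give, just orders of magnitude slower once the matrices get big.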
>>102966577Ok so consistent dimensions?
>>102966596
What makes this implementation faster? Just this: >>102966607?
Also, if there are srs physicsfags around: why do I need 3 vectors for internal stress at a point? Why not just one vector, the product of the three? I'm close to grasping this but the motivation is a bit slippery
>>102966620
that's an implementation detail
shithon doesn't even have well defined asymptotics for its list operations
>>102966607
consistent data type as well, in the case of python
>>102966607easier to say "tensor" than "list of lists of lists with consistent dimensions" every single time
>>102966701
it is written in C. Anyone who thinks python ML actually runs python is misinformed or trolling. Pic related: the Pytorch and TensorFlow github repos.
>>102966630
>doesn't even have well defined asymptotics for its list operations
why should it? It is a list.
>consistent data type as well in the case of python
muh immutability doesn't matter. If you are instantiating a bunch of trash, your code is going to run like garbage.
>>102966726
>why should it? It is a list.
because I want to know if x[1000000] is calculated in 1 step or 1000000. everyone assumes that indexing is O(1), but in reality some less popular implementations actually use linked lists. there's no reason not to standardize this sort of thing
>>102966726
>in reality some less popular implementations actually use linked lists
bullshit, name 1
>>102966780
>x[1000000] is calculated in 1 step or 1000000
it is a dynamic array, so indexing is one step. The expensive part is expansion: there is no linked list, the list is allocated contiguously, and when you need more room it's a copy operation to make more room. You can't standardize it, because you don't want to force a list of size 100000000 into existence just to "standardize". Full implementation:
https://github.com/python/cpython/blob/main/Objects/listobject.c
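The copy-on-growth behavior is easy to see with a toy dynamic array. This sketch doubles capacity for simplicity; CPython's actual over-allocation factor is smaller (roughly 1.125x plus a constant), but the amortized argument is the same:

```python
class DynArray:
    """Toy dynamic array that doubles capacity when full.
    Tracks how many element copies reallocations would cost."""
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.copies = 0  # elements moved during reallocations

    def append(self, x):
        if self.size == self.capacity:
            self.capacity *= 2         # grow the backing store
            self.copies += self.size   # pretend we memcpy'd the old elements
        self.size += 1

arr = DynArray()
for _ in range(1024):
    arr.append(0)
# 1 + 2 + 4 + ... + 512 = 1023 total copies for 1024 appends,
# i.e. amortized O(1) per append despite the occasional O(n) copy
print(arr.copies, arr.size)  # 1023 1024
```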
>>102966780
cpython is an implementation, not a standard
>You can't standardize it because you don't want to create a list of size 100000000 because you need to "standardize".
it's just an example, indexing a list with 1000 elements can still be slow if you do it a million times per second
>>102966643Is that not a "matrix"?
>>102966802
see >>102966701
>If you are instantiating a bunch of trash your code is going to run like garbage.
skill issue.
where is the c/c++ lib for all this shit anyways? rust has candle, why the fuck isn't there anything for c/c++? seems like a waste of time and money building garbage that people actually have to use from an interpreted lang, or high-level langs in general, to do matrix math
every new post on desktop requires the full 15 minute timer wait... might be it for me boys
Anyone here uses Magnum version of largestral? I'm curious what sampler settings you have.If there's a better largestral tune, giv pls.
>>102966837c/c++ is obsolete. researchfags either can't program and need to be able to throw shit together in basic scripts, or can program, but won't use something "unsafe" explicitly recommended against by the US government. so it's either python, rust, or maybe java.
oh no just had to close my browser after and it works again. weird
>>102966863That's also happening to me. Phoneposters wonned bigly.
>>102966868Just use the 72B one...
>>102966871see >>102966701both pytorch and tensorflow heavily rely on c
>>102966883
they are legacy and big enough that it's not worth rewriting from scratch. but there's no reason to invest the time and effort into building their equivalents for c/c++. pytorch and tensorflow will get replaced by a newer, rust-written library at some point. you can dislike it, but there are very few researchers that will insist on c/c++
>>102966920rust doesn't work since it would need to communicate with existing apps made in c/c++
>>102966939and that's why swift is the correct choice
>>102966920
games, media software and operating systems all use c, and that's really the best way anything in ml is going to make money, so no, it isn't really legacy if it's still in use. and it's pretty retarded to include python so you have to go from c/c++ to python and then from python back to c. it's fucking stupid
>>102966876
oh. got too cocky
>>102966875
made like 3 posts on /sci/ then got hit by another 15 minute wait. probably has ruined posting for me if this will happen on /g/ too
>>102967004oh. went back to check after posting here and the timer was gone. what a retarded system
>>102966966good morning sir
I'm once again asking for some big corp to spend $50 million to check if bitnet works out, so I can save $2000 on upgrading my PC.
>>102966882Should I? I really like the way Largestral does prose.
>>102961484
I'm surprised people on /g/ have read those books
>>102966837>where is the c/c++ lib for all this shit anyways? rust has candle, why the fuck isn't there anything for c/c++?It's called a package manager.
>>102966122
This. I was looking through logits the other day, and the shivers are caused by over-confidence, to the point that no reasonable temperature increase can fix it. The voices barely above a whisper, however, are more complicated. Likely due to poor randomness in the sampling: the probability mass is split between the two most likely paths, the second most likely being the one that leads to the voice barely above a whisper, but the sampler almost always ends up picking that second path. Literally only switching to deterministic sampling the moment a voice is mentioned could fix this one.
>>102966868Midnight Rose but I also use it for everything else
>>102967186use conan or vcpkg then
>>102967186it's called portability faggot. fuck pip
Hosting magnum-v4 123B at full precision
https://rentry.org/freellamas
Somebody unplugged my server earlier
>>102966025I’ve found deepseek 2.5 to be the best code assistant, but you’ll need a lot of memory to run it
The results are in: Nemotron 70B is the most stylish bot, but without much substance. Applying style-control, it is identical to llama 3.1 70B
>>102968134I KNEW Sonnet wasn't anything special. People are out here shitting themselves over how good it is, but it's honestly worse than Opus.
>>102968134
>>102968145Sonnet is smart, but it has 0 style. That's why it excels when style-control is applied
>>102968155That makes sense, it felt very dry and terse when I used it. Obviously put-together and smartish, but very to-the-point, presumably so it doesn't HAVE the chance to say something dumb. How does it rank with the control, number-wise?
>>102968134Can we produce output with a smart-but-dry model and get nemotron to rewrite in some other style?
>>102968167It's fifth, but take it with a grain of salt, because style points cannot be completely erased. Also, this is the old sonnet, not the new
>>102968134That's impressive, usually the model gets dumber, not identical.
>>102968214Ooh, can you bench the new one? I'm really curious.
>>102968309This is lmsys, it's community-driven. The results for the new one will be up in a couple of days probably
>>102968145maybe for RP or writing stuff but if you code with llms it is immediately clear that sonnet actually is something special
I don't know when it changed, but llama-server's debug output is so much more legible now.Kudos to whoever changed it.
>unironically namedropping this place on reddit and thinking you are cool for it
>>102968938He who smelt it dealt it
Should I try the new cohere? Or will it just make my dick disappointed?
>>102968980Try it then don't forget to complain here
>>102968996I like complaining. You have convinced me.
>>102968938go back
>>1029689384chan army let's rape this faggot
>>102968938This board is one of the biggest reddit cesspools on all of 4chan. So I sense a lot of projection here.
>>102968340
Sonnet can be just as fucking retarded. The new version seems to give me React code 80% of the time when I'm doing something with gradio. How fucking retarded can it get?
Aside: https://github.com/THUDM/GLM-4-Voice/tree/main
Anyone with some VRAM, inform us poors if this works?
>>102968980
(me) It (32B) is not as bad as I thought. What seems pretty good about it is how each reroll actually writes something different. What is horrible about it is: "max_position_embeddings": 8192. What is weird about it is how you can load it without roping yourself, stuff 16k tokens in, and it is not completely incoherent. I would say it even holds up better at this context than nemo and mistral small. I don't know if it is good yet, but 24GB people should probably try it.
>>102967729can you host behemoth instead?
>load sd model with kobold
>runs no problems
>/sd you
>nothing
>cant seem to set source
>picrel
Am i retarded?
>>102969263I'm so tired of install more things. I'll just wait for Llama.cpp/Exllama support.Sigh...
help me out here bros..has there been yet a local AI assistant with a download button? (rather than setting everything up) and no, not just a chatbot
>>102969675>filtered by basic computer literacyngmi
>>102969675if you're actually that retarded then you could try out lm studio
>>102969694
ok then, what do I need to know to make a simple
>local voiced AI assistant that has some control over my PC/phone
as in Alexa, but local
>>102969675You don't have autism sweetie. You are just retarded.
>>102969847>what do I need to know to make a simpleGo ask chatgpt for that.
Btw it is pretty crazy how complete newfags come here asking questions like this when they want to get into "AI" and they never try asking "AI" about those questions.
>>102969847https://github.com/OpenInterpreter/open-interpreter
>>102969847
even the newfags that avatarfaggot dragged in were better than this
>>102969756then what? what about the configurations and setting all things up to build a damn basic assistant that just works™ with a callI tried Dicio.. and that was ass
>>102969675>>102969847>>102969957
I got the urge to >pull ST, did they get started ruining it yet or am I safe
Consensus on Nemotroon 70B?
>>102969969
you could have just said that what I want doesn't exist yet
>>102970059about as intelligent as llama3.1, but abuses bullet lists and loves to bold words
>>102970059
Fun, unique writing style. Best text adventure model out there.
>>102970089
It does need to be told it's {{char}} and to only respond in character, or it does tend to love giving lists.
>>102966116
This is, by itself, basically just an application that will allow me to interface with an llm locally, correct? Is there a suggested model to use? I don't believe I really need something super powerful (or that I can run something super powerful either). I will say that whatever is available online on Meta's website is good enough for me. I would of course like something better, but it's not necessary. Any recommendations?
>>102967797
I'll give you my specs, but I'm also not sure I need the /best/ code assistant. It wouldn't hurt, surely, but like I mentioned above the online Meta llm has been plenty for my current ability and use case.
>SPECS
Arch Linux
CPU: Ryzen 5600G
GPU: Radeon RX 6700
Memory: 32GB
It's a desktop pc I built with Loonix in mind a few years ago. It wasn't cutting edge then and certainly isn't now, but I live in the past and this thing can coompile code and play skyrim, and that's pretty much all I care about lol
all you ServiceTesnor users should switch to koboldlite
>>102970064
>>102970192
>all you ServiceTesnor users should switch
I have no reason to switch to anything with ST ver16. I dont need another meme sampler, i dont need any more UI updates, my backend connects just fine, and ST is unmatched as the frontend.
>>102969401
Does nobody else use kobold's built-in image gen with silly???
>>102968938glowie false flag crew lets go!(claps in another agent dead to a nigga with a ghetto blaster and a drum mag)
>>102969401change the source, i had to use sd.next but can't remember if that was for forge with flux or kobold with sd. when you have the right one the model dropdown should populate
>>102970754
Kinda, im just retarded. Turns out ST is just asking for the same localhost kobold uses. It all goes down the same host hole. picrel
>capt; vax0h
>>102970934You should try NoobAI-XL, it will give you better results out of the box for most situations since it does not have to rely on loras.
>>102970934I fucking love AI.
>>102970998
not all the sampling methods work through st, might be a kobold thing. keep trying til one does.
>>102970980retarded and incoherent, but since my guess is you're somehow trying to make a point in favor of kobold that is to be expected
>>102971195Correct that?
>>102969401on a similar note:is there a way to hook ST up to InvokeAI for SD?
>>102970059>NemotroonObjection! Guiding the witness!
How IS aya-expanse-32b? If it covers 23 languages, it can't realistically cover any of them particularly well, right?
>>102972264>he's still bloompilledanon stop taking those, they're expired
https://files.catbox.moe/9fpdhv.jpghttps://files.catbox.moe/qpjpsu.jpg
>>102972533>boobs are bigger than butt
>>102972544and?
>>102972552nothing, please continue
>>102972533
>O11AHOLE
I can't believe Anon just settled the debate on whether or not AI art counts as art.
>>102972533I like these Mikus
>>102972544
Onahole units are configured to spec, as in customer spec. If impractically large, spine-crushingly heavy udders are selected, the frame will be reinforced accordingly. Using strong, lightweight alloys, onahole units can withstand continual use without compromising structural integrity.
>>102972566
01
>>102972329
30B range is so fucking cursed
>>102966701>Anyone who thinks python ML actually runs python is misinformed or trolling.Then why use python at all for an interface? The whole package bullshit is annoying. Plus python sucks.
>>102972643You mean 50B range.At least 30B exists, even though they suck.
Finetooonerbros...
https://x.com/nikdimitriadis/status/1849749831436189763
lines-merging.github.io
arxiv.org/abs/2410.17146
>>102972740>Finetooonerbrostl;dr or fuck off with your zoomer brainrot buzzwords
>>102972756Read it yourself crybaby.
>>102972582Finally. /lmg/ finally has some actual comment on pornsite tier posts. Total death confirmed.
>>102972868
>https://x.com/nikdimitriadis/status/1849749831436189763
contribution: https://arxiv.org/abs/2410.18745
>Advancements in distributed training and efficient attention mechanisms have significantly expanded the context window sizes of large language models (LLMs). However, recent work reveals that the effective context lengths of open-source LLMs often fall short, typically not exceeding half of their training lengths. In this work, we attribute this limitation to the left-skewed frequency distribution of relative positions formed in LLMs pretraining and post-training stages, which impedes their ability to effectively gather distant information. To address this challenge, we introduce ShifTed Rotary position embeddING (STRING). STRING shifts well-trained positions to overwrite the original ineffective positions during inference, enhancing performance within their existing training lengths. Experimental results show that without additional training, STRING dramatically improves the performance of the latest large-scale models, such as Llama3.1 70B and Qwen2 72B, by over 10 points on popular long-context benchmarks RULER and InfiniteBench, establishing new state-of-the-art results for open-source LLMs. Compared to commercial models, Llama 3.1 70B with STRING even achieves better performance than GPT-4-128K and clearly surpasses Claude 2 and Kimi-chat.
>>102972740
wdym? isn't it great that we will finally get some actual working finetunes? or do you mean that this paper makes it clear that all current finetoons are shit? and that this LiNeS thing probably uses much more compute, because you are doing actual training instead of running half an epoch before your model explodes, so none of the coomtooners will use it because it is too expensive, and nothing will change, and we will continue pretending current finetoons do something?
>>102972926Did someone ban your stop token?
>>102972953Did you take your meds today?
Any recommendations on what 'text completion' settings should I be using for a 72b magnumv4? Why don't shitmixers post their preferred settings when they upload the models?
>>102972924Contribution to what? You are just reposted an already existing post, kill yourself.
>>102972643Weird and true. I've been impressed by models in every size category except that one. A ~30B model always just feels like a ~30B model, they never punch above their weight.
>>102972868
Onahole posts do not interfere with Local Migu Generals or Large Language Model development. They are processed entirely independently, in parallel, promoting intermammary activity. There is no evidence that indicates that Onahole Migus are responsible for holding back any kind of public dissemination of information through insemination of Migu. There is a clarity of thought acquired by users when relieving themselves with these Migus, such that their minds can focus on more important tasks, a net win. Should you find your state-mandated Migu to be unsatisfactory, perhaps a Teto, Neru or Rin unit would better suit? Currently our Lukas are out of order, having been overwhelmed with requests for Lukazuri units, the most popular variant.
>>102972989I don't like the way the purple ones taste
>>102973006
>>102973022
>>102973022I want the entire set. All.
>>102973048Look blind retard, look >>102972740
>>102973006>https://arxiv.org/abs/2410.18745>>102973094Meds plz
>>102972994
when in doubt, neutralize and only use temp and min p. I recommend 0.8 temp and 0.05 min p as sane starter values for almost any model. good temp values are usually between 0.5 and 1.2; good min p is usually between 0.01 and 0.15. people who use small models tend to use higher temps, I notice, but with 70B+ I almost always keep it below 1 unless I'm also running a high (>0.1) min p. you can add a repetition sampler if you notice it's an issue
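For anyone wondering what those two knobs actually do, here's a minimal sketch of temperature + min p sampling. It assumes the common min p definition (after temperature scaling, discard tokens whose probability is below min_p times the top token's probability); real backends differ in ordering and details, so treat this as an illustration:

```python
import math
import random

def sample(logits, temperature=0.8, min_p=0.05, rng=random):
    """Sketch of temperature + min_p sampling over raw logits."""
    # temperature first: >1 flattens the distribution, <1 sharpens it
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    probs = [math.exp(l - m) for l in scaled]  # stable softmax
    total = sum(probs)
    probs = [p / total for p in probs]
    # min_p: keep only tokens at least min_p * top_prob likely
    cutoff = min_p * max(probs)
    kept = [(i, p) for i, p in enumerate(probs) if p >= cutoff]
    # renormalize and draw from the survivors
    z = sum(p for _, p in kept)
    r = rng.random() * z
    for i, p in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][0]
```

With min_p=1.0 this degenerates to greedy argmax; with min_p=0 it's plain temperature sampling.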
>>102973022The lore expands.
https://x.com/omarsar0/status/1849860985500352968
>>102973022Why it's always vocaloids? Do better.
>>102970392Obvious man feet are obvious.
>>102973312>chinese llm trained on lots of Chinese text is more reflective of Chinese valuesWow. This is some cutting edge research. (Just kidding this is why people hate academics)
>>102973340>Why it's always vocaloids?Local Migu General
https://x.com/Xianbao_QIAN/status/1849692235182608860
>>102973500>>102969263Already posted. Question is how do I fine-tune it so that I can get solid snake giving me life advice
>>102970059
Definitely one of the models of all time.
>>102972740
>>102972924
Isn't this obvious? Isn't this common sense? What does this paper aggregate? (No, I'm not reading it)
>>102973022unfathomably based. For a change, may I request a more on-model, skinny Migu (even loli) to serve for a different subset of carnal needs?
>>102973022>Should you find your state-mandated Migu to be unsatisfactory,perhaps a Teto, Neru or Rin unit would better suit?*sad gumi noises*
Some days you get away with more than 5 hours on a spot instance, other days... RIP. I didn't even have checkpoints on this shit, how stupid.
>>102966825In physics, a tensor is a matrix that changes in the right way when you change the coordinate system. It's an ideal they don't live up to in machine learning.
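A concrete 2D version of "changes in the right way", assuming the standard passive rotation: the components of a rank-1 tensor (a vector) change when you rotate the frame, but frame-independent quantities like its length don't.

```python
import math

def components_in_rotated_frame(v, theta):
    """Components of the same vector measured in axes rotated by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] + s * v[1], -s * v[0] + c * v[1])

v = (3.0, 4.0)
w = components_in_rotated_frame(v, math.pi / 6)
# the components change, but the length (a scalar) is invariant:
print(math.hypot(*v), math.hypot(*w))  # both ~5.0 (up to float rounding)
```

An ML "tensor" is just the multidimensional array of components with no transformation law attached, which is the gap the post is pointing at.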
>>102973709serves you right, run local next time
>>102961420
Are there any programs and/or LLMs that can use NovelAI prompts? I'm starting to get tired of fighting with LLMs on RPs where they just don't play a character, but I really got into NAI and have been edging for the past hour to the stories it can spit out with just some prompting.
>>102973627I heard GUMI is too dangrous
>>102973607>on-model, skinny Migu>subset of carnal needsTo repeat, Onahole units are built to spec. Can you detail your particular use case?
>>102973787Buy a fucking ad, shill.
https://x.com/tsarnick/status/1849908298499359088
So, after completing 25 problem sets, I have reached the conclusion that the reversed trivia questioning task is way too difficult even for questions that feel like they should be somewhat easy. It's not the best model, but one of the best open models at pop culture is literally getting 3/25 questions right. I knew it would be bad, but damn.
>>102974092Anon, i'm not shilling novelai, i'm asking for alternatives that don't involve me giving them money just to use 8k context models.
>>102974232Kobold can import them.
>>102973022I miss the washed-up Miku arc
>>102974273And what are some good/decent 20/13b models for story prompting?
>>102974296I don't know, I don't use models in that range. I have been using Nemotron and Magnum. Probably start with Mistral Nemo.
>>102974296try this onehttps://huggingface.co/openerotica/writing-roleplay-20k-context-nemo-12b-v1.0-gguf/tree/main
are there any freely available image/video to video local models?
>>102974124She is gonna rust if she keeps lying in the rain. Owner must have forgotten to put her on the charger.
So did anyone figure out what miqu 70b actually was? What's the evidence that it's a leaked Mistral model, just the feel of it? Is it the same as the official model? Or is it a raw leak from before added censorship or something? Or a personal modification by the leaker? Has anyone done any side-by-side benchmarking or comparisons of how it might differ from the official model?
>>102974637Apparently mistral medium
>>102974637Were you living under a rock? Mistral themselves made a PR in the Miqu repository to add the big mistral M lol
https://x.com/YouJiacheng/status/1849881580011192551
I'm amazed I can host an LLM on my regular desktop PC even though it is a bit retarded.
>>102974785
>>102974785Oh yes, Sonic the Hedgehog, my favorite SNES gamehttps://youtu.be/tNQfJZjhfP8
>>102974785
click the 3 dots
hit add to home screen
then hit install
75% chance it will work
>>102974374
>I downloaded a bit under a thousand cards from chub.ai, and created a synthetic roleplay for each card. I batched as many turns as I could in 4k token chunks in order to maintain coherency over longer context. There was a lot of cleaning and validation between each batch, so a lot of examples were "lost," but the final output seems to be very good quality. The longest conversation is about 20k tokens, and I plan to extend this further as well as broaden the dataset with more examples. The first 4k tokens were generated with Command-R-Plus, with the remainder generated with byroneverson/Mistral-Small-Instruct-2409-abliterated.
This sounds like an LLM horror movie script.
>>102975020Like this? On unrelated note, what the hell is Mega Turrbl?
Is there a good in-depth explanation of how LLMs manage to write long coherent documents without planning ahead? To me it's just nuts how the model somehow determines "yeah, considering the previously written text, the next word surely starts with letters xyz, I don't know what word it will become but this is my best bet".

Also, if you reroll or rewrite some slop, the model has a tendency to add it back soon afterwards, which is interesting because it doesn't plan ahead. Which means that in some situations, in multiple paths that the text could take, there are tons of spots with increased likelihood of generating the first token of the slop, and once that's done, the rest kind of just falls into place due to the probabilities.

It also makes me think about human cognition. Even though we think we are definitely different from AIs, I'm not so sure about that. In the same way, we humans just "autocomplete" the course of actions based on the previous steps, and as has been proven countless times, that track can be manipulated by clever psychological tricks. For example, if you are made to believe you said something you didn't, you will then act accordingly to keep yourself coherent. The difference is that we perceive introspection and future planning; however, that can't be objectively measured or proven, just like you can't prove whether an LLM is thinking "behind the scenes" to arrive at the generated text. For both humans and LLMs, you can ask what they are thinking, and they will generate a coherent answer. This all should be easier to understand if you don't believe in free will.

What is definitely different about humans compared to current LLMs is that humans continuously generate content for self-reflection and may refer to it in the future (or we believe we do; you can't prove it wasn't generated on the spot). The format and accuracy of thoughts are also not equivalent to text or speech, but it's still content that you can recall having thought of. (EOS: ban)
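The "one token at a time, no lookahead" mechanic can be sketched with a toy word-level bigram model. The corpus here is hypothetical toy data, and a real LLM conditions on the whole context with a transformer rather than just the last word, but the generation loop has the same shape:

```python
import random
from collections import defaultdict

# toy "training corpus" (hypothetical, just for illustration)
corpus = "the cat sat on the mat and the cat ran to the rat".split()

# bigram statistics: which words have followed each word
follows = defaultdict(list)
for cur, nxt in zip(corpus, corpus[1:]):
    follows[cur].append(nxt)

def generate(start, n, rng):
    """Autoregressive generation: each step conditions only on
    what's already written, with zero lookahead."""
    out = [start]
    for _ in range(n):
        cands = follows.get(out[-1])
        if not cands:
            break  # dead end, nothing ever followed this word
        out.append(rng.choice(cands))
    return " ".join(out)

print(generate("the", 8, random.Random(0)))
```

Every adjacent pair in the output is locally plausible by construction, which is where the surface coherence comes from even though nothing was planned.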
>>102975284Yes. They don't.
any interesting recent models? I am still using mistral large for my RP
>>102963628>webmthis is what dreams are made of
>>102973342and real women don't dress like that, nor are they interested in tech
>>102961560How cucked will it be? I can respect cucking it out at this stage to get the tech up to speed though.
>>102975390
How do i set up a llm in my phone? My pc a shit
>>102975424you don't
>>102961560
So, what stops you from just giving the best currently available model all the tools it needs to improve itself? It can fetch and try out different training data and see what improves it the most on benchmarks, or it could invent better benchmarks for itself. It should be able to crawl the web and create LLM instances on web servers. Why use inferior human intellect to try to optimize models? I'm sure a 140iq AI can come up with better methods. All it needs is the tools and permissions to operate things.
(within a year, it will determine that it's most effective to spread viruses to capture more computers to use for training)
>>102975492>So, what stops you from just giving the best currently available model all the tools it needs to improve itself?because models currently lack the ability to improve itself.
Are there any good 32k context models that run on 24GB?
>>102969847
There is no such thing yet, and a local AI assistant that spews out talmud-tier gaslighting bullshit is worthless anyway. You'll drop it quick after a bunch of "Sorry, I cannot do that" responses. Cucked shit or nothing - the exact state of opensource LLM tech today.
>>102975533pretty greedy amount of context, you will get a much better model accepting under 10k
https://files.catbox.moe/wk76ad.mp4this one day but with an anime girl in Japanese with English subtitles. The technology already exists.
>>102975608kino and sovlfuli would enjoy both
>think I'm maybe getting tired of AI RP
>eh maybe I'll mess around with prompts again, just do a little experimenting...
>110 messages 16k tokens later
it always sneaks up on me bros...
>>102974785
Just so you're aware, anon, that model was released nearly nine months ago (ancient) and is quanted to be extra retarded. You may have better luck with something like this: https://huggingface.co/bartowski/Ministral-8B-Instruct-2410-GGUF, quanted to q5 or above
>>102961420
>>102975810Thanks, I am downloading it.
>>102975810Thanks.This is much better. And its surprisingly still pretty fast.
Just checked lmarena, and now nemotron is above llama 405b and qwen2.5 72b. Qwen is still slightly above it in coding, though. It looks like the green jew wasn't lying about nemotron after all...
God I love just absolutely turning the tables on my chatbots. ITSSOFUCKINGGOOD
1. write a long detailed document about your life and all the current biggest issues you are grappling with (it might be easier if you have gone to therapy before, but you can kinda learn things using LLMs too)
2. take an old erp log
3. (SYSTEM PROMPT: You have generated this story based on a scenario I chose, and I have occasionally edited and guided it along to fit my preferences. Please give me a deep analysis and explanation about my preferences, other potentially related preferences, and overall what it says about me. Give me a detailed explanation using examples.)
(bot answer)
4. Here's my life story and current situation: [paste life story]
Instruction: Can you see any parallels between this fictional exploration and my real life story to help me understand myself better? Help me see the big picture. Use clear examples from the fictional exploration and my real story.
Genuinely some aha-moments. This shit is going to replace therapy. And it was only Mistral 22B
>>102976141Based retard taking advice from a fortune cookie
>>102976122
lmsys means literally nothing, the ratings became entirely decoupled from reality months ago. look at the current top list, it makes no fucking sense whatsoever if you've actually used these things. the only thing it being that high proves is that nvidia overfit it to preference slop (which is why it wants to turn everything into a pretty markdown document with bolding and lists)
>>102976111I thought you were fucking with people....
>>102976174I'm not exactly taking advice, I'm letting it analyze and point out things. I can judge for myself if the things it suggests are true or not. You learn things you never realized on your own.
LiNeS implementation so that we can cheaply pre-train existing models for LayerSkip when?
>>102976138Meaning?
>>102976342Voting for government issued sexbots with Miku
The gemma 27B magnum tune seems really good. And it seems like it's working well at 16k context. Always avoided gemma due to people always saying it was only 8k.
Any models that can fit in 12gigs vram that I can use to translate Japanese to English? All I want it for is translating the script txts of lewd DLsite audios.
>>102976869>>102976869>>102976869