/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102987959 & >>102976869

►News
>(10/25) GLM-4-Voice: End-to-end speech and text model based on GLM-4-9B: https://hf.co/THUDM/glm-4-voice-9b
>(10/24) Aya Expanse released with 23 supported languages: https://hf.co/CohereForAI/aya-expanse-32b
>(10/22) genmoai-smol allows video inference on 24 GB RAM: https://github.com/victorchall/genmoai-smol
>(10/22) Mochi-1: 10B Asymmetric Diffusion Transformer text-to-video model: https://hf.co/genmo/mochi-1-preview
>(10/22) Pangea: Open-source multilingual multimodal LLM supporting 39 languages: https://neulab.github.io/Pangea

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>102987959

--Multiple perspectives feature reduces bias scores across social dimensions:
>102987976 >102988012 >102990681 >102988100 >102988132 >102988543 >102989070
--LLM RAM Calculator helps estimate memory requirements:
>102995865
--Introduction to LLM sampling:
>102990514 >102990633
--Transformers.js v3 supports WebGPU and ONNX runtime models:
>102989640
--Open source models struggle with self-correction and output quality:
>102993033 >102993349 >102994098 >102994218 >102994315 >102994381 >102994615 >102994913 >102995031 >102994946
--Meta publishes open source music model but deletes weights:
>102989563
--GPT-SoVITS trained on moe-speech sounds better than Tomoko:
>102988359 >102988410 >102992236
--Advice on using Koboldcpp to test different AI models and quantizations:
>102988564 >102988717 >102988765 >102988805 >102988832 >102988754 >102988796 >102988844 >102988853 >102988849
--iGPU and APU performance for inferencing compared to discrete GPUs:
>102990238 >102990310
--Tips for generating smut stories:
>102989833 >102990006 >102990313 >102993566
--Newfag guide to AI models and terminology:
>102996361 >102996652 >102996683 >102996713 >102996779 >102997428 >102997550
--Mistral, GPT-SoVITS, and improved models anticipated:
>102997159 >102997173 >102997176 >102997185 >102997213 >102997203
--LLM ERP capabilities poll results and discussion:
>102993026 >102993613 >102993659 >102993982 >102993744 >102993800 >102994067 >102994113 >102994486
--INTELLECT-1 training progress update:
>102987982
--GLM-4-Voice sounds great in Chinese, but may be cherry-picked:
>102993857
--AI censorship discussion and risks of different modalities:
>102988230 >102988262 >102988351 >102988361 >102988387
--Miku (free space):
>102988359 >102989044 >102989186 >102991784 >102992608 >102996542 >102997941

►Recent Highlight Posts from the Previous Thread: >>102989254

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
Where did the rentry go
>Flexora auto-selects which LLM layers to fine-tune, cutting training costs.
>Average accuracy improvement: +7.21% on Llama3-8B, +8.33% on ChatGLM3-6B, +1.98% on Mistral-7B-v0.1
https://arxiv.org/abs/2408.10774
https://x.com/rohanpaul_ai/status/1850673624384168224
>>102998216
>the real OP doesn't want to put it in
It's over...
>>102998257I guess I'll just post the latest one here: http://rentry.org/pcrkt9pa
Sovits' server API is really a piece of garbage. You can't even send the reference audio in the request; you need to have it already on the server and provide the path to it. Patched that and got inference working on a server CPU at last, miss me with that gradio shit.
COAT: Compressing Optimizer States and Activations for Memory-Efficient FP8 Training
https://arxiv.org/abs/2410.19313
>FP8 training has emerged as a promising method for improving training efficiency. Existing frameworks accelerate training by applying FP8 computation to linear layers while leaving optimizer states and activations in higher precision, which fails to fully optimize memory usage. This paper introduces COAT (Compressing Optimizer States and Activations for FP8 Training), a novel FP8 training framework designed to significantly reduce memory footprint when training large models. COAT addresses current limitations through two key innovations: (1) Dynamic Range Expansion, which aligns optimizer state distributions more closely with the FP8 representation range, thereby reducing quantization error, and (2) Mixed-Granularity Activation Quantization, which optimizes activation memory using a combination of per-tensor and per-group quantization strategies. Experiments demonstrate that COAT effectively reduces end-to-end training memory footprint by 1.54x compared to BF16 while achieving nearly lossless performance across various tasks, such as Large Language Model pretraining and fine-tuning and Vision Language Model training. COAT also achieves a 1.43x end-to-end training speedup compared to BF16, performing on par with or surpassing TransformerEngine's speedup. COAT enables efficient full-parameter training of large models on fewer GPUs, and facilitates doubling the batch size in distributed training settings, providing a practical solution for scaling large-scale model training.
https://github.com/NVlabs/COAT
The repo isn't live yet. Another step towards FP8 training runs becoming more common.
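The mixed-granularity idea is easy to demonstrate: a single per-tensor scale wastes most of a low-precision format's range when magnitudes vary wildly between groups, while per-group scales track each group separately. A minimal sketch of the effect, using plain symmetric integer quantization as a stand-in for FP8 (the values, group split, and level count are made up for illustration; this is not COAT's actual scheme):

```python
def quant_dequant(vals, levels=256):
    """Symmetric uniform quantization with a single scale for the whole list."""
    scale = max(abs(v) for v in vals) / (levels // 2 - 1)
    return [round(v / scale) * scale for v in vals]

def max_rel_err(vals, deq):
    """Worst-case relative error after quantize/dequantize."""
    return max(abs(a - b) / abs(a) for a, b in zip(vals, deq))

# one tensor containing a tiny-magnitude group and a large-magnitude group
small = [0.01, -0.02, 0.015, 0.005]
large = [10.0, -20.0, 15.0, 5.0]
tensor = small + large

# per-tensor: one scale, dominated by the large group; tiny values collapse to 0
err_tensor = max_rel_err(tensor, quant_dequant(tensor))

# per-group: each group gets its own scale
err_group = max(max_rel_err(small, quant_dequant(small)),
                max_rel_err(large, quant_dequant(large)))

assert err_group < err_tensor  # finer granularity -> lower quantization error
```

With the single scale the small group rounds to zero (100% relative error), while per-group scaling keeps every value within half a quantization step of its group's range.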
kind of crazy to think about how ai is a solved science and with a couple more gens of nvidia chips and a few years of datacenter and power infra expanding we will be able to just use the current algorithms to create agi
DreamCraft3D++: Efficient Hierarchical 3D Generation with Multi-Plane Reconstruction Model
https://arxiv.org/abs/2410.12928
>We introduce DreamCraft3D++, an extension of DreamCraft3D that enables efficient high-quality generation of complex 3D assets. DreamCraft3D++ inherits the multi-stage generation process of DreamCraft3D, but replaces the time-consuming geometry sculpting optimization with a feed-forward multi-plane based reconstruction model, speeding up the process by 1000x. For texture refinement, we propose a training-free IP-Adapter module that is conditioned on the enhanced multi-view images to enhance texture and geometry consistency, providing a 4x faster alternative to DreamCraft3D's DreamBooth fine-tuning. Experiments on diverse datasets demonstrate DreamCraft3D++'s ability to generate creative 3D assets with intricate geometry and realistic 360° textures, outperforming state-of-the-art image-to-3D methods in quality and speed. The full implementation will be open-sourced to enable new possibilities in 3D content creation.
https://dreamcraft3dplus.github.io
https://github.com/MrTornado24/DreamCraft3D_Plus
big improvement over the original in how long it takes to generate a model (from 3 hours to 10 minutes)
>>102998360I wonder why they're not exploring FP2 to FP12 models like Aphrodite engine did.
>>102998364
>>102998427A discord chat / reddit meme.
>>102998171Based.
>>102998444
INTELLECT-1 is at 29.50% complete, up from 27.60% last thread.
>>102998415
>exploring FP2 to FP12 models like Aphrodite engine did
>like Aphrodite engine did
This project really likes to copy other people's code and take credit for it.
>>102998559Nah retard he really did it himself
>>102998564hi Alpin
>>102998596cope
>>102998616hi cope
>>102998483
>perplexity plateauing at 6.75
grim
>>102998564
>this file is copied from
https://github.com/vllm-project/vllm/pull/8751/files#diff
Literally most of the files are taken from this GitHub repo and the DeepSpeed library:
https://github.com/usyd-fsalab/fp6_llm
>We propose TC-FPx, the first full-stack GPU system design scheme with unified Tensor Core support of float-point weights for various quantization bit-width (6-bit, 5-bit, 3-bit, etc.)
>FP6-LLM is already integrated in DeepSpeed
But suddenly Aphrodite takes all the credit. What a piece of shit.
>>102998623
Not that I think this is going to be any good, but the perfect coombot shouldn't have the best perplexity, because that would lead to zero variety + lots of slop.
>we made it easy to support different quantizations
>suddenly Aphrodite takes credit for the entire research
>>102998635
>>102998674
What's the point of that research when it's not used practically?
>>102998216
>>102998257
>>102998265
>whole thread works together to make something for the benefit of the whole general
>discord OP doesn't go with it
Tale as old as time
Newfag here. Seems like most people running local are using it to generate images, which I don't give a shit about.
Does anyone run one locally for code generation or analysis? Trained on stackoverflow or a codebase or something?
>>102998265I saved it for myself. OP is a fag as always
>>102998716
>Does anyone run one locally for code generation or analysis?
Yes. Deepseek is great for code and logic type work if you have the resources to run it at a high quant. Coder 2.5 at q8 is my daily.
>>102998747
But dare to bake a thread without a Miku pic and be prepared for extreme shitstirring.
>>102998703
>not used practically
It's already in the DeepSpeed library, and vLLM used it through the library for the FP6 support. Copying the files and adding the other combinations doesn't give you the right to take credit for everything, asshole.
Go fuck yourself, grifter.
>>102998756you're just jealous
>>102998705
>Whole thread
No, it is just you newfags. Can you make /local c.ai exodus general/ please and fuck off?
>>102998747I wish mikufaggot prime would dox some newfags.
>>102998811
>making a list of recommended models together to improve the general
le bad and newfag... because it just is, ok??
>>102998811
This general is literally an offspring of aicg, stop denying the roots.
>>102998820
It is because oldfags know that finetunes just make models retarded or at best change the style slightly. There is no objective best style. And if you are an oldfag you just download the new thing and check it out yourself very quickly. By new thing I mean base models or instructs, of course.
>>102998838>because oldfags know that finetunes just make models retarded or at best change style slightly.retards you mean, or if by oldfag you mean someone who only ever used undi models
>>102998834The roots of my country have nothing to do with me not wanting some filthy migrants.
>>102998838
Here, try this one side by side with regular qwen2.5. Then tell me if it's dumber / not any different:
https://huggingface.co/EVA-UNIT-01/EVA-Qwen2.5-72B-v0.0
>>102998845Hi drummer. Buy an ad.
>>102998838
The concept was to make the model table bigger and separate it into best use cases in the future, and more importantly to give the average /lmg/ browser something good to just use instead of constantly switching models.
While the rentry is still barebones, it went from crap month-old placeholders to 6 good model suggestions in just one thread.
I really don't get the seethe and hate at the concept, both from half the thread and from OP apparently.
>>102998739
What do you use it for typically? Can you tell it to do something like:
>give me a function that does x
>>102998867All the more reason for you to make your own refugee camp.
>>102998874>my seethe at a rentry has no footing so I'll just tell you to leave
>>102998870
>give me a function that does x
You can use it for that. Also: you can ask for a full program, you can copy a program into context (if you have enough) and query about it or get it to make improvements or explain things, you can pair-program with it starting from ideation right through to a working product including build-chain/makefiles, you can copy a schema into context and get it to generate complex sql queries with a high degree of sophistication... just a few ideas of what you could use it for.
It is very good at instruction following in my experience. Better than largestral and a bit worse than 405b.
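For the anons asking how this looks in practice: everything above is just context-stuffing through a normal chat request. A minimal sketch that builds an OpenAI-compatible payload for a local server (llama.cpp's llama-server and koboldcpp both expose a /v1/chat/completions endpoint; the port, "model" value, system prompt, and schema here are placeholders, not anything mandated by either backend):

```python
import json

def build_code_request(task, schema=None, temperature=0.2):
    """Build an OpenAI-compatible chat payload for a local server.
    Pasting a schema or source file into the user turn is plain
    context-stuffing; nothing model-specific is required."""
    context = f"Here is the database schema:\n{schema}\n\n" if schema else ""
    return {
        "model": "local",            # placeholder; most local servers ignore it
        "temperature": temperature,  # low temperature for code/SQL generation
        "messages": [
            {"role": "system", "content": "You are a careful programming assistant."},
            {"role": "user", "content": context + task},
        ],
    }

payload = build_code_request(
    "Write a SQL query listing each customer's total order value.",
    schema="customers(id, name); orders(id, customer_id, total)",
)
body = json.dumps(payload)  # POST this to e.g. http://localhost:8080/v1/chat/completions
```

The same shape works for "explain this program" or "fix this makefile" by swapping what gets stuffed into the user turn.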
>>102998927
>you can pair-program with it starting from ideation right to working product including build-chain/makefiles
Insane. Ever use it to write tests? Or have it generate code from the tests you write?
Also, does your setup run it with a GPU? Or can you do CPU only? My proxmox server only has an iGPU so I would assume it's dogshit, but worth an ask.
I want to branch out a bit more for the culture benchmark. What memes do you want your LLM to know about? I've already included the iToddlers BTFO meme.
>>102998999Include ebussy
>>102998867What part of buy a fucking ad is hard to understand?
>>102999135dont feed the trolls
>>102999149Upvoted!
>>102999135
You created the Rentry to make money, with the idea of selling the spots in it.
The existence of the Rentry will also make the thread be astroturfed to hell and back; everyone will keep spamming their models to appear as "organic word of mouth of the thread" to be put in the Rentry. This is a tactic that has been used before by Sao.
>>102999135
>You created the Rentry to make money with the idea of selling the spots in it.
>The existence of the Rentry will also make the thread be astroturfed to hell and back, everyone will keep spamming their models to appear as "organic word of mouth of the thread" to be put in the Rentry. This is a tactic that has been used before by Sao.
>>102999179Go back to sharty, incel.
>>102999186after you take your meds
been out of touch for a year
what's the best uncensored model around 7B?
I saw people using multiple graphics cards to get more of a GGUF model into VRAM. Adding another 8GB VRAM card didn't scale as expected for me. With one card, I offloaded 15 layers (~6.5GB) to VRAM. Using two cards with Koboldcpp, I offloaded roughly 24 layers (4.5GB each), but beyond that, I get OOM. It's not going to scale at 100%, right?
>>102998171anybody got xtts2 running on win10?
Small update on the proof of concept for an RPG Maker MV based LLM front end. I figured out how to put the LLM response into an in-game message box now. (You also have to manually code in your own word wrap function, because for some reason it can't do that shit automatically.) Thankfully it uses monospaced text, so it's pretty easy to deal with, although justifying the text so that it doesn't look like shit is a whole other animal. It's also really janky in that it only allows a maximum of 12 lines per script, so you basically have to minify everything.
But I figure the way to do it would be to just make a big library of bite-sized functions for everything and then use event calls in the game's own built-in event system to call them up.
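Since the message box uses a fixed-width font, the manual word wrap anon had to write is just greedy packing: add words to the current line until the next one would exceed the box's column width. A sketch of the algorithm in Python for clarity (the actual plugin would be the same logic written in RPG Maker MV's JavaScript, and the width is whatever your message window allows):

```python
def word_wrap(text, width):
    """Greedy word wrap for a monospaced message box.
    Words longer than the box width are hard-split rather than overflowing."""
    lines, line = [], ""
    for word in text.split():
        # hard-split words that can never fit on one line
        while len(word) > width:
            if line:
                lines.append(line)
                line = ""
            lines.append(word[:width])
            word = word[width:]
        if not line:
            line = word
        elif len(line) + 1 + len(word) <= width:
            line += " " + word          # word fits after a separating space
        else:
            lines.append(line)          # flush and start a new line
            line = word
    if line:
        lines.append(line)
    return lines
```

Each returned entry maps to one `\n`-terminated row of the message window; capping the result at 12 entries would match the script-length limit anon describes.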
>>102999419yeah
>>102998739
>>102998927
Are you using this? https://github.com/QwenLM/Qwen2.5-Coder
From what I can tell DeepSeek is cloud only? Can you run it locally?
>>102998971
>Also does your setup you run it with a GPU? Or can you do CPU only?
>My proxmox server only has an iGPU so I would assume it's dogshit, but worth an ask
I only use my GPU for context processing and infer on my CPU. You need a lot of memory and decently fast DDR5 to have a chance at using big models effectively.
>>102999509
>Are you using this? https://github.com/QwenLM/Qwen2.5-Coder
No, I've never had much luck with Qwen and haven't seen any convincing results from any of their recent models.
>From what I can tell DeepSeek is cloud only? Can you run it locally?
Yes, you can run it locally if you have the memory, and since it's an MoE, the inference performance is relatively good for the results you get using CPU (3x the speed compared to largestral).
>>102999463no
>>102999419>>102999463>>102999734Why are you like this?
>>102998855NTA but I did. Getting outbidded hard though.
>>102999751
stop samefagging
something went wrong with anthracite's datasets after v2
for all model bases, the v2 version is always smarter than the v3/v4 version
You guys aren't ready for what's coming
if they were genuinely close to AGI then people wouldn't be quitting, even if they were scared, because they'd want to be a part of it and maintain influence over it
so the quitting doomers are just engaging in the usual EA/rationalist speculative retardation, AGI isn't soon
>>102998705>Discord You can tell it's true just by looking at certain posters ITT.
Good night /lmg/
RIP my catto. He literally just passed away.
>>103000026condolences
>>103000042It's too late at night/early in the AM to arrange for a cremation right now. Is it weird that I feel too paranoid and weird about putting him on ice before rigor fully sets in?
>>103000026>toxoplasma gondii incubator diedOh no, anyway.
>>102999437
Not sure what you are trying to achieve, anon. Hate to break it to you, but isn't that kinda useless? Why would I run an RPG Maker game just to then run inference from typing something? Did you think that through properly?
RPG Maker would be perfect if you could show an LLM the tileset and make it create a map. Still lots of fuckery for the json of the map.
The new Sonnet 3.5 can do it KINDA. It's the first LLM I could show a picture of the tileset and tell "give me the x/y where I should place it, make a diverse map."
So currently this stuff fails right at the start, with no maps and no events. Premade maps are boring.
>>103000085>Why would you do something difficult when you can do something easy?Because I have a dick.
>>103000100
I think you misunderstood me. Why would anybody use this?
The reason RPG Maker would be great is for creating a premade (coherent) story that's connected. Why would you run inference through RPG Maker lol.
This is like the guy who made NPCs in Skyrim talk through GPT. It's a gimmick and nothing more.
>>103000085
nta, but... wut?
Just so I'm clear: you're proposing that one should use an LLM as a way to do things classical algorithms already do better, and berating anon for using LLMs for convos/text gen because that's "kinda useless"?
>>102999991Neat! I'd love to have that for my VR waifu.
>>103000116
Because a few weeks ago I jokingly asked about the viability of creating an inferencing front end entirely in RPG Maker MV (back at the height of the SillyTavern drama). And now I'm going to make it a reality.
Because I fucking can.
Not because it's a good idea.
Not because it's the best practice.
But to make people like you seethe about its very existence.
How are translation models? I'm partially interested in translating shitty Japanese h-games.Accuracy is whatever so long as it can form coherent sentences and do it in a timely manner.
>>103000125
Why would I run inference through RPG Maker? That doesn't even make any sense. How will the story be created, for example? On the fly? Does NPC A know what NPC B said? Is it just a premade map, and then you let the LLM go from there? Just use SillyTavern and RP. Why the need for RPG Maker is what I am asking.
What I am saying is that RPG Maker would be perfect if you could give the LLM a prompt and it would make maps, events, etc. Even just one map with a bit of dialogue would be really cool. Create images with Flux to spice shit up. But currently it already fails at the start with the map, unfortunately. LLMs are terrible at this. Sonnet 3.5 was a huge improvement but it's still nowhere near enough.
>>103000137
If it's something for yourself and you fuck around, I don't care, but you post updates for other people. Obviously I assumed you wanted people to use whatever you make.
>>103000165
You would need to look at f95. There exists some tool that already uses chatgpt for translation. It's pretty good but the slop smells. If you point it at a local model you could translate, I suppose. It keeps important stuff in context already, etc. I thought gemma27b is good at translating for a smaller model, but it's cucked unfortunately. Not sure how well it will take ero content; it'll probably write in a style that's not hot at all.
>>103000165
>How are translation models?
That reminds me… did lcpp get plamo 100b support?
>>103000181You are mega gay bro
>>103000193So are you for replying anon.
>>102999425
pls respond. Why can I use more vram on a single card but less vram per card when I go multi-card? The language I see everyone use is "I have 32gb vram (8gb+24gb)" as if it just stacks up. Do I need to run llama.cpp in the cli for that kind of efficiency?
>103000198Such a great way to own yourself lfmao
are used 3090s still the best now?
>>103000198>>103000193I'm actually gay and give you both the seal of utter faggotry.
>>103000207
I don't know. I have 4, but I'd be wary about buying more.
They're still the least overpriced relative to what they offer, but their price should have declined a lot more by now than it has. They're getting rather old, which means long-term reliability concerns beyond eBay's 30-day money-back policy.
>>103000236>I'm actually gayNo need to make this more cringe than it already is.
>>103000000
>>103000165
Best is still the usual suspects, i.e. the online models. Mistral and Nemotron are on top for local, which shouldn't be surprising. What is surprising is how good Gemma is for its size: the 9B and 27B punch above their weight against models many times larger.
>>103000247Until AMD becomes more competitive with ROCm, 3090s aren't losing more value anytime soon. I expect it to slowly atrophy until AMD or Intel has a solid performance GPU that has 24GB that is cheaper and runs AI faster than 3090s.
>>103000322Isn't AMD desperately trying to become competitive in the AI training department.
svelk
>>103000335
On the enterprise level. None of them give a shit about hobbyist AI.
The MI300X is a very popular option for doing full finetunes. It doesn't quite match an H100 in terms of compute, but 192 gigs of VRAM is 192 gigs of VRAM. And full fine-tuning is quicker than lora training.
>>103000247>>103000322There's currently no competition in that niche, and NVidia has already cut production of the 4090. Those expecting the release of the 5090 to lower prices on the 3090 are delusional.
I can imagine that Huang cannot sleep at night, haunted by the memory of his greatest mistake: releasing the RTX 3090 with its excessive VRAM.
Thank god there are no women in /lmg/ that the refugees could rape.
>>102998171
>>103000026:(
>>102999761Kek. What's the point of buying an ad for models you can download for free?
>>102998999At least all the ancient memes from here. If they did their job properly it should have been included in the dataset
>>103000484
Because he has a crowdfunding link on his HF profile. But he bought an ad, so he's a legend.
Unlike the ones who inorganically shill their shit here all day long and don't buy an ad.
>>103000181
You can actually do better by including function calling, which is supported by llama3.1/3.2: https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_1/#user-defined-custom-tool-calling
It's supported in vllm (not in llama.cpp afaik): https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#tool-calling-in-the-chat-completion-api
Is there an extension or something that works with SillyTavern and SD Forge that allows the local model's output to be parsed and turned into a pony-friendly prompt that uses booru tags and pastes it into ForgeUI? I don't know how to make extensions, I have zero coding experience, and I only recently learned how to even write python shit. I wanted to use o1 to hold my hand through the entire process of making my own extension if nothing like that exists.
will anon buy 50series?
>>103000785Sure, when 70s come out I'll buy a 5060.
I am at the cutting edge of AI baneposting.
>>102999425
>>103000200
Generally speaking it is better to have the VRAM on a single GPU vs. spread out over multiple GPUs, but the increase in the number of layers that you can fit should be roughly linear with total VRAM.
If you use the exact same model and settings for both cases (context size and KV cache quantization are most relevant) and there are no other applications consuming relevant amounts of VRAM, that should not be happening.
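For anons hitting the same wall: the backends split layers across cards roughly in proportion to a tensor-split ratio, but the KV cache and compute buffers also claim VRAM on each card, which is why you can never fill them to 100%. A back-of-the-envelope allocator in the spirit of llama.cpp's --tensor-split (koboldcpp exposes the same idea under its own setting), assuming every layer costs the same amount of VRAM, which real models only approximate:

```python
def split_layers(n_layers, vram_per_gpu):
    """Allocate n_layers across GPUs proportionally to their usable VRAM.
    Assumes uniform per-layer cost; real layers (and the KV cache) vary."""
    total = sum(vram_per_gpu)
    raw = [n_layers * v / total for v in vram_per_gpu]
    alloc = [int(r) for r in raw]          # floor of the proportional share
    # hand remaining layers to the GPUs with the largest leftover fractions
    for i in sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True):
        if sum(alloc) == n_layers:
            break
        alloc[i] += 1
    return alloc
```

For the 8GB+24GB case in the question, a 32-layer model would land as 8 and 24 layers respectively, minus whatever the context buffers eat on each card.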
>>103000785The value of GPUs for AI is primarily determined by their VRAM-to-price ratio, along with reasonable performance and adequate support. I do not expect the 50 series to fare well in these metrics.
good morning /lmg/. are we still at the peak of newfag infestation wave?
>>103001074One(1) "backend confused" newfren made y'all shit your pants for 20+ hours, think about it.
>>103000785yes
>>103001105you need to go back to c.ai. all of you
>>103000207https://www.digitaltrends.com/computing/its-time-to-bid-farewell-to-nvidia-rtx-30/
>>103001131"I talk to chatbots, but LOCALLY" is not a hobby worth gatekeeping.
>>103000903Nice tinyllama you got here
>>103001148yes it isyou're not wanted hereget the fuck out
>>103001148Don't forget limited context, too.
>>103001107Restraining the Pochiface in a closet until it dies from dehydration.
>>103001193What the fuck are you talking about, schizo?
>>102998171Fall into something new
>>103000932
Thanks for acknowledging that vram usage scales near linearly. It should be similar with either gguf or exl2, right?
With that, I can focus on my settings and have more confidence getting another card.
>>103001393
>It should be similar with either gguf or exl2, right?
I'm not particularly knowledgeable when it comes to the internal workings of ExLlama, but I don't see why it would be different.
One thing that could as of right now make a difference with llama.cpp and derivatives: if you set --split-mode row the scaling will not be linear, because as of right now the KV cache is only on a single GPU.
>>103001334Fall into a puddle of Leaku pee
>>103001699>dystopian robot referenceBased
>>103001699>not a cardaw
How does the CPU affect performance (in general) in GPU inference and GPU training?
For inference I only found this link, which says it has almost zero effect:
https://www.pugetsystems.com/labs/articles/effects-of-cpu-speed-on-gpu-inference-in-llama-cpp/
And for training I've seen that if the CPU doesn't have enough cores (at least 4 cores per GPU) and memory channels, it will bottleneck training.
How would GPU training be affected by the difference between 2x AMD EPYC 7763 64-core and 2x AMD EPYC 9965 192-core? In raw benchmarks the latter is more than twice as fast:
https://openbenchmarking.org
Does anon know any other benchmarks/links?
>>103001952Thanks anon, I really appreciate.
What's the current best poorfag CPU model?I want something to talk to while waiting for a bitnet model to come out.
>>103002035
Mistral Nemo 12b or one of its finetunes.
If you want really fast, olmoe, but it has low context.
But I'd still buy even a 16gb gpu if I were you... just in case...
>>103002035That thing doesn't even look like Miku, wtf are they smoking to pretend it's her?
>>103002097
>>103002035pyg6b
>>103002127Okay zoomer
>>102999437
That's a really cool experiment, anon. See how far you can take the concept.
>It's also really janky in that it only allows a maximum of 12 lines per script, so you basically have to minify everything.
What the fuck. Can you at least break things into multiple script files?
>>103002097>That thing doesn't even look like Miku, wtf are they smoking to pretend it's her?
https://github.com/ggerganov/llama.cpp/pull/9702
>added implementation of DRY sampler (post-refactor) #9702
Was merged in on Friday.
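For anyone wondering what DRY actually does: it penalizes a candidate token if emitting it would extend a sequence that has already occurred earlier in the context, with the penalty growing exponentially in the length of the repeated run. A simplified sketch of the core idea (the merged implementation uses a much faster scan plus sequence breakers; the parameter defaults below are the commonly cited ones from the original proposal, treat them as placeholders):

```python
def dry_penalty(context, candidate, multiplier=0.8, base=1.75, allowed_length=2):
    """Penalty for `candidate` given the token ids in `context`.
    Finds the longest suffix of the context that, followed by the
    candidate, already appears somewhere in the context."""
    best = 0
    for n in range(1, len(context)):
        pattern = context[-n:] + [candidate]       # suffix extended by candidate
        if any(context[i:i + n + 1] == pattern
               for i in range(len(context) - n)):
            best = n
        else:
            break  # a longer suffix can't match if the shorter one doesn't
    if best <= allowed_length:
        return 0.0  # short overlaps are allowed unpenalized
    return multiplier * base ** (best - allowed_length)
```

The penalty is then subtracted from the candidate's logit, so the longer a loop the model tries to re-enter, the harder it gets pushed away from it.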
>>103002205
yeah, you can chop everything up into function definitions and function calls and attach them to common events, as long as they are attached to the window object; otherwise it doesn't treat the scene as a contiguous environment.
>>103000181
Not that anon or anyone else involved in this conversation or project yet, but I am excited about anon's research.
It would be cool as heck to be able to have a conversation with my LLM by walking around and interacting with an SNES-looking map.
Sure, an RP experience would be better just using a text-to-text interface. But think about it the other way around: this could embetter rpgmaker experiences.
I asked in a prior thread about llama-server's behavior when receiving a prompt larger than the configured context size: >>102991521
>Back in the day llama.cpp server would crash if you tried to stuff a prompt larger than the configured context size into it; it no longer does that.
>Is it safe to assume that it's simply cropping the context at the top?
>Is there a reason one would want to do that instead of just setting the correct context size in the frontend software?