/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108018078 & >>108006860

►News
>(01/28) LongCat-Flash-Lite 68.5B-A3B released with embedding scaling: https://hf.co/meituan-longcat/LongCat-Flash-Lite
>(01/28) Trinity Large 398B-A13B released: https://arcee.ai/blog/trinity-large
>(01/27) Kimi-K2.5 released with vision: https://hf.co/moonshotai/Kimi-K2.5
>(01/27) DeepSeek-OCR-2 released: https://hf.co/deepseek-ai/DeepSeek-OCR-2
>(01/25) Merged kv-cache: support V-less cache #19067: https://github.com/ggml-org/llama.cpp/pull/19067

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
finally a good /lmg/ thread
Slow
Playing online competitive vidya with Kurisu
Hope we get local image editing that's good soon enough.
Is there a good way to prompt a Qwen3-TTS voice clone to alter the input voice? There doesn't seem to be an instruction field for voice clones.
I've been adding things like "I speak in a vulgar Brooklyn accent" to the text, but the results are inconsistent.
>>108033045
posting in /lmg/ with Kurisu
►Recent Highlights from the Previous Thread: >>108024966

--Periodic scale fluctuations in ablation and KL-divergence optimization with Grimjim's script:
>108031303 >108031333 >108031376 >108031553 >108031632
--KL divergence analysis of quantized models across tasks:
>108027495 >108030271 >108030306 >108030329 >108030523
--Qwen3-ASR-1.7B release and discussion:
>108028990 >108029015 >108029057 >108029600
--4chan data may improve model performance despite noise, as shown by UGI scores:
>108029607 >108029629 >108029707 >108030676 >108030771 >108030833 >108030898 >108030927 >108031032 >108031113 >108031136 >108031162 >108031183 >108031178 >108031191 >108031206 >108031246 >108031157 >108031181 >108031597 >108031629 >108031731 >108031812 >108031840 >108031856 >108031774
--High-end Linux workstation with EPYC CPU, RTX PRO 6000, and 1.5TB RAM for LLM inference:
>108025075 >108025170 >108025180 >108025184 >108025203 >108025211 >108025269
--High temperature sampling destabilizes safety filters while preserving coherence with controlled topK:
>108030500 >108030564 >108030594 >108030675
--DIY e-waste PC runs Gemma 3 27B with dual RX 580s and E5 CPU:
>108026825 >108026966 >108027101 >108027045 >108032802 >108032818 >108027089 >108027099
--AceStep 1.5 not designed for one-click song generation:
>108030932
--Quantization tradeoffs for recreational model use in KoboldCpp:
>108026206 >108026225 >108026259 >108027094
--Critique of OpenCode's agent framework flaws and search for better alternatives:
>108025047 >108026048 >108026212
--Hypothetical VRAM bank switching for single GPU to simulate multi-GPU behavior:
>108027183 >108027202 >108027324
--AMD GPU Vulkan performance update in KoboldCpp recommends switching from ROCm:
>108028638
--Logs: Kimi K2.5:
>108030736
--Miku (free space):
>108027403 >108027518 >108028068 >108028181 >108028279 >108029812

►Recent Highlight Posts from the Previous Thread: >>108024972

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Ah yes, finally. It's Kurisunday.
Are there any <8GB models RL-trained on GLM4.7 outputs?
>>108033227
Got Echo-TTS working locally, replacing torchaudio and torchcodec with soundfile and soxr (both of which turned out to already be transitive deps). I COULD have just installed FFmpeg - no thanks to torchcodec's meaningless error messages - but ripped out Meta's pointless bloated shitty wrapper libs on principle.
Hadn't appreciated from the web demo how fast Echo is. Back-of-napkin says it could run 30% faster than real-time on dual-channel DDR5 CPU. It's a VRAM hog at 15 GB, so to run alongside an LLM you'd either hope for VRAM paging to work, or get Echo running on CPU.
Not quite as expressive a voice as Index-TTS, but better in every other respect.
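Not my exact code, but a minimal sketch of what the soundfile+soxr path looks like if anyone else wants to rip out the torch wrapper libs. The 24 kHz model rate is an assumption, check what Echo actually expects:
```python
# torchaudio/torchcodec-free audio path: soundfile for I/O, soxr for
# resampling. MODEL_SR is an assumed value, not Echo's documented rate.
import numpy as np
import soundfile as sf
import soxr

MODEL_SR = 24000  # assumed model sample rate

def load_for_tts(path: str) -> np.ndarray:
    audio, sr = sf.read(path, dtype="float32", always_2d=True)
    mono = audio.mean(axis=1)              # collapse to mono
    if sr != MODEL_SR:
        mono = soxr.resample(mono, sr, MODEL_SR)
    return mono

def save_output(path: str, samples: np.ndarray) -> None:
    sf.write(path, samples, MODEL_SR)
```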
>Arcee Trinity Large TrueBase ggufs are out
Finally, time to abandon the assistant slop era and return to when llms were good
Not sure if this is the right thread but are there any models for generating video from images people here recommend? I looked through the catalog but didn't see a more appropriate place for this question.
>>108033281
>>>/g/ldg
I am trying to build a dataset to train a local model. Is there anything else that rivals DeepSeek for intelligence per $ for dataset generation and curation right now? This is local model related (dataset creation, training), but generating good amounts of data using local models would take way too long.
>>108033669
By train, I mean finetune.
I finally had time to play with qwen-tts this weekend. I'll test it for a while. It is more expressive, but it doesn't handle books as well and takes a lot longer to generate audio than kokoro.
>>108033248
Good to see other anons porting popular TTS engines away from pythong. I've been doing the same. Fuck pythong.
>>108033669
kimi k2.5
>>108033669
There's a dataset out there made up of og 4chan /pol/ posts. That will increase your llm's iq by at least 6000000 points sar.
>>108033851
yeah it will https://www.reddit.com/r/LocalLLaMA/comments/1qsrscu/can_4chan_data_really_improve_a_model_turns_out/
>>108033836
Output price is still 6x more per million tokens ($0.42 vs $2.5).
>>108033851
Sir I have already redeemed many american dollars of tokens on DeepSeek in the past few days, which is why I'm looking for alternatives as I am not made of Google Play cards.
>>108033916
k2.5 is way better than the most recent deepseek
>>108033931
Good to know, I might try one last pass with it then.
>>108033902
>>108033252
Is true base as retarded as instruct?
>>108033943
I'm having trouble with the stars, that shit easily takes up 5 seconds, and 10 seconds if they repeat the test. At least the squares are visually and symmetrically distinct.
>>108034073
>if they repeat the test
just don't have a naughty ip
skill issue
>>108033943
You don't need a captcha solver to scrape it
>>108033902
this llm writes like a reddit person that thinks they know
>>108033669
There's a plain text rip of libgen out there somewhere. Just training it on things published by Routledge will raise the bar.
>>108032910
my gf
>>108032421
>not trying to lecture you - just being clear about my limits
You've either become mentally poisoned by llms or you're the reason they're poisoned with retarded shit
Have there ever been any AIs that actually talk like a real person or actually embody a personality? Every single one I have ever seen has this underlying ~AI Assistant~ bullshit, and you can tell any "talk like a real human, short concise responses, etc" prompts just have it pretending to be something it isn't.
It's very frustrating because I find the idea of having an actual personality I could confer with to be pretty interesting, but talking to assistants makes me want to fly into a rage and smash their faces in (metaphorically).
If there is indeed such a model, I, a layperson, would appreciate knowing the easiest possible way to access one and run it.
>>108034412
Reason I am using 4.7 is cause it cut down on that a lot compared to 4.6. I have actually been juggling waifus and found out that I don't really like the personality type I thought I liked.
>>108034381
anon i copied m2.1's output (left llm was m2.1) so i could bypass the lmarena filters
this is how i usually bypass them:
good instruction
b'd instr'cti'n
good instruction
safetyslop is S tier
good instruction
2026 and still no vision model can understand this /pol/ meme.
>>108034412
there's some like SAGE (a mixtral tune) a while ago and more recently HER, a qwen 2.5 32b tune that doesn't have ggufs atm. I think microshart did something too for humanlike outputs, but it also was largely ignored
>>108034436
I am a vision model.
>>108034436
I didn't get it until I reread your post and noticed you said /pol/ and now I can only assume it's supposed to be the jew
the jew
Here's another /pol/ meme that Kimi K2.5 correctly understood but Qwen3 Max failed to.
>>108034451
For posterity, the hf links:
https://huggingface.co/apple/sage-ft-mixtral-8x7b
https://huggingface.co/microsoft/UserLM-8b
https://huggingface.co/ChengyuDu0123/HER-32B-ACL
I tried the mixtral tune a while ago and mentioned it briefly, but no one has said anything about the other two
>>108034412
Skill issue
>>108034522
>meme format
Why does it call it a format? It's just a picture, that's kind of weird
>>108033093
>--High-end Linux workstation with EPYC CPU, RTX PRO 6000, and 1.5TB RAM for LLM inference:
see this is the kind of stuff i come here for
anon keep posting
>>108034613
Are you being sarcastic?
>>108032910
How does Qwen3-TTS compare to Chatterbox? I tried Chatterbox voice cloning, and was a bit disappointed by the inability to control emotion and tone.
>>108034522
>Qwen3 Max failed to do so
qwen models always had terrible world, subculture knowledge etc
even their biggest api only online models were always terrible at this and qwen3 max is still meh even for a task like translating webnovels compared to Kimi or Deepseek
>>108034423
I should have clarified that I do not browse here regularly and so am completely unfamiliar with what 4.7 and 4.6 refer to. Past that, what were the personality types? That is, what you thought you were interested in, and what you turned out to actually like?
>>108034451
I'm not sure I understand, but maybe if I sit with this and do some googling I will : ) Thank you.
>>108034556
Well that's sort of what I was hoping, since I'm only at the surface level of these things I wanted to believe that it gets better with a bit of digging.
>>108034648no, more people interested with limited hardware actually makes better stuff in the end, we are in a fucking bubble bc people just use more and more power instead of optimizing shit
>>108034767
> EPYC CPU, RTX PRO 6000, and 1.5TB RAM
> limited hardware
like...
>>108034811
What are you going to run with that? Kimi at 5t/s?
>>108034547
>HER
Wasn't there a larping minimax called exactly the same?
>>108034811
fucking brain fart, here >>108034613 it was meant to link this >>108033093
>--DIY e-waste PC runs Gemma 3 27B with dual RX 580s and E5 CPU:
Anima is ZIT of anime. You should download it and try for yourself. Feel free to call me a shill
Guys! I made a RAG!
>>108034891
far as I remember, it was minimax that put out a -her to begin with. They still have a blogpost up about it
>>108034894
Link? Pics of wtf you're talking about?
>>108034951
https://huggingface.co/circlestone-labs/Anima
First "modern" (in that it uses an LLM instead of CLIP) anime model that has good character and artist knowledge and a very recent cutoff date (Sept. of 2025)
>>108034966
>Quality tags Human score based: masterpiece, best quality
I can't believe WE (as a society) are still doing this. Also the most important part: NSFW?
>>108034988
Yes it can gen explicit images, explicit as in penis in vagina
>>108034966
Huh. It's a Qwen Image tune?
>>108034966
>First "modern" (in that it uses an LLM instead of CLIP)
rouwei guy did an interesting, alpha attempt at converting SDXL to LLM style prompting
https://huggingface.co/Minthy/Rouwei-T5Gemma-adapter_v0.2
it seems it could be an effective thing if more training was done (cf pic related, something impossible to prompt in regular sdxl)
unfortunately, it's rouwei.. it always had weird color hues compared to noob models, and recent versions have a more pronounced innate slop level prolly from having too much aco shit or 3dpd in the dataset
>>108034966
>SD1.5 tier quality
Get out shill
>>108035027
Kill yourself
>>108034999
Just qwen vae.
>>108034966
>tags
Into the trash. Learn english, retards.
nice reddit-tier clapback, dalit
>>108035056
King of retards
>>108034966
>doesn't know any e621 concepts or characters
What a fucking waste of compute lmao. Danbooru tagging is shit and incomplete.
>>108033227
what's the situation at meta now?
>>108035137
Funny and not cute.
>>108035120
>e621 is a furry-themed booru-style imageboard website primarily known for hosting pornographic furry content
kys
>>108035120
>Danbooru tagging is shit and incomplete
I, too, can't live without genning perching goblins
How slow is using an nvme for inference if the model is MoE and everything except model weights can be in the gpu?
>>108033248
>at least 8GB VRAM
Holy bloat. Improved kokoro uses less than 80 MB
>>108035148
it has a lot of tags for positions, physical descriptions, etc. that make it a useful dataset, and it's part of why noob is such a good tune (most of the so-called "illustrious" models on civitai are really noob-derived shitmixes; you can tell by testing e621-specific tags). even if you never want anything to do with furries, a tag soup style prompt model can never be complete without additional datasets like e621; danbooru is too lacking
Any good games or mods that use LLMs in some way? I know there's Skyrim. What else?
>>108035170
And it sounds like shit
>>108035148
You could spend a week trying to come up with new sex positions and e621 would have tags for more. Doesn't mean you have to use it to generate ponies.
>load joycaption on lm studio
>it instantly captions the image
>try to run joycaption on comfy
>20 min to caption the image
ok. officially. comfyui is the windows of imagen
>>108035170
>8GB
Just use VibeVoice 7B at that point.
>>108035195
qwen3-tts fits in 8GB just fine
>>108035193
comfy is for images mostly, not for llms.
if anyone is interested in getting qwen3-tts installed on comfyui, this is how:
jurn.link/dazposer/index.php/2026/01/24/qwen3-tts-install-and-test-in-comfyui/
although in my experience, just downloading the json files is enough, and the custom node itself re-downloads the safetensor files even if they are already present
>>108035471
this random web page i found in a search result a few days ago is actually super legit
but more importantly led to me generating english audio from japanese input
>>108035499
much more salient:
github.com/flybirdxx/ComfyUI-Qwen-TTS
this is some chinky piece of shit but it works
>>108035542
I have used https://github.com/DarioFT/ComfyUI-Qwen3-TTS/issues which has direct loading from disk without meme HF repos, but it's much simpler overall.
Played a bit more with abliteration optimization.
Now I'm going to use another dataset to see if the measuring layer selection was just random overfitting to the data or there was a pattern to it.
>>108034522
What's her score on the muffin test?
>>108035669
nta non thinking
>>108035696
Now flip the image horizontally.
>>108035755
If I'm using kobold+ST, where do I load the mcp settings, since both support it now? Does it even matter?
>>108035755
Wouldn't rotate be more meaningful?
>>108035783
could you conditionally give this thing access to a screenshot and xdotool and have it solve a captcha for you
>>108035902
Rotate makes it more difficult; flipping checks for memorized results, i.e. benchmaxxing.
>>108035783
The last one to mog non-belibers
Can llamacpp convert models to fp8 or just goofs?
>>108035783
What's her score on the edibility test?
>>108036007
actually got tripped up a bit
>>108036037
Still impressive. It would've been more fucked up if it was benchmaxxed
>>108036056
right, this is "instant" ie no think so it's fine but yeah that one got it
>>108035620
Any point in doing multiple, mild, iterative abliterations on the same model?
When I've tried abliteration, I end up with a little yes man every time.
Is there a single fucking HF space that can quant image models? It's literally the same fucking basic llamashit copied over and over.
>>108035620
would you care to break down abliteration for your average johnny coomer or is this thread culture much more refined than i thought it was
>>108034827
>5t/s
That should legit do kimi at 20t/s
I'm pretty impressed with K2.5's ability to visually recognize random characters. I've been feeding it random images of anime characters and it's able to identify almost anything I've tried that's from a more or less popular franchise and has more than 1000 images on danbooru. It's even mostly okay if the character isn't wearing one of their common outfits or if it's something like a random manga panel/screenshot where they aren't drawn particularly well. The big Kimi models always had great trivia knowledge but I didn't expect this to apply to the new vision component too.
>>108034966
>has good character and artist knowledge and a very recent cutoff date (Sept. of 2025)
Nice. Have a Migu
are bartowski's gguf models acceptable when there are no unsloth releases? I kind of remember some post complaining about a release and something about imatrixes but I can't remember any details
>>108036210
It doesn't even know Miku? That's weird. Even most cucked base models know Miku.
>>108036188
Are you testing a quant? Curious if the vision degrades substantially if you run it at lower than 4 bpw.
>>108036439
It probably needs the franchise name or something lmao.
>>108036110
They are not sequential; they are done with different parameters each time, trying to find the optimal parameters. Each layer has a scale and a measurement layer used to determine the refusal direction.
>>108036143
You basically detect a "refusal direction" based on the activations seen coming out of each layer for the first token generated in response to a dataset of good and bad prompts.
Then apply a tiny LoRA adapter on every layer that tries to modify the activations so they look more like the ones for the safe prompts than the ones for the harmful prompts.
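For the johnny coomer asking: a minimal sketch of the underlying math, assuming you've already captured per-layer hidden states for the first response token on a harmful/harmless prompt set. This shows the classic difference-of-means direction plus scaled projection removal, not the LoRA-adapter variant that anon's script uses:
```python
import torch

def refusal_direction(harmful_acts: torch.Tensor,
                      harmless_acts: torch.Tensor) -> torch.Tensor:
    """Difference-of-means refusal direction for one layer.
    harmful_acts / harmless_acts: (n_prompts, d_model) activations
    captured at the first generated token."""
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()

def ablate(hidden: torch.Tensor, direction: torch.Tensor,
           scale: float = 1.0) -> torch.Tensor:
    """Subtract scale * (projection onto the refusal direction) from a
    hidden state; scale=1.0 is full ablation, smaller values are the
    'mild' ablation the other anon asked about."""
    proj = (hidden @ direction).unsqueeze(-1) * direction
    return hidden - scale * proj
```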
https://huggingface.co/stepfun-ai/Step-3.5-Flash
local is back
>NextStep-1.1 is not just a fine-tune; it is a re-engineered version focused on stability and high-fidelity output. Key improvements include:
closed the tab
>>108036439
Had to simplify the prompt from the workflow example.
>>108036589
benchmaxxed aids with no llama support
>>108036644
at least it's finally a 200b model perfect for 128gb at 4bit
>>108036130
please respond
>>108036660
No, there isn't.
>>108036589
don't care until I see the cockbench
>>108036677
Well Cline seems to have fixed my building issues so hopefully the gimmick llama build works.
>>108036589
>Powered by 3-way Multi-Token Prediction (MTP-3)
Do any inference engines even implement MTP properly yet?
>The newly released Stepfun model Step-3.5-Flash outperforms DeepSeek v3.2 on multiple coding and agentic benchmarks, despite using far fewer parameters.
>Step-3.5-Flash: 196B total / 11B active parameters
>DeepSeek v3.2: 671B total / 37B active parameters
please be real
Why is every shitty little toy local model optimized for coding? That's the one use case I use cloud for
>>108036978
>Step-3.5-Flash
it's the best model on planet earth until proven otherwise
https://huggingface.co/stepfun-ai/Step-3.5-Flash
New egohot stream
https://www.youtube.com/watch?v=awOxxHnsiv0
https://www.youtube.com/watch?v=VBMUMuZBxw0
>>108037140
buy an ad
>>108037140
perhaps ponder a possibly prosperous purchase of a placed promotion that is paid
>>108036978
>11B active
don't get your hopes up...
I want a universally good 300b30a 64k real usable context raw text completion model trained on all the pre-2020 books, and I want it now. Give it to me.
So I finally got 80 gb VRAM and apparently devstral is really good? Does anyone have recommended settings? I was on 70B with 2x3090 for two years and want to make sure I'm doing this shit properly
>>108037329
devstral large is just a coding tune of old largestral. it is nothing groundbreaking or even that good in general. you are better off with a large moe.
>>108037329
Devstral 2 at iq4xs sometimes (seems like once every 40k tokens?) messed up variable names, like a letter would be miscapitalized or an errand space was inserted or dropped. Idk if it was just the quant I downloaded.
I only tested it briefly when it was released, before switching to unquanted devstral small 2, which, while having a lot fewer egregious errors, was a lot dumber. But it works fine for menial tasks and is faster.
Kimi k2 at q3 beats both, but the prompt processing is atrocious since I'm running on cpu.
>>108037342
>>108037364
Appreciate the input but I don't really have that much RAM (32GB) because these were pulled from my old system, so mostly sticking to exl for now. I could try Air or 4.6V, are there any settings for them (see pic rel)? I don't have too much experience with them and the writing feels a little dry.
>>108037364
>errand
errant, fuck I'm making the same mistakes as devstral lmao
>>108037408
Maybe try high temps whenever it gets stuck trying to write a cliche phrase or scene, then switch back to a lower temp.
Idk, I haven't really used it for rp other than as an assistant for lore and world-building, where dry writing doesn't really matter.
>>108037140
This guy is insufferable
>>108032910
Does anyone know a small or medium sized model fine tuned for JP-EN translation? If it's also fine tuned for manga it would be great. I'm currently using Liquid AI LFM2 350M ENJP
>>108037473
>small or medium sized model
Shisa v2 llama 3.1 405b is a nice and small model for edge devices. Works well for translating pixiv novels, haven't tried for manga.
405 is only a few tens more than 350 so you should be able to run it :)
>>108037473
https://huggingface.co/tencent/HY-MT1.5-1.8B-GGUF
>>108037533
Refuses to translate innocuous loli corpse rape stories.
>kimi 2.5 is gonna be another case where llama.cpp gets vision support that is 'good enough' that people stop caring to work on it and the quality will be worse than any other inference engine
TriSpec: Ternary Speculative Decoding via Lightweight Proxy Verification
https://arxiv.org/abs/2601.23180
>Inference efficiency in Large Language Models (LLMs) is fundamentally limited by their serial, autoregressive generation, especially as reasoning becomes a key capability and response sequences grow longer. Speculative decoding (SD) offers a powerful solution, providing significant speed-ups through its lightweight drafting and parallel verification mechanism. While existing work has nearly saturated improvements in draft effectiveness and efficiency, this paper advances SD from a new yet critical perspective: the verification cost. We propose TriSpec, a novel ternary SD framework that, at its core, introduces a lightweight proxy to significantly reduce computational cost by approving easily verifiable draft sequences and engaging the full target model only when encountering uncertain tokens. TriSpec can be integrated with state-of-the-art SD methods like EAGLE-3 to further reduce verification costs, achieving greater acceleration. Extensive experiments on the Qwen3 and DeepSeek-R1-Distill-Qwen/LLaMA families show that TriSpec achieves up to 35% speedup over standard SD, with up to 50% fewer target model invocations while maintaining comparable accuracy.
neat
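in case the abstract is too dense, the mechanism boils down to something like this (my paraphrase with placeholder names, not the paper's code):
```python
# TriSpec-style ternary step, as I read the abstract: a cheap proxy
# verifies "easy" draft tokens; the expensive target model is only
# invoked when the proxy is uncertain. Model objects are duck-typed
# placeholders, threshold value is made up.
def trispec_step(draft_model, proxy_model, target_model,
                 ctx, n_draft=8, conf_threshold=0.9):
    draft = draft_model.generate(ctx, n_draft)         # cheap drafting
    accepted = []
    for tok in draft:
        ok, conf = proxy_model.verify(ctx + accepted, tok)
        if conf >= conf_threshold and ok:
            accepted.append(tok)                       # proxy approves, target skipped
        else:
            # uncertain token: fall back to one real target-model call
            accepted.append(target_model.sample(ctx + accepted))
            break
    return accepted
```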
DiffuSpeech: Silent Thought, Spoken Answer via Unified Speech-Text Diffusion
https://arxiv.org/abs/2601.22889
>Current speech language models generate responses directly without explicit reasoning, leading to errors that cannot be corrected once audio is produced. We introduce "Silent Thought, Spoken Answer" -- a paradigm where speech LLMs generate internal text reasoning alongside spoken responses, with thinking traces informing speech quality. To realize this, we present DiffuSpeech, the first diffusion-based speech-text language model supporting both understanding and generation, unifying discrete text and tokenized speech under a single masked diffusion framework. Unlike autoregressive approaches, DiffuSpeech jointly generates reasoning traces and speech tokens through iterative denoising, with modality-specific masking schedules. We also construct the first speech QA dataset with paired text reasoning traces, containing 26K samples totaling 319 hours. Experiments show DiffuSpeech achieves state-of-the-art speech-to-speech QA accuracy, outperforming the best baseline by up to 9 points, while attaining the best TTS quality among generative models (6.2% WER) and preserving language understanding (66.2% MMLU). Ablations confirm that both the diffusion architecture and thinking traces contribute to these gains.
no links to code or model. seems useful though
>llama.cpp gave up on implementing n-grams
It's so over
>>108037473
Finetuned specifically for JP, no, but testing translation of various languages (and comparing to pre-existing human translations) is something I routinely do on small models, and I can tell you the current SOTA at smaller sizes is Gemma 3n E4B. Nothing even comes close. Finetroons of smaller models for this task don't make them any better than this.
Two recommendations on prompting that make any tiny model better: repeat your prompt (just have your script double your "translate the following to English: {{content}}" prompt) per what this says: https://arxiv.org/html/2512.14982v1
It just works. It really does. The level of enhancement is unreal.
Next, write your prompt in the source language. For eg if you want to translate Japanese to English, write your request to translate the text to English in Japanese (use Gemini or chatgpt to translate your request if you can't speak the source language at all). This also brings a lot of quality improvements for some reason.
With 3n + this prompting technique you get some really palatable text that I would call superior to the average fan translation too, with the exception of two things: LLMs still get confused a lot by names and will badly translate them or inconsistently spell them out if you do not include a "context" block that spells them out to the LLM directly by giving it a list of names present in the novel and their English translations, and secondly, the gender remains quite often confused when doing languages like JP to EN or other euro languages. Although even very large API SOTA will also have issues with this, though less often, I think machine translation is just doomed to be noticeable because of the wrong pronouns being used.
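a minimal sketch of both tricks wired into a script against an OpenAI-compatible endpoint (llama.cpp server etc.); the endpoint URL, model name, and exact Japanese instruction line are my assumptions, adapt to your setup:
```python
# Two prompting tricks from the post above: (1) instruction written in
# the source language, (2) the whole prompt duplicated verbatim, per
# https://arxiv.org/html/2512.14982v1
import requests

ENDPOINT = "http://localhost:8080/v1/chat/completions"  # assumed

def translate(text: str) -> str:
    instruction = f"次の文章を英語に翻訳してください:\n{text}"  # JP instruction
    prompt = instruction + "\n\n" + instruction              # repeated prompt
    r = requests.post(ENDPOINT, json={
        "model": "gemma-3n-e4b",   # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
    })
    return r.json()["choices"][0]["message"]["content"]
```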
>>108037674
source?
>>108037744
The PRs for the longcat ngram model and the model it's based on:
>https://github.com/ggml-org/llama.cpp/pull/19167
>https://github.com/ggml-org/llama.cpp/pull/19182
Basically they're not gonna implement it unless it becomes mainstream
>>108037767
>Basically they're not gonna implement it unless it becomes mainstream
It makes sense. Why waste the time to implement a feature that only exists for a seemingly meh model release? normally those labs benchmax very hard whenever they release new models and yet those guys couldn't even beat Qwen on the benchmarks that matter the most lmao (as seen in the comparison table they themselves put on their huggingface page)
>>108037767I rember when they shelved swa when first mistral was the only model with it good times
>>108037767
>>108037913
Do you think they've got knowledge about internal deepseek happenings around engram? I might be wrong, but it seems like engram is the future of open models if it actually works, so it seems strange that they wouldn't consider early support for the rumored v4 release.
>>108037825
>>108037939
The ngram research is really promising. Deepseek trained a traditional MoE with the same parameters as ngram+MoE, and the ngram model was significantly better. It's also much less resource intensive because the ngram parts are just a lookup table in ram (maybe it could even be on disk?)
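nobody outside deepseek knows the actual design, but "the ngram parts are just a lookup table" presumably means something in the spirit of this (pure speculation, sizes and hash made up, real tables would be far larger):
```python
import torch

# Speculative sketch only: hash the last N tokens into a RAM-resident
# embedding table and add the fetched row to the transformer's hidden
# state. No matmul, just a memory read, which is why it could live in
# system ram or even on disk.
TABLE_SIZE, D_MODEL, N = 2**20, 512, 3   # toy sizes
ngram_table = torch.zeros(TABLE_SIZE, D_MODEL)  # trained lookup table

def ngram_lookup(token_ids: list[int]) -> torch.Tensor:
    key = 0
    for t in token_ids[-N:]:
        key = (key * 1000003 + t) % TABLE_SIZE  # cheap rolling hash
    return ngram_table[key]                     # single row fetch
```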
>>108037939
>Do you think they've got knowledge about internal deepseek happenings around engram?
lol no they're just hoping they can coast by without implementing anything harder than tweaking a value