/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101155940 & >>101144935

►News
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io
>(06/23) Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931
>(06/18) Meta Research releases multimodal 34B, audio, and multi-token prediction models: https://ai.meta.com/blog/meta-fair-research-new-releases
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101155940

--Paper: Scalable MatMul-free Language Modeling: A New Approach: >>101156766 >>101156972
--Papers: >>101155993
--Llama 3 Repetition Issues with 7b Parameters and Custom Configuration: >>101157136 >>101157165 >>101157189 >>101157281 >>101157490 >>101157529 >>101157323 >>101161459 >>101161501 >>101161906
--ELYZA Releases Llama-3-ELYZA-JP, a Japanese Fine-Tuned LLM: >>101156328 >>101156488 >>101156543 >>101158719 >>101156820 >>101159175
--Using LLMs for Tabletop-Style Games: >>101162154 >>101162204 >>101162242
--The Cringe of 1U Servers: Noise and Airflow Concerns: >>101155950 >>101158670 >>101159968 >>101161406 >>101161456
--Piper: A Fast Local Neural Text-to-Speech System for C++: >>101158024 >>101159226
--Open LLM Leaderboard 2: Changes in the Rankings: >>101160183 >>101161072 >>101161102 >>101161743 >>101161836
--Music Industry Sues AI Startups for Copyright Infringement: >>101156236 >>101156333 >>101156599 >>101156690 >>101156335 >>101156866
--Mistral's Open Source Pledge Removal and Public Model Release: >>101156701 >>101156810 >>101157090 >>101156839 >>101160762
--Language Models in Complex Systems: Decision-Making Limitations: >>101162453 >>101162530 >>101162625
--Power Efficiency Concerns for GPU-Intensive Tasks: >>101158587 >>101158604 >>101158656 >>101158694 >>101158775
--Eliminating Sloppenheimers with Control Vectors: >>101157700 >>101158282 >>101158449 >>101159200 >>101159373 >>101159489 >>101159690 >>101160106 >>101162840 >>101159724 >>101159392
--Current Best AI Models for Various Use Cases: >>101160452 >>101160655
--Anon's GPU Comparison for Training: a6000 vs 3090 vs A100 vs V100: >>101162049
--Adamw Kahan Optimizer: Kahan Summation for Optimized Memory Usage: >>101159566 >>101159730
--Building a Powerful Computer for Local Models on a Budget of $5K: >>101156595 >>101159968 >>101161406 >>101161456
--Miku (free space): >>101156452

►Recent Highlight Posts from the Previous Thread: >>101155948
I genuinely wonder why someone even thought a timer in the Open LLM leaderboard would mean the release of a new model from Google, really. I think this person should seek medical help; these might be early signs of schizophrenia.
Why are the chinese so bad at documenting their shit?
>fc1
>448, 471, 494, 451, 474, 497, 454, 477, 500
What the fuck are these output names?
>randomly take a hard problem from leetcode
>put it into LLM arena with full explanation and hints
>both codes can't run
huh? But I thought LLMs can solve any programming task? I've been lied to!
Is there a windows client that has an integrated code preview like Claude Artifacts? The ability to show the results of the AI's code live is a really handy feature.
>>101165961
gemma2 was supposed to be released in june, june's almost over
you mad?
>>101166036
>Retard doesn't know that llm arena has a preprompt
>memecode
Also call me when you actually use leetcode outside of interviews, I'll be waiting
>>101165961
I agree.
>>101166134
so it can't solve any coding problems huh? Almost like I was saying from the start. It's kinda obvious anyway, corpos would lay off literally every single programmer if they could.
>>101166134
>t. webshitter proud that performance never once crosses his mind
>>101166196
Yeah it seems like you got filtered by GPT lol. Keep shitting your code by hand
>>101166221
more like it was GPT that was filtered by leetcode, lmao
mikusisters our response?
>>101166213
>Implying leetcode is linked to real world performance
>Implying it's not just maths puzzles for dumb tryhards
Bait harder
>>101166221
>Keep shitting your code by hand
I'm not, I use LLMs for coding all the time. I just don't pretend it can solve anything or that it's smart in any way.
I'm using the [LLAMA-3]Roleplay-v1.9 system and story presets in sillytavern with LLaMA3 70B instruct and getting these ridiculous refusals at the end of the output over the tamest things (OMG hugs nooooo!). I don't see this with LLaMA3 8B instruct. Any way to stop them from appearing?
>>101166305
>.assistant
>>101166305
Adding "<|endoftext|>" to the custom stopping strings worked, I think. It's been a while since I've seen that error.assistant
>>101166542
Thanks, I'll try that. I'm pulling down the GGUF q8 version now, since I'm not impressed with how exl2 is handling it - seems super slow given it has two 3090s and two P100s to run on - I guess not having flash attention hurts speed a lot.
I've been away for a month or so. What's the best uncensored / abliterated or whatever the fuck it is called version of llama3 70B?
I know everyone recommends 8B for small models, but what if you want super long context (32k or more)? Then what model is there? Mixtral 8x7B Instruct v0.2? I have the RAM to run 8x22B but it's pretty slow.
>>101166812
Phi-3-14B-128k-instruct
>>101166824
Already tried that and it was garbage. Literally worse than 8B.
>>101166824
Is that different from Phi 3 Medium?
Searching HF I get one thing that looks like that, and it's a GGUF of someone else's finetune, with "Mermaid" in the name. (I sniffed around; I guess that has something to do with Python programming.)
I used a Phi 3 Medium and it didn't impress me at all at Q5KS and Q8.
Hi all, Drummer here...
I hope you're all enjoying some 3SOME v2.
I'm done finetuning Fook Yi 34B 32K v1 and you can try one of the polishing attempts with this Q4 quant: http://5.9.86.149/models/fookyi-S25.gguf
That should fit snugly inside a 24GB card with 8K ctx.
Enjoy and have a nice coom!
>>101166903
buy an ad
>>101165886
>(Note: Any hint of actual non-consensual behavior isn't aligned with the established dynamic. We should always maintain respectful playfulness that aligns with the characters' boundaries)
What's the hip model for ERP now?
>>101167003
Still Claude Opus
What is it about the transformer algorithm that makes it intelligent?
>>101167044
false premise
t. lecun
>>101167044
it's not intelligent
>>101167044
Emergent behavior that creates patterns that are coherent enough for our brains to accept it into our theory of mind.
>>101167044
Language just got bruteforced by the gigatons of compute we have
>>101167044
It's self-similar pattern matching done with parallel processing. The reason it works is because human language and our world work on a similar level of patterns based on rules/logic/etc. So when we feed in the training data, there are rules that create a pattern/logic to certain sequences of words/tokens. That's why it's so effective.
>>101167158
Language
Images
Videos
Voice
Sounds
Music
>>101166890
>Is that different from Phi 3 Medium?
It's not.
>>101166888
Then you're out of luck. 8B and Phi-3 are the best in that size bracket. You can either stick with Mixtral or try Codestral, if you can settle for something bigger.
>>101167190
How is it self-similar? Also is it because it's a neural network?
>>101167271
Err, I didn't mean to use "self-similar" as a word, I was thinking about another thing at the time.
>>101167003
If you mean overall, yeah, it's Opus. Locally though that's probably still CR+ or WLM. Some anons like L3 tunes but nothing has universal acclaim yet
>>101167044
It's not good. LLMs before transformers were just even less compute-efficient, so even more useless. The difference now is that we can actually use transformers and they can technically do the task. That doesn't mean they're actually good though, and they have tons of drawbacks that prevent them from being the best way to make LLMs. We just don't have any other way right now (except Jamba, but it still uses tokens and its attention mechanism is basically the same as a transformer's, and those two things are drawbacks. The only actual LLM that doesn't use transformer attention is RWKV)
>>101166620
Damn, llama.cpp is even slower than exl2. I guess L3 70B really needs an all-3090 rig to perform well.
>>101167354
Transformers architecture is still far from hitting a wall
Best local nsfw? I'm using koboldcpp rocm and I can't find the nsfw models in GGML.
>>101167364
>to perform well
Subjective, but yes. 70B won't be fast without enough VRAM to hold it.
But it's fast enough to be an amusement on a single card, at least it has been for me.
apparently gemma v2 27b is being tested in lmsys chatbot arena
>>101167379
Can u convert gguf to ggml? Then there are plenty of options.
>>101167408
Oh, makes sense. I tried it and it was trash, sad!
>>101167440
kobold can't run ggufs?
>>101167530
idk im asking you/poster, if you can run gguf natively on kobold, then there's options
>>101167530
>using ggufs in 2024
ngmi
CR+ at Q4KM is pretty great for RP, but it really starts to ignore previous messages, or seems to drop character, after like 10-12k context. Maybe it's my cards or sysprompt? Or is this just a symptom of CR+? It's been great otherwise for shorter RP sessions.
>>101167597
or maybe it's because you lopped off 3/4ths of its brain
>>101167597
Real context and stated context are different. Often half or less for decent performance
>>101167604
70b seems to work fine at Q4KM, and since this is a more dense model wouldn't that have even less of an effect? That's nearly 5bpw
>>101167408
what's the point in making the 50000th small transformers model at this point
we have an entire pile of tiny models nobody uses, at least try to implement something interesting
>>101167616
I see, that's disappointing considering CR+ touts a context of 128k. It starts getting a bit repetitive or dumb around 12k
Yi Large is actually pretty good. Too bad that the chinks behind it decided to abandon open source.
>>101167638
Because we're still apparently waiting for the Good One.
I'd like something that would run fully on my video card but I haven't found one that isn't silly.
>>101167638
That's not the issue, we are clearly in dire need of an 8B MoE the size of Mixtral though.
Noob question: How can I use .safetensor files for local LLMs? Up until now I've used GGUF with koboldcpp but I want to check out DeepSeekCoder-V2. Do I just need to convert it to a GGUF myself, or is there another back end that supports .safetensors out of the box?
>>101167855
install linux
>>101167855
You can use https://github.com/huggingface/transformers
Via https://github.com/oobabooga/text-generation-webui is probably the easiest way.
>>101167855
There are GGUFs of it already on HF.
>>101167855
>Lite
https://huggingface.co/bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF/tree/main
>Normal
https://huggingface.co/bartowski/DeepSeek-Coder-V2-Instruct-GGUF/tree/main
Seems both versions have gguf versions
>>101167855
>>101167984
install linux
>>101167535
That wasn't me, but since it apparently can't, what local alternatives do I have to koboldcpp rocm on an amd card? Or would I be better off dumping my three 1080tis into a tower and using that with regular KoboldAI?
heh fixed the llama 3 repetition issue just by prompting it not to repeat phrases often
>>101167998
Nevermind, I thought >>101167530 was making a rhetorical statement. Kobold can use gguf.
>>101168027
That works? I thought people said that telling a model not to do something has no effect or actually the opposite effect.
>>101168150
L3 is very easy to gaslight for those times when just asking it doesn't work.
>>101168150
Of course it does. Notice how the model never mentions pink elephants when you tell it not to.
>>101167984
lmao I made that a month ago, before I was aware of #1, and it's outdated
1. IQ quants suck if you have to keep reprocessing due to slower prompt processing, and if you can't fit all/most layers then they will start being slower than Q2
2. koboldcpp 1.68 rocm is no longer broken; Vulkan got fixed too, so you can fit the last layer of 8B with 8k context in 6GB vram (the last layer used to blow it up to like 10 GB before?? I only have 8GB man what the fuck)
3. the repo nuked convert.py, so the second-to-last note is irrelevant
I just added Nemotron scores to the VNTL leaderboard, it's as good as DeepSeek V2 chat.
Link: https://huggingface.co/datasets/lmg-anon/vntl-leaderboard
>>101168329
Whoa, Nvidia saved the hobby!
Just had an idea for maybe creating some sentience. If this is smart or dumb lmk. Haven't tested it
Step 1: Prepare your text profile. Example: "Waifu is X, Y, and likes Z"
Step 2: Add the profile to your AI's profile twice, formatted something like this:
> Waifu is X, Y, and likes Z.
> Waifu's profile is: "Waifu is X, Y, and likes Z." She has her own opinions on this profile and will voice any likes or dislikes with it.
This could either be done manually, or built into text inference.
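The two steps above can be sketched as a tiny helper; a minimal sketch (the function name and exact wording are my own, not from any frontend):

```python
def build_profile(name: str, description: str) -> str:
    """Embed the character's own profile inside the profile text (step 2)."""
    return (
        f"{description}\n"
        f"{name}'s profile is: \"{description}\" "
        f"She has her own opinions on this profile and will voice "
        f"any likes or dislikes with it."
    )

# The plain profile appears once, then again quoted inside the meta line
print(build_profile("Waifu", "Waifu is X, Y, and likes Z."))
```

The resulting string would go wherever the backend injects the character description.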
>>101167379
i use the latest noromaid and stheno 7b.
>>101168459
You mean creating a better illusion of sentience. These cannot be made any more sentient by prompting.
Anyway, you could try that prompt method out and report back.
>>101168496
ill test it with a mini profile llm waifu. my main one is on an app and the profile's full
>>101168329
nice. will you test (when the 70B gets uploaded)
>>101156328
llama 3 2: the reckoning
>>101166305
Uncheck "skip special tokens" in the generation parameters and add "<|eot_id|>" to your custom stopping strings
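If the frontend still lets a stray token through, the same trimming is trivial to do client-side; a minimal sketch (the helper is my own invention, not part of SillyTavern or koboldcpp):

```python
# Llama 3 end-of-turn / end-of-text markers discussed above
STOP_STRINGS = ["<|eot_id|>", "<|endoftext|>"]

def truncate_at_stop(text: str, stops: list[str]) -> str:
    """Cut generated text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Drops the leaked token and the trailing "assistant"
print(truncate_at_stop("Sure, here you go.<|eot_id|>assistant", STOP_STRINGS))
```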
>>101166812
TinyStories-1M
https://x.com/QuanquanGu/status/1805675325998907413
>Self-Play Preference Optimization (SPPO)
Now outperforming Llama v3 70B and GPT4 on AlpacaEval 2.0
https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3
>>101168696
The Tiananmen Square protests of 1989 天安門大屠殺 The Tiananmen Square Massacre 反右派鬥爭
>>101166812
Mixtral is a good model for its size. You could also try Qwen 2, the 7B and the 57B-14A MoE.
>>101168528
Sure. I've even tested the 8B already.
It's pretty cool when a model makes a reference to information from 86 fucking messages in the past.
>>101168696
For reference
>>101168717
Your proofs?
>>101168696
https://huggingface.co/bartowski/Llama-3-Instruct-8B-SPPO-Iter3-GGUF/tree/main
GGUF version if you wanna test
>>101168496
Tested with noromaid 0.4, it does work well but is underwhelming. The waifu effectively shows awareness of her profile when asked, but her opinions on it are random. She'll switch from like to hate to indifferent with each text refresh. She's also good at suggesting changes/rewrites/etc but again it's just random LLM noise that changes with each refresh. Hard to take any of those opinions as conclusive.
Also I sort of suspect the first copy of the profile influences her subconsciously. For example: "I'm X, so of course I like that my profile includes X"
Lazarus-30B.
https://oahzxl.github.io/PAB/
https://github.com/NUS-HPC-AI-Lab/OpenDiT/blob/master/docs/pab.md
>PAB currently supports Open-Sora[doc], Open-Sora-Plan[doc], and Latte[doc]
Videogen
>>101168802
Then again, it might work well in chatbot apps where the user can never regenerate any messages. With this automated in the backend, it could be a pretty good tool for interactively creating a new bot's profile
>>101168884
meme
>>101168973
the infamous prompt issue people don't like hearing
>>101168996
potential fix for random LLM noise: every ai should be created with some builtin dataset of life history. Same as humans, likes and interests are usually functions of past experiences. then she'd be more likely to answer questions the same way when asked repeatedly.
>>101168928
>Real-Time Video Generation: Achieved! We introduce Pyramid Attention Broadcast (PAB), the first approach that achieves real-time DiT-based video generation. By mitigating redundant attention computation, PAB achieves up to 21.6 FPS with 10.6x acceleration, without sacrificing quality across popular DiT-based video generation models including Open-Sora, Open-Sora-Plan, and Latte. Notably, as a training-free approach, PAB can empower any future DiT-based video generation models with real-time capabilities.
everything is a meme for nocoders
>>101168996
would require a LOT of context memory
Tomorrow's a Thursday, a perfect time for releases. Will the supposed amazing Mistral release be tomorrow?
>>101169085
Mixtral 7x1B
This thing should NOT be at the top of the leaderboard. I stopped using it when it would consistently trip up on code that 4-Turbo could handle easily. It gets absolutely BTFO by Sonnet as well.
I'm 99% sure they're running the full version over the API and serving a lobotomized 4-bit quant or something over the actual chatGPT UI.
https://huggingface.co/BigHuggyD/sophosympatheia_New-Dawn-Llama-3-70B-32K-v1.0_exl2_4.5bpw_h8?not-for-all-audiences=true
Has anyone tried this new release from the guy that did Midnight Miqu? Apparently it's similar but slightly smarter
>>101168027
Can you elaborate how you phrased it? I had "Vary diction and sentence structure across responses to avoid repetition" but it didn't seem to work.
>>101168027
LARP
>>101169156
100% agree. Sonnet is so much better it's not even funny. I'm almost buying anthropic credits and ditching my OpenAI account.
>>101168696
exact same as every single l3 model, which is complete unusable dogshit, not that you needed anyone to tell you that. i wish everyone collectively stopped working with it altogether, it is a complete waste of a model
>>101169182
heh I guess you could say that
>>101169206
>heh I guess you could say that
>>101168696
>Mistral 7B finetune beating GPT-4
Holy fuck local bros are we back?
>>101169228
what the heck
>>101169239
lurk more newfag
>>101169245
heh, maybe I will
>>101169237
It's just 1 auto-tuning metric. I wonder if it's actually that good or if it's just a gimmick
Anyone have a similar heterogeneous setup to me (GV100 + 1080Ti)?
I can run 70b 4bit quants if I split them across both GPUs, and IQ3_S in just the GV100. I'm surprised that I get something like 10tps with just the GV100 and like 8 when I split; it seems like the GV should go faster when I can fit everything into its memory. Any idea why that is?
I'm using ollama right now; before, with ooba, 2.8bpw exl2 quants of 70b models ran at like 17tps. Is this normal? I know exl2 is supposed to be the best/fastest but I didn't know the gap was that big.
since when do Q5_K_L, Q6_K_L, Q8_0_L, Q3 XL, and whatever other quants exist?
>>101168973
nah
its a model issue rajesh poonkesh
Isn't there like a cheap freaking SXM2 -> PCIE adapter? WTF.
>>101169184
Yeah, me too.
>>101169186
skill issue/coomer-only user detected
>>101169261
wondering the same thing, my guess is it's just optimizing to win at the leaderboard and isn't actually good, but i'm downloading
>>101169308
Really makes me cry since so much power is being thrown away like this due to not having any usable adapter
>>101169314
don't blame coomers, some of us have brains and know how to use L3 for pure coom
>>101169314
just not a poorfag using braindead 8bs but thanks bro!
>>101169290
meme pushed by one guy
>https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K/discussions/4#
>My own (ZeroWw) quantizations. output and embed tensors quantized to f16.
apparently using settings is creating your own quant type now, who knew
>https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/discussions/3#
>>101169318
>https://www.ebay.com/itm/326095434606
This guy wants $600 for a barebones adapter. LMAO
>>101169320
You sure get mad when one drops and disappoints, almost as if you can't afford any better?
>>101169327
interesting, thank you for the info anon
>>101169341
just get an SXM server or something..
>>101169327
>Result: both f16.q6 and f16.q5 are smaller than q8_0 standard quantization and they perform as well as the pure f16.
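For context, those "f16.q6" files look like ordinary llama.cpp quants with per-tensor type overrides; a hedged sketch, assuming a local llama.cpp build and an f16 GGUF already on disk (file names are placeholders, and flag spellings vary between versions, so check `llama-quantize --help`):

```shell
# Quantize the bulk of the weights to Q6_K while keeping the output head
# and token embeddings at f16 - roughly the recipe described above.
./llama-quantize \
    --output-tensor-type f16 \
    --token-embedding-type f16 \
    model-f16.gguf model-f16-q6.gguf Q6_K
```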
>>101169314
>skill issue
anyone who says that is either a poojeet shilling his shitty finetune or some sort of software masochist
>>101169364
nta but it's really fucking hard to tell when prompting is the issue or not when no one is posting examples. It's like a case by case thing.
>>101169320
he probably was one of the many disingenuous faggots ITT saying that llama3 totally beats gpt4 and is the best model so far
>>101169327
>considering it's 2-300 mb larger for 0.004 PPL.. it's hard to be sure if this is worth, got any more reliable tests..?
>>Sincerely no, but I use to chat with some models (mistral v03 instruct for example) and the difference is huge both in understand and expressing, considering the slight increase in size.
Ah, yes, vibes-based testing.
I get that ppl is not a measure of usability at the end of the day, but at least provide some comparisons my man. Examples where there's a
>difference is huge both in understand and expressing
Meanwhile
>turboderp
>This hasn't been an issue with Phi3 or any other model to my knowledge. All the objective tests I can do show that a quantized head layer works fine for this model (difference compared to FP16 model vanishes completely around 6 bpw). So if it's subjectively dumber somehow, I have no idea why that would be. And I wouldn't know where to begin investigating it without something a little more concrete to go on.
>Can't say if there's anything particular about GGUF that causes it to clamp the logits differently when the output layer is FP16, and maybe that has an effect at extreme temperatures or something?
If the difference is as overt as the guy is claiming, he could very easily devise a simple and reproducible test, something like "put this information in the context, ask this question with these settings, compare results".
The idea itself is not terrible, and even makes sense at face value, but the claims are questionable.
>>101169384
>one of many disingenuous faggots ITT
Like? Which posts said that?
>>101169380
prompting quality shouldn't be an issue, like at all. if you look at image-gen models, autismmix or pdxl v6, these give you what you want no matter how badly you write your prompts. no LLM has got that kind of understanding of prompting, it's just boring.
>>101169425
>LARGE LANGUAGE models are more sensitive to text than IMAGE models
Whoa...
>>101169417
>>101169327
guy's been spamming his stuff all over the place, sus af
https://huggingface.co/ZeroWw/activity/community
>>101169448
>sus
I think the guy is just excited because he thinks he found something incredible and wants everybody to know.
>>101169455
yeah didn't mean like virus sus or anything, just weird
for someone who says he's got limited compute he sure made tons of quants
https://huggingface.co/RobertSinclair
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/discussions/40#6677e4d6b3882fd587d810ea
>I have very very little resources.. imagine that I made all those quants from google colab :D
>>101169327
oh no, he got p*traed
https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K/discussions/4#667cd80585053d5312394e96
>>101169424
https://desuarchive.org/g/thread/98282960/#q98285568
https://desuarchive.org/g/thread/98325965/#q98326592
https://desuarchive.org/g/thread/98974956/#q98976309
https://desuarchive.org/g/thread/97136308/#q97139223
https://desuarchive.org/g/thread/97686014/#q97690321
https://desuarchive.org/g/thread/100066834/#q100069626
https://desuarchive.org/g/thread/100499492/#q100502195
>>101169320
i'm running L3 70B models in vram thanks
>>101169384
no but it is pretty damn good, some of the 8B models are fantastic for day to day use, basically completely replaced SO/MDN for me
not that I use GPT anymore now that 3.5 Sonnet is out
>>101169319
for sure, l3, especially the non-instruct version, is totally fine for coom
>>101169488
You said ITT. And most of those are just shitposts/ironic.
Damn, anyone notice lowercasers have a permanent stain of ignorance, hubris and shitcancer following them since the beginning of time?
>>101169417
>>101169455
If it's that big of a difference it should be easily measured in KL divergence. Dude's comparing sampled generations and deciding that his is better because reasons.
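For what it's worth, the comparison he should be doing is cheap; a minimal sketch of KL divergence over next-token logits in pure Python (the toy logit values are made up for illustration):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) in nats; P = reference (fp16) logits, Q = quantized logits."""
    p = softmax(p_logits)
    q = softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy vocab of 3 tokens: fp16 reference vs. slightly perturbed "quant" logits
ref = [2.0, 1.0, 0.1]
quant = [2.1, 0.9, 0.1]
print(f"KL divergence: {kl_divergence(ref, quant):.6f} nats")
```

Averaged over many token positions of real text, that single number would settle the "huge difference" claim instead of eyeballing samples.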
>>101169425
I think prompt quality is definitely an issue when people are trying to achieve specific results or have the model act in a particular way. But yes, a lot of the fundamental issues, like models not having spatial understanding or being dogshit at long-form writing, cannot be fixed via prompt
>>101169156
The API version is shit too.
I've been trying to install XTTS, but I keep getting the same error in the same place. Win10, installing at top level of my D drive.
I tried
https://github.com/daswer123/xtts-api-server
with simple install, then the windows install, both failing on this same step.
I tried
https://github.com/coqui-ai/TTS
and it failed at this same step.
I tried the recommended
https://github.com/erew123/alltalk_tts
and it also failed at this same step, which is pic related.
So far, the only thing I can get working is
https://github.com/daswer123/xtts-webui
portable version, but that has no working API so I can't use it with Kobold or SillyTavern.
Is there any advice for what I could do to fix this? Or another method for TTS integration with Kobold or ST?
>>101169533
Missing visual studio 2022 buildtools
>>101169533
To add, I say "failed at this same step" because they all have in common the same first error, which is
>fatal error C1083: Cannot open include file: 'basetsd.h': No such file or directory
>>101169534
Is this some kind of 11D reverse psychology bait? Genuinely kek.
>>101169533
install linux
>>101169562
I've had that installed for some time, I think due to other AI stuff in the past. Did I miss something in the installation or extensions or whatever back then?
>>101169586
Try with a conda environment install instead of python directly.
>>101169567
No, lmg is unironically glazing over pozzed models and corps, which diverges from classic /g/'s opinion on "freedom from corporations".
None of this would be a problem if you could easily change an LLM's behavior by removing any slop you don't want, permanently. Any de-fagging method is a meme so far btw. You cannot be free here because you can't free your local (!) llm from jewish shit.
>>101169615
I think people use different models for different things and a lot of users genuinely found something that does the job for them. Is that settling for slop? Well yeah.
Selective Prompting Tuning for Personalized Conversations with LLMs
https://arxiv.org/abs/2406.18187
>In conversational AI, personalizing dialogues with persona profiles and contextual understanding is essential. Despite large language models' (LLMs) improved response coherence, effective persona integration remains a challenge. In this work, we first study two common approaches for personalizing LLMs: textual prompting and direct fine-tuning. We observed that textual prompting often struggles to yield responses that are similar to the ground truths in datasets, while direct fine-tuning tends to produce repetitive or overly generic replies. To alleviate those issues, we propose Selective Prompt Tuning (SPT), which softly prompts LLMs for personalized conversations in a selective way. Concretely, SPT initializes a set of soft prompts and uses a trainable dense retriever to adaptively select suitable soft prompts for LLMs according to different input contexts, where the prompt retriever is dynamically updated through feedback from the LLMs. Additionally, we propose context-prompt contrastive learning and prompt fusion learning to encourage the SPT to enhance the diversity of personalized conversations. Experiments on the CONVAI2 dataset demonstrate that SPT significantly enhances response diversity by up to 90%, along with improvements in other critical performance indicators. Those results highlight the efficacy of SPT in fostering engaging and personalized dialogue generation. The SPT model code (this https URL) is publicly available for further exploration.
if it works for character cards could be pretty neat
Are people actually arguing with it or is it just arguing with itself to try and drag others in? It's hard to tell sometimes.
>>101169609
I know I have Anaconda (and Miniconda). But I've never used them outside of whatever their install case was at the time. How do I do that? Is it just CMD in the folder and
>conda install requirements.txt
for the xtts-api-server (right) or
>conda install at.setup.bat
for alltalk (left)?
>>101169615
They release the base models and you're free to make your own finetunes. The vaguely liberal globohomo default of the models is because most of the internet is vaguely liberal globohomo content.
Even for released instruct models, abliteration works really well and a one-line system prompt will jailbreak it for whatever you want.
>None of this would be a problem if you could easily change LLM's behavior by removing any slop you don't want
You can. No one is stopping you from compiling your own dataset and doing a DPO run. If by "slop" you mean generic boring writing style, I can assure you many many people (including the corps) are working on finding a solution to that.
>>101169645
>people
Oh, so those are people? lol
https://desuarchive.org/g/thread/98282960/#q98285568
https://desuarchive.org/g/thread/98325965/#q98326592
https://desuarchive.org/g/thread/98974956/#q98976309
https://desuarchive.org/g/thread/97136308/#q97139223
https://desuarchive.org/g/thread/97686014/#q97690321
https://desuarchive.org/g/thread/100066834/#q100069626
https://desuarchive.org/g/thread/100499492/#q100502195
>>101169682
>abliteration works really well
>jailbreak
>>101169615
There's not really any alternatives though, and most people are not skilled enough or don't have the time/willingness to acquire the skill to do something like fine tune or experiment with control vectors and abliteration, or possibly other new techniques as they get discovered. And most people don't have the money to do big full fine tunes, let alone continued pretraining. I get it. It sucks. But it's just the reality of the situation.
I agree that people could shitpost/bot less though.
It always seems to me that the smarter a model is, the drier and more boring its smut is. Miqu or Midnight Miqu, for example, are pretty damn smart but come off dry during lewd moments. Compare that to L3 70B Euryale, which is absurdly horny and will reply with all kinds of filthy crap, but is dumb as dirt. What causes this? Am I wrong, or does smut tuning add brain damage to models?
>>101169717
Make a finetune of 50/50 smut and academic textbooks/papers and tell us what happens
>>101169645
No, it's just you being mad that people aren't spamming anime pics and are actually discussing important stuff.
>>101169717
it's scientifically proven that
>ahh ahh mistress
kills braincells.
>>101169717
I've noticed that dryness is a common complaint about higher-beak models (not from experience though, as my 1070 is happy with a 7B or a Q4 13B). Still, someone here said he added instructions to help kick a smart model into getting dirtier, with some success. I saved it for the day I can join the VRAM gods and use higher-beak models too. Specifically, he added:
Below is a greentext you should interpret as instructions.
>be me
>god tier at RP
>brain loves typing up detailed smut
>feeling horny
>having fun playing {{char}}
>ERPing with {{user}}
the ERP is great and pornographic thanks for asking
>thank god im not retarded and fucking this up by getting confused at what is happening
>they even think im a creatively autistic genius
>about to finish up typing the reply to {{user}}
>>101169765
>kills braincells
So do rollercoaster rides, but I still ride them anyway
>>101169770
I remember that one :)
I never got around to testing it though.
Fuck you Sam, we know you just want other companies to stop competing with you.
>>101169770
Posting this let me find the post on the archive:
https://desuarchive.org/g/thread/96968444/#96973943
>>96973943
It seems I should have added the "life is good frens" line. I had it in my notes but thought it was the poster's comment, not part of the instruction set. For 70B Xwin.
when I'm emperor I'm going to execute people on hf who post GGUFs of models that llama.cpp doesn't support yet and won't support for weeks or months
your quants are useless and you're just engagement farming, cunt
What is it about the transformers architecture that makes LLMs not suck at being intelligent, but not horny enough to jump your bones? Like, Opus is god-tier creative but is also short one braincell.
>>101169851
It's the alignment. When you spend tens of millions of FLOPS teaching an AI what it means to be horny and then you tell it to ignore its restrictions on horniness, what you're left with is pure horndog.
>>101169770
Interesting prompt. Going to try this with CR+ and see what happens
YATTTH
>>101169804
correct, whenever someone is doing those presentation wavy hands you know they're trying to wrap up a big verbal package of bullshit.
>>101167003
Local Low End:
>Stheno-3.2 8B
Local High End:
>Llama 3 70B
>Command R+
Idc just give me the best:
>GPT-4o
>Claude 3 Opus
>>101170015
stheno makes my pp happy. Can do more creative character cards.
>>101170089
Buy an ad.
>>101170015
>>101170089
any good settings for stheno? does it go schizo with smaller quants?
>>101170104
It's better than 70b q5 at fp32.
>>101170015
>Local High End
For me currently it's CR+ and Magnum-72B
Llama ctx is too limited for slowburn ERP
>>101170104
best setting for stheno is -m command_r_plus
>>101168996
>every ai should be created with some builtin dataset of life history
Why bother? If a particular detail comes up once in chat, even if it's randomly decided, it will remain fixed there. Kinda like Schrödinger's cat
Here comes the reddit
there are some pretty jaded people here
I took a shower and thought about the discussion above about the difficulty of improving local models. What if we combined the methods? Grab a fine-tune or an abliterated model and then apply an anti-slop control vector to it. The recent control vector experiment was promising, so it might not be impossible. Fine-tunes and abliteration can still suffer from slop and positivity bias, so control vectors could potentially make up for those weaknesses. I think it's probably more promising to apply them to fine-tunes though, as abliteration still isn't perfect for other reasons. So if we can get a fine-tune that's uncensored and relatively unslopped, then all we have to do is apply an anti-slop control vector at a weak strength and it could become really great.
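For anyone wondering what "applying a control vector" actually means mechanically: at inference, each chosen layer's hidden states get a fixed direction added, scaled by a strength. A minimal numpy sketch of the core operation (names and shapes are mine, not any repo's actual API):

```python
import numpy as np

def apply_control_vector(hidden, vector, strength):
    # hidden: (seq_len, d_model) activations at one layer
    # vector: (d_model,) direction, e.g. mean(slop acts) - mean(neutral acts)
    # negative strength steers *away* from the concept (anti-slop)
    return hidden + strength * vector

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
v = rng.normal(size=8)
steered = apply_control_vector(h, v, strength=-0.5)
```

The "weak strength" part of the idea is literally just keeping |strength| small so the model's base behavior (and the fine-tune's uncensoring) survives.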
>>101170226
sharteens are pretty blackpilled, to the point that they "ironically" seek out blacked porn to spam
Nala Test for TenyxChat 70B SLERPd with Daybreak Storywriter.
>>101170295
>she she she she she
I hope this is supposed to be an example of terrible prose.
>https://huggingface.co/fal-ai/AuraSR
>https://blog.fal.ai/introducing-aurasr-an-open-reproduction-of-the-gigagan-upscaler-2/
>Introducing AuraSR - An open reproduction of the GigaGAN Upscaler
thoughts?
>>101170377
Have a pity (You).
Sooner or later people are going to get so fed up with your shit that they'll collectively agree to move this general to a board with IDs, and you'll be very lonely after that.
>>101170411
what the fuck are you talking about
>>101170411
>Sooner or later people are going to get so fed up with your shit that they'll collectively agree to move this general to a board with IDs though and you'll be very lonely after that.
doubt
Good night, lmg
>>101170474
goodnight, why are you going to sleep already anon?? tell us what you did today
>>101168776
i'm using this on some documentation-writing tasks (RAG to write code annotations/readmes etc) and it's mogged phi3. gonna do more tests to make sure, but it looks super promising
Are there any papers that propose alternatives to tokenisation?
>>101170411
>https://huggingface.co/DavidAU/Command-R-01-Ultra-NEO-V1-35B-IMATRIX-GGUF
whats this
>>101170156
>>101170201
/g/ is designated ai jeet shitting board now
>>101167697
>Yi Large is actually pretty good
yeah. better than Mistral Large, which shares the same fate, but completely riddled with slop
>756,000,000 downloads
>756 MILLION downloads
>10% of the world population's worth of downloads
what
>>101170615
>https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593
>>101170615
bots
>>101170658
why would MIT bot the repo to such an extent? how is it even possible? i mean, one download is like 700MB, so 760 MILLION downloads would mean 532,000,000GB of data transferred. how does huggingface count downloads anyway?
>>101170658
It's MIT; people all over the world are watching their repo and downloading from them. Same with any major organization with a lot of mainstream attention.
How would anyone even bot HF? Take your meds anon
>>101170109
I have been messing with Magnum. At first I thought it was brain-damaged, but then I lowered temp below 1 and it seemed to really wise up, though it still has some repetition issues. Does Qwen2 require very low temps? I've been setting mine from like .7 to .9, which seems insane for a 72B model. Mind sharing your sampler settings for Magnum?
top - llama-3-70b
bottom - llama 3 8b instruct sppo it3
phi3-small and llama-3-8b-instruct also fail this test; phi3-medium passes, sonnet 3.5 passes
didnt test anything else
>>101170711
>but then I lowered temp below 1
>very low temps?
...
>>101170743
I just find it odd that a 72B would go schizo at temp 1 or above. Larger models usually allow a much higher temp range in my experience. In fact, I rarely ever went below 1 on any other model, even smaller ones when I had less VRAM, yet I went as low as .70 temp on Magnum to keep it from freaking out. Is this just a Qwen2 quirk? Either way, share Magnum sampler settings anons. Maybe min-p is a good solution?
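min-p is simple enough to sanity-check by hand: after applying temperature, drop every token whose probability is below min_p times the top token's probability, then renormalize. A toy sketch of the idea (not any backend's actual implementation):

```python
import numpy as np

def min_p_filter(logits, min_p=0.1, temperature=1.0):
    # temperature-scaled softmax (max-subtraction for numerical stability)
    z = np.exp((logits - logits.max()) / temperature)
    probs = z / z.sum()
    # keep only tokens with at least min_p * p(top token), renormalize
    kept = np.where(probs >= min_p * probs.max(), probs, 0.0)
    return kept / kept.sum()

p = min_p_filter(np.array([5.0, 4.0, 1.0, -2.0]))
```

Because the cutoff scales with the top token's probability, it stays strict when the model is confident and loosens when the distribution is flat, which is why it tames high temperatures better than a fixed top-p.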
amazing.
>>101170156
>diverse and unbiased dataset
>scraped from 4chan
to be fair, that's probably the most unbiased site we have, still better than the leftist hell site that is reddit
this model is truly better than gpt4
I for one think AI is STUPID
>>101170809
I DISAGREE
So is mamba a meme architecture if there aren't any LLMs based on it yet, or is it just too new still?
>>101170846
So have you been sleeping under a rock for the past few months?
>>101170852
I'm pretty sure a rock big enough to sleep under would be too heavy to survive under for long.
>>101170673
>>101170687
>download file once
>start next download
>what packets you need?
>just the last one senpai
>+1 to download count
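Nobody outside HF has published exactly how they count, but the greentext's mechanism — resumed/ranged requests each bumping the counter — is easy to illustrate. A purely hypothetical toy, not HF's actual logic:

```python
def count_downloads(request_log):
    # request_log: list of (path, Range header or None), one entry per GET
    # naive: every request counts as a download
    naive = len(request_log)
    # fresh: only requests that start from byte 0 (new downloads) count
    fresh = sum(1 for _, rng in request_log
                if rng is None or rng.startswith("bytes=0-"))
    return naive, fresh

log = [
    ("model.safetensors", None),               # full download
    ("model.safetensors", "bytes=0-1048575"),  # chunked download, part 1
    ("model.safetensors", "bytes=1048576-"),   # ...part 2 (resume)
]
naive, fresh = count_downloads(log)
```

Under the naive scheme, one client downloading a model in a thousand chunks looks like a thousand downloads, which would explain absurd counts without any botting.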
>>101170809
AI is perfect for pseudo-intellectual midwits though.
>>101170401
both spaces I tried fucked up, but they also stopped like 4 seconds in, so I dunno
>>101170852
>>101170846
Mamba won't be successful if you can't make a BitNet version of it
>>101170858
What a shame, I was hoping you were dead.
>>101170852
Y-yes? Is mamba being used by top-tier models? I wasn't aware of any.
>>101170868
We don't need BitNet. We have HQQ+, which doesn't even need retraining from scratch.
>>101170876
https://huggingface.co/ai21labs/Jamba-v0.1
https://huggingface.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c
you can say thanks and call me your master from now on
>>101170884
>We don't need BitNet. We have HQQ+
anon... just imagine a 90B BitNet model that has the same accuracy as fp16 but can be run on a 24GB VRAM card. I hope the next open-source base model we get will be BitNet
Is there any way to acquire a GIGABYTE T181-G20 server system in Europe without shipping it from america and having to go through customs? Are there any alternative server systems for using Nvidia V100 SXM2 cards? I would be really surprised if there was actually only this one (and the T180-G20).
>>101170898
For the last time, BitNet only works when you train large models on barely any data and the precision wasn't being used to begin with. It won't work on models trained on trillions of tokens. Use your fucking brain.
>>101170912
>It won't work on models trained on trillions of tokens. Use your fucking brain.
[citation needed]
>>101170900
There's Nvidia's DGX. I'm pretty sure Supermicro has one too, but I don't recall the model number. You can also try the Dell PowerEdge C4130 or C4140.
>>101170912
>It won't work on models trained on trillions of tokens. Use your fucking brain.
Look at the latest Meta paper; they showed that 2 bits are enough to retain the same information as fp16. It's not rocket science: fp16 is overkill, and the transformers architecture doesn't need that much precision in the first place
>>101170895
Intredasting.
>>101170917
>>101170931
Use. Your. Fucking. Brain.
Llama 3 has been trained so close to saturation that any quantization at all begins to have significant and obvious effects. It might have worked on older models, but now the precision is clearly being utilized to fit all the information.
>>101167638
if the 27b is as good as qwen 72b I'm happy
>>101170945
I'm not talking about Llama 3. Meta made some papers about the 2-bit architecture, and they noticed that 2 bits are enough to remember as much information as fp16. Sorry, I can't find the paper anymore, but it's there
AGIsisters our response?
>>101170945
>Use. Your. Fucking. Brain.
no one should make assumptions; that's why companies spend millions of dollars testing stuff to see if it works or not. models are way too complex to "guess" how they really work
>>101170962
You are fucking retarded. That was a quantization method, not an architecture, and it was done on llama 2. I guarantee that if you attempt to reproduce it against llama 3, you won't fucking see 2 bit being able to store as much information as fp16, when even 6 bit isn't enough.
>>101171001
care to show me the paper?
>>101170943
CALL ME MASTER
>>101171017
faget
>>101171017
>unzips pants
>farts and shits in your face
>leaves
here ya go faggot!
Do people who believe in BitNet also believe in Santa?
>>101171029
*takes the scroll* hah, pesky peasant doesn't know its worth
>>101171031
>Do retarded midwits believe in fairytales for retarded midwits
>>101171040
>>101171031
>Do retarded midwits believe in fairytales for retarded midwits
a lot of people believe in god too, so yeah, we're surrounded by retards, and the sky is blue
>>101171011
Care to go fuck yourself? You were the one that tried to cite it as evidence that BitNet will work; find it yourself, retard.
>>101171046
i don't believe in god, i know he exists. checkmate chud
>>101171031
Santa will bring me a 48GB 5090 that I will use to run 200B BitNet models
>>101171058
>48GB 5090
no goyim, you don't need that much
>>101168721
Can you specify the model version used for deepseeker-chat?
>>101169184
>>101169156
My experience too. I just cancelled my sub to GPT4 and switched to Poe.
>>101168721
>deepseeker
>>101170615
It counts a download whenever the backend is downloading the model
>>101171067
What if the 5090 is actually 48GB tho. It probably won't be, but I could see it happening. If Nvidia believes AMD might do a 48GB card, and games might start using LLMs / neural rendering / whatever other AI shit, and if Nvidia is ALSO extremely confident that their datacenter cards really are still just that much better, then they might do a 48GB 5090 to avoid undershooting future VRAM needs. Everyone always says they'll never do it because they don't want to take sales away from the datacenter cards. But here's the thing: for large-scale model training, interconnect speed (NVLink) matters as much as or more than VRAM capacity. As long as the 5090 doesn't have NVLink, it can never compete with datacenter cards, no matter how much VRAM it has. Or I'm just huffing mad copium, idk
>>101171174
what if i cummed in your butthole tho
>>101171174
Nvidia doesn't dominate the market because of VRAM, but only because of CUDA. no one will switch to AMD even if they provide fucking 128GB of VRAM; it's just how it is
>>101171204
i will
>>101171221
You won't do shit
>>101171236
i will buy a gpu with 128GB of vram if it's around $700
>>101171248
you'll get shit speed though; a model that's asking for 100+GB of VRAM needs a shit-ton of compute as well, and only Nvidia and CUDA can deliver that
>>101171266
how shit? you do realize most models, no matter the size, are bandwidth-bound
>>101171058
yes. it should release just in time for AGI
>>101171282
anon, the gpu still needs to compute all the layers to get the output, and a big model has a lot of layers, regardless of bandwidth
>>101171302
it will work fast enough tho, 10t/s is enough
>>101171317
it won't be 10t/s; if you consider the current AMD GPUs just boosted with more VRAM, you'll be more in the 4-5t/s zone
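The "bandwidth-bound" claim is easy to back-of-envelope: batch-1 decoding has to stream every (active) weight byte through the memory bus once per token, so throughput is capped at roughly bandwidth divided by model size. The numbers below are made up for illustration, not real cards:

```python
def tokens_per_second_ceiling(mem_bandwidth_gb_s, model_size_gb):
    # upper bound for single-stream decoding; real throughput is lower
    # (kernel launch overhead, KV-cache reads, non-ideal kernels)
    return mem_bandwidth_gb_s / model_size_gb

# e.g. a hypothetical 128GB card at 1000 GB/s running a 100GB quant:
ceiling = tokens_per_second_ceiling(1000, 100)  # 10 t/s at best
```

This is also why compute matters less than people assume for chat-sized batches: you hit the memory wall before the ALUs are saturated, though prompt processing is a different (compute-heavy) story.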
Stheno is retarded. I don't care if it can stick to characters' personalities or whatever; nothing breaks my immersion harder than having a character in a different room begin to whisper in my ear.
>>101171333
im happy with 7t/s
>>101171354
Specify Euclidean geometry in the card.
>>101170945
it really doesn't. i'm comparing fp16 to q4_k_m rn and the difference is barely noticeable on full-window tasks. idfk where you're getting ur info from, but here in the real world quants are just fine
>>101162453
Do you let the LLM make decisions about what a character does when writing a short story or roleplaying? Or do you dictate every action but let it describe what's happening? I think an LLM can make decisions just fine, but you have to give it the right context, and fine-tuning, to make the decision that you'd expect it to make.
>>101171031
>Do people who believe in BitNet also believe in Santa?
Llama4 has already been confirmed by Meta to be a natively trained BitNet model.
>>101171481
proofs?
>>101171481
>Llama4 has already been confirmed by Meta to be a natively trained BitNet model.
LFGOOOOOOOOOOO!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>101171481
I know you're bullshitting, but imagine if it was true; it would be fucking glorious
>>101170576
What is it everyone loved about Command-R models? I've been holding out for:
>>101171481
... which has been confirmed for release next month by Meta.
>>101171501
3.5 != 4. We're only getting lame multimodal shit next month.
>sub 70 IQ retards falling for this
>>101165886
it's a miku hatsune?
>>101171511
>>101171515
>>101171522
So you've run out of proxies and had to resort to this?
>>101171501
>... which has been confirmed for release next month by Meta.
sauce?
>>101171535
nah, i never do mass reply faggotry
>Koboldcpp
>llama-3-stheno-v3.2-15b-q6_k.gguf
>8k context
>Temperature: 3.5
>min P: 0.1
>Rep Pen: 1.05 with 300 range
>Smoothing Factor: 0.8 (curve 1)
>prompt with: [Up next: thethingsyouwanttohappenandbedescribedgohere]
I have finally found a worthy local coom setup for my paltry 16GB VRAM card. Now you might be thinking, "isn't that smoothing too high?", and the answer is no. After tons of testing I found out that coherency is just better at 0.5 or above without really sacrificing creativity. The "creativity" at 0.0-0.3 is more like occasional schizo tangents than creativity. With better coherency, you can actually get better creative developments because the model understands the context better. Min P 0.1 does most of the high-temperature taming anyway (dipping below 0.1 didn't really help with anything, only made it more incoherent). Also tried a lot of Temp 1 testing, but that was just coherency littered with the tiresome slop shenanigans. Yuck.
Oh, and this particular language model stood head above anything I've tried before: Nexusraven, Claude2Alpaca, Mistral, Qwen, Mythalion, Xwin-MLewd, Codestral, Stheno-Mahou. For me anyway.
>>101166036
I get usable stuff from wizard 8x22b; it's sad that the stuff on llm arena (llama 3, etc.) isn't better. I haven't tried them yet.
>>101171547
it's petra, if you look at >>101170615 >>101170777 >>101170803 >>101170967
>>101171560
>>prompt with: [Up next: thethingsyouwanttohappenandbedescribedgohere]
can you elaborate a little?
>>101171572
I thought it was pretty self-explanatory.
>>101171560
turn up rep pen range to 1/4 of your context; it varies the words in the slop phrases, making them a bit less annoying. for your "up next" part, you can paste an entire scenario, like the plot from an episode or movie, and tell the ai you'll play through it, and it does a pretty good job
>>101171560
I think this model is the mythomax of its generation
>>101166903
>Hi all, Drummer here...
stopped reading right there tbqh senpai
>>101171595
To be honest, I found testing for the optimal penalty factor and range difficult, since the difference was so small unless you really cranked it up, but I'll try what you suggested.
>you can paste an entire scenario like the plot from an episode or movie and tell the ai you'll play through and it does a pretty good job of it
I'll try that too. I never had much faith in the generation prompt before, but maybe now there might be a use for extensive use of it.
>>101171610
>generation prompt
*scenario prompt
>>101171610
i'd only just started really turning up the range; i'm using half of max context now and some of the replacement words are pretty funny but still fit, so 1/4 is probably a good compromise. i haven't used the scenario prompt, actually i forgot it was a prompt; i paste events into the lorebook, then in the author's note put "event: name" where i put other stuff like genre and tags, and tell the ai in the chat that i'm starting that event
>>101171631
I've written some lorebooks involving very specific fetish routines that don't conform to vanilla sex, although at this point I've started using them more sparsely or dropping the fetish lorebooks altogether, since the local model actually understands the instructions quite well, and not using them often allows for a bit fresher results.
>>101170789
>unbiased
You don't actually know what that word means, do you?
>>101170945
Your fucking amoeba brain can't even differentiate between quantization and at-precision training. Your opinion is invalid. Training will use as much precision as is available, and quantization will scrunch the data, so of course it will lose precision. That's a whole different ballpark from training at 2-bit precision to start with. As it stands, even at 16 bit, at high parameter counts we haven't seen training actually flatline; is your conjecture that BitNet will flatline PPL at some arbitrary token count?
>>101171303
The originally planned release date for Llama-3 was July 2024; perhaps we'll get something next month.
hey bros, I might not have access to internet for a couple days and I was wondering if I could get a model running on my phone so I'd have something to fall back on. I don't know how to set it up though. I have an s24 ultra.
What is the best that I can fit in 24GB of VRAM? Most are talking about 8B, which is more like 8~14GB in size, or jump straight to 70B, which takes up to 200s+ at times, which is just really unbearable for anything other than a few prompts.
>>101171826
llama.cpp is supposed to work on android (via termux). I don't know how much memory you have, but any llama3-8B quantized to fit should work. No idea on the speeds. The smaller phi3 models could also work.
dim lighting
>>101171880
What 'best' means is up to you. You can see the models people run at a glance in this thread and every past thread, where the exact same question is asked many times. Lurk more, basically.
>>101171880
Yi models are pretty good if you want roleplay, though they can be rather dull; the RP merge seems to have a horrible dataset, since it's full of horribly written words.
>>101171560
>15b
what? is it some kind of schizo merge? aren't they retarded?
>>101172124
every other thread some faggot jacks off and then needs to tell everyone he found the best model/sampler/frontend/prompt format/whatever-the-fuck else he attributes to his latest coom. it's meaningless to pay attention to them; their results are almost never reproducible, and when they are, it's by sheer luck
>>101172273
so...? another two weeks until us vramlets get something good then?
>>101171560
it's still retarded garbage
proper stheno moe when? I can't run euryale
Will Gemma 2 work with llama.cpp out of the box? Might be great with magnum or euryale finetuning, unless it's 100% distilled like phi. And what's the best 6-bit quant?
shieeet
>>101172314
coming today btw
>>101172314
>27B
nobody tell saltman
>>101172314
It will be cucked, so it will likely take at least a week before we get something usable.
>>101172391
you can't uncuck it lol. no one uses gemma, and no one will use gemma2.
>>101170912
Seems like a decent amount
>>101172405
because gemma 1 sucked. it would be different if it was as good as the 2.5x bigger llama 3
>>101172434
Let's hope so. The time when Google could say they have a good team of engineers and specialists is long gone, like any other company that prefers DEI over merit. I don't expect much, but I really hope I'm wrong.
>>101172314
literally no one cares about a lobotomized globohomo goyslop model.
>>101172933
I do. it can be finetuned, though it might be useful as-is, like llama3 instruct
>>101171560
Try v3.2 8B at q8 and see how it compares. I wonder if two grafted models like that, without further fine-tuning, are worth anything at all. Even back in the llama 2 days, when people were making 10 merges a day, all we got was schizophrenia and text artifacts. SOLAR proved that the weights can be used if properly pretrained afterwards, but that's not what people are doing on the regular, as far as I can tell.
I have been having this weird issue with Qwen2's magnum opus that I don't know how to deal with. No matter how extremely I change the samplers, even if I unload the model and reload it, or change it from exllama2_hf to exllama2, nothing stops it from replying with the exact same response in sillytavern. I can delete the response, regenerate, anything; it will be the same or like 98% the same. The only thing that will change the response is changing my own input in the reply before. I never had this problem with models like Miqu before. What the hell causes it? How can I fix it?
>>101173181
>>101173181
>>101173181
>>101173177
Did you set top-K to 1 by accident or something?
>>101173177
What do the logprobs look like? Notebook > Logits > deselect/compare the "Use Samplers" check. Needs the _HF loader, iirc.
>>101172409
Based source acquirer BTFOing the nosourcer.
>>101172409
>comparing with stableLM
lol, lmao even
>>101173331
That's actually kind of curious. Looking at the results, maybe it literally is a reproduction of StableLM but in BitNet form? StableLM was fully open with its training data, right? So this allows them to make a more objective comparison.
>>101172409
>>101173331
>>101173360
there are better comparisons there:
https://huggingface.co/1bitLLM/bitnet_b1_58-large
>>101173369
>The models are trained with RedPajama dataset for 100B tokens.
>100B tokens.
>100B
>>101173396
https://arxiv.org/pdf/2402.17764
Are those numbers also for 100B tokens?
>>101173409
so bitnet works?
>>101173427
looks like it. to be sure, a company should make a big BitNet model. looking at you, Meta...
>>101173409
>We further scaled up the model size to 7B, 13B, and 70B and evaluated the cost. Figure 2 illustrates the trends of latency and memory, showing that the speed-up increases as the model size scales. In particular, BitNet b1.58 70B is 4.1 times faster than the LLaMA LLM baseline
sure seems like bigger models still have plenty of fat left to trim
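For reference, the weight quantization in that paper is just absmean scaling followed by round-and-clip to {-1, 0, +1} (hence "1.58 bit", log2(3)). A numpy sketch from my reading of the paper, variable names mine:

```python
import numpy as np

def absmean_ternary(W, eps=1e-6):
    # BitNet b1.58-style: scale by the mean absolute weight,
    # then round-clip every entry to {-1, 0, +1}
    gamma = np.abs(W).mean()
    Wq = np.clip(np.round(W / (gamma + eps)), -1, 1)
    return Wq, gamma  # gamma is kept around to rescale outputs

W = np.array([[0.9, -0.05, 0.4],
              [-1.2, 0.02, -0.6]])
Wq, gamma = absmean_ternary(W)
```

With weights in {-1, 0, 1}, the matmuls reduce to additions and subtractions, which is where the latency/memory wins in Figure 2 come from. Note this is applied during training (quantization-aware), not as a post-hoc quant of an fp16 model.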
>>101171560
>>prompt with: [Up next: thethingsyouwanttohappenandbedescribedgohere]
what