/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101749053 & >>101739747

►News
>(08/05) vLLM GGUF loading support merged: https://github.com/vllm-project/vllm/pull/5191
>(07/31) Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101749053

--vLLM GGUF loading issues with Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf: >>101752460 >>101752493 >>101752587 >>101752887 >>101752942 >>101754105 >>101754463
--Using LLMs for code obfuscation and randomness: >>101750235 >>101750681 >>101751579 >>101751645 >>101751742 >>101752513 >>101753593 >>101753898 >>101754413 >>101754550 >>101754571
--Training a multimodal LLM for reverse stable diffusion captioning: >>101751771 >>101751869 >>101751899 >>101751940 >>101752015 >>101751930
--InternLM 2.5 20B model impresses with benchmark results: >>101750268 >>101750302 >>101751043 >>101751291 >>101753224 >>101753296 >>101750400
--Florence and Kosmos for multimodal image description: >>101750080 >>101750118 >>101750228 >>101750173 >>101751908 >>101751942 >>101751950
--Anon discusses prompt execution time and computer specs: >>101754319 >>101754426 >>101754477 >>101754507 >>101754698 >>101754725 >>101755844 >>101755911 >>101756005 >>101756120 >>101755903
--Using video games to test AGI morality is flawed due to inconsequential decisions: >>101754707 >>101754720 >>101755315 >>101755394 >>101755515
--Updated Mistral preset for Large and other models: >>101751804 >>101753610 >>101753679
--Hardware suggestions for power-efficient Gemma 2 27b inference server: >>101752368 >>101752925 >>101752951 >>101753060
--DeepSeek implements caching on API, others to follow: >>101757114 >>101757169 >>101757232 >>101757482
--fp8 vs fp16 and schnell quality issues: >>101749508 >>101749522 >>101749599 >>101749963
--LORA hotswap endpoint merged, mixture of LORAs idea status unknown: >>101753955 >>101753979
--Anon wants speculative decoding for server mode for coding tasks: >>101752977 >>101753322 >>101753548 >>101753584 >>101753613
--Miku (free space): >>101749957 >>101751995 >>101752374 >>101752420 >>101752435 >>101752960 >>101752990 >>101753050 >>101753073 >>101755688 >>101756008

►Recent Highlight Posts from the Previous Thread: >>101749058
Dead general.
>>101757635
We need to revitalize it with more mikus!
how's the new magnum 32b?
>>101757635
I miss the log posters and the Miku Movers who made this thread really comfy, now it's just full of /pol/ schizos.
>>101757635
Dead internet.
https://x.com/HalimAlrasihi/status/1820918388002009363
https://blackforestlabs.ai/wp-content/uploads/2024/08/tv_no_screen_2.mp4
>>101757729
>>101757743
Do you have vanilla? I'm not into your fetishes.
>>101757729
>not even black just mannequin colored
If you're going to live out your cuck fetishes at least do a good job.
>>101757729
Holy mental illness. Get help
>>101757772
You obviously have a racemixing fetish, you told us so even.
>>101757742
Looks like if you replace the 2 with a 3, you get a higher bitrate video.
>>101757742
Do I even want to know what the VRAM requirements for this are going to be?
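For ballpark purposes: weight memory is just parameter count times bytes per parameter, plus some overhead for activations and the text encoders. A rough sketch, assuming the widely reported ~12B parameter figure for flux (treat the numbers as estimates, not specs):

```python
def vram_estimate_gib(n_params: float, bytes_per_param: float, overhead_gib: float = 2.0) -> float:
    """Back-of-envelope VRAM: weights plus a flat allowance for activations/encoders."""
    return n_params * bytes_per_param / 2**30 + overhead_gib

# ~12B params at different precisions (parameter count is an assumption here)
for label, bpp in [("fp16", 2), ("fp8", 1), ("int4", 0.5)]:
    print(f"{label}: ~{vram_estimate_gib(12e9, bpp):.1f} GiB")
```

So fp16 lands somewhere around a 24 GB card before any quantization, which is why the precision the weights ship in matters so much.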
>>101757791
Miku mindbroke him LOL
>>101757815
It's not really cope when you told us.
>>101757804
1 and 3 are hevc, they don't play in my browser.
>>101757742
huh, they actually have info on their site
>>101757840
Projection, because you clearly are, to the point where you have gigs of this stuff saved on your computer.
can someone host local models for us to test instead of gatekeeping?
>>101757772
based
>>101757849
https://blackforestlabs.ai/up-next/
Not that much info.
>>101757890
kek, didn't read. Have fun seething.
>>101757875
Use colab/kaggle, poorfag
>>101757908
>Obsessed for months
Nah, you're a mentally ill schizo. Glad I'm not having a meltdown over an anime waifu
>>101757901
Speaking of German AI, has AlephAlpha done anything noteworthy in the past few years? Last I heard from them was their shitty proprietary 70b that they advertised as the definitive BLOOM/OPT-175B killer.
>>101757908
>gets filtered
Brainrot is so easy to spot these days, it was probably all cope anyway.
>>101757948
you shut the fuck up retard, how can you expect people to learn local models if you just say "LOL POORFAG"
>>101757901
i just meant their announcement post
>>101757944
I filter brainrot words as well, zoomers are so fucked up that normal people look mentally ill to them.
I'm quitting
>>101757948
There is no use in seething about being poor. I'm poor myself and have to run the models on CPU and RAM.
Just get more RAM.
>>101757742
How can a company appear out of nowhere and absolutely destroy SD? I think the chinks are behind this.
>>101757948
I've given you free options, zoomer, be grateful and fuck off. You're not worth anyone's time
>>101757908
You need new material
>>101757601
so how well does cpu offloading work in vllm?
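For what it's worth, recent vLLM builds expose a `--cpu-offload-gb` flag that spills part of the weights to system RAM; whether your install has it depends on version, so check `vllm serve --help` first. A command sketch, with the model name and sizes as placeholder assumptions:

```shell
# Hedged sketch: --cpu-offload-gb N treats N GB of system RAM as extra weight
# storage (slower, since offloaded layers cross the PCIe bus every forward pass).
# Verify the flag exists in your build before relying on it.
vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct \
  --cpu-offload-gb 8 \
  --gpu-memory-utilization 0.90
```

Expect throughput to drop roughly in proportion to how much you offload, it's a capacity escape hatch, not a speed feature.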
>>101757948
why do you think anyone here cares if you learn anything?
QUIT BEING POOR IS WHAT YOU SAY, BUT I LIVE IN A FUCKING COUNTRY WHERE I CANT JUST GET SHIT FOR CHEAP. NO WONDER BARELY ANYONE GETS INTO LOCALS, BECAUSE YOU SHOULD FUCKING HELP
>>101758083
I'm saying that I'm poor too.
I don't even have a video card...
>>101758083
Dude, first of all, relax, retard. Secondly, what you can run is going to be heavily tied to what hardware you can buy/afford. So start saving. If you can't wait, buy a runpod, if you can still manage to put money aside for food, housing, fuel, and a little fund to get better computing parts. Say goodbye to other time and money sink hobbies as well while you are at it.
>>101758173
>If you can't wait buy a runpod
Doesn't that require ID + photo KYC bullshit?
>>101757672
>>101747334
Need more info/updates on this
>>101758190
No idea, if it does then fuck that and your new hobby is now saving money, kek.
>Try 405B
>It's really good
It's not fucking fair, bwos...
Is this actually better than Stheno 3.2? https://huggingface.co/Sao10K/L3-8B-Niitama-v1
>>101758409
I use 405B and Mistral Large on openrouter and I find myself preferring to use Mistral most of the time
>>101758442
I didn't test it much but it seemed like a wash, aside from it being a little less horny by default.
alright bros what cool new models we got
>>101758512
I'm not your bro, pal.
>>101758548
I'm not your pal, buddy.
>>101758512
>>101758548
>>101758559
I am your bro, pal and buddy, fellas.
>>101758566
then why didn't you answer my question :(
>>101758575
My silence is saving your soul, please understand
>>101758512
it's been months and there still hasn't been a release any better than commandr 35b desu
>>101758581
But anon, I need my celeb sex toy ai
>>101758583
What is command r good at?
I just downloaded it a while ago.
>>101758594
Starling 7B Beta.
>>101758385
Probably nothing, the guy's posts don't seem that intelligent, so it's likely someone just padding his paper count with bullshit. I'm starting to believe that guy that said LLMs will be smarter than 'AI researchers' soon.
the fuck is a brainstorm model?
>>101758629
But anon, that's old
>>101758583
Gemma 2 27B is pretty damn close. Hell of a lot smarter than CR while still having a nice writing style. Too many spine shivers and voices barely above a whisper though.
>>101758472
>Use mistral-large on openrouter
>Begins repeating itself within the first message
I dunno, man. It's very technically proficient, but I just can't get it to stop repeating.
>>101758512
>>101758548
>>101758690
See anon, that's how you get answers on this board. Don't ask to be spoonfed, just post an opinion that will bait the responses you want.
>>101758409
sterile as fuck. It's good if you're a regular user, but shit if you want creative writing/roleplaying. Even then, largestral is better at coding, so it's lacking for productive cases as well.
>>101758751
>Don't ask questions, just consume LLMs and then get excited for the next LLMs
>>101758751
This but unironically
>>101758750
I posted about this last night. Same exact setup and same exact problem. I hypothesized it was related to context size, as I couldn't modify it via the OpenRouter API on Kobold, but idk what front end you're using.
>>101758846
I'm using SillyTavern, let me mess around with context size.
>>101758785
L3 405B needs higher temperatures to get sovl out of it. If you aren't running at least 1.0 you're wasting your compute.
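What temperature does mechanically: logits get divided by T before the softmax, so T > 1 flattens the distribution and low-probability tokens get sampled more often. A toy sketch with made-up logits, no real model involved:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/T before softmax; higher T flattens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0, 0.5]  # toy values, not from a real model
cold = softmax_with_temperature(logits, 0.5)
hot = softmax_with_temperature(logits, 1.0)
print(f"top-token prob at T=0.5: {cold[0]:.3f}, at T=1.0: {hot[0]:.3f}")
```

At T=0.5 the top token dominates almost completely; at T=1.0 the tail tokens keep enough probability mass that the output actually varies between rerolls.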
>tfw people are talking about using 405b
>me, stuck at 70b
>>101759055
Anon, those of us talking about 405B aren't running it on our own computers
>>101759055
And I'm stuck at 30b models running on RAM
>>101759055
405B can be used for free by trialscumming together.ai
>>101759055
>>101759073
12B checking in (T-T)b
>>101757635
with flux and the latest generation of LLMs, people are spending less time posting and more time actually gacha rolling AI content
>>101759099
Don't they have the worst prices, low context, insanely quanted models, and a shit free trial now though?
>>101755281
>>101755355
>>101755454
>>101749539
I can finally coom to a 2B model
>>101759133
>free
>prices
can you rephrase your question in a less schizo way that doesn't make it seem like you don't understand what "free" means?
>>101759211
I mean sure, if you want to create a new burner every time you want to get three generations of quantslop rather than spending $3 for better options on OpenRouter, that's your prerogative
>>101759275
I do use openrouter personally, but I'm assuming the frogposter was some kind of poor if he was upset about not having access to 405B
last time I trialscummed Together they gave $25 credit on signup
ZLUDA has been taken down.
https://github.com/vosen/ZLUDA
Why is AMD so retarded?
>>101759347
The CEOs of AMD and Nvidia are related. That should be all the information you need to piece together what the scam is.
>>101759133
>worst prices
They don't have the worst prices.
>low context
It seems.
>insanely quanted models
They have INT4, FP8, and FP16. The livebench benchmark was done with FP8, and 405B is the 3rd best model.
>and a shit free trial
I think it's $5, but I still have my $25 from a few months ago.
>>101759434
They do though. Input price dominates, and it's $4.50 on Together, which is the highest out of all providers, on top of a whopping 4k context.
There's truly no reason to use them if you can avoid it
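The arithmetic on that input price is trivial to check; a sketch assuming the $4.50 per million input tokens figure quoted above (prices change, so plug in current ones):

```python
def prompt_cost_usd(input_tokens: int, usd_per_million: float) -> float:
    """Cost of a single request's input tokens at a per-million-token price."""
    return input_tokens / 1_000_000 * usd_per_million

# one full 4k-context prompt at the quoted together.ai input price (assumed figure)
print(f"${prompt_cost_usd(4000, 4.50):.4f} per full-context request")
```

About two cents per full-context request, which adds up fast in a long RP where the whole context is resent every turn.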
>>101759474
>which is highest out of all providers
False. They also have the best throughput among the ones on OpenRouter.
>>101759434
One thing I forgot to mention: the FP16 models are reference models. Those will double in price after August 31 to encourage the quants.
>>101757682
It's good, but they need to stop being fags and make either the v2 magnum 72b, or do an l3.1 70b magnum
>>101759515
On OpenRouter, you tard.
Also, there's your 2 token/second throughput.
The service is shit. Get money and use a better provider.