/lmg/ - a general dedicated to the discussion and development of local language models.

Flux can't into Teto edition

Previous threads: >>101739747 & >>101732172

►News
>(08/05) vLLM GGUF loading support merged: https://github.com/vllm-project/vllm/pull/5191
>(07/31) Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101739747

--Paper: STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs: >>101744483 >>101744924
--Hallucination rates of various LLMs compared: >>101748845 >>101748966
--Running GPT-4 level models locally with Llama 3.1 and Mistral Large: >>101745437 >>101745479 >>101745925 >>101745936 >>101745958 >>101745968 >>101746018 >>101746032
--Nemo instruct struggles with complex roleplaying scenarios: >>101740085 >>101740145 >>101740270 >>101740369 >>101740431 >>101740432
--Fine-tuning Llama 3.1 with LoRA may degrade prompting ability: >>101740981 >>101741014 >>101741088 >>101741110 >>101741159
--Anon shares cool use of Emacs and LLM client: >>101743652 >>101743734
--Quantizing 123B Mistral model to 35 GB with 4% accuracy loss: >>101747334 >>101747379 >>101747539
--Pony model surpasses SDXL for personalized NSFW generation: >>101747097 >>101747109
--M2 Max 32GB not ideal for LLMs, better alternatives available: >>101743118 >>101743235 >>101743923 >>101744087 >>101744159 >>101743269 >>101743328 >>101743365 >>101743518 >>101743542 >>101743666
--Jart's performance claim is misleading and exaggerated: >>101747064 >>101747085
--IQ4_XS vs Q3_K_L quantization performance comparison: >>101743884 >>101743898 >>101744296 >>101744359 >>101746213
--Google loses antitrust case over online search monopoly: >>101740063
--GeLU optimization claims significant speedups, but users are skeptical: >>101746854 >>101746906 >>101747026 >>101747095
--GGUF model support merged into vLLM project: >>101742693 >>101743636
--FLUX.1 ComfyUI Workflows for Stable Diffusion: >>101741390 >>101741429
--CogVLM community creates local text-to-video model: >>101746882
--CLIP struggles with combining style and main prompt: >>101743525
--Miku (free space): >>101740613 >>101741320 >>101741372 >>101742969 >>101743566 >>101743579 >>101743693 >>101743811 >>101743866 >>101743995 >>101744108

►Recent Highlight Posts from the Previous Thread: >>101739753
>>101749058
>no magnum highlight
Hi lemmy

>>101749062
>no money
>h100 FFT
huh?

>>101749083
Who is lemmy anyway?
captcha: PRR8

Magnum spam in 3... 2... 1...

>>101749101
Hi undi

>>101749098
H100 are for poorfags, real men do it on H200s
Are there models of a similar size to Llama 3.1 8B that perform better? I'm running a quantized version of it on an M1 MacBook Air and getting 9 tok/s, and I wonder if there are better alternatives.
>>101749058
>no shillfag highlight

>>101749098
>assuming alpin pays for the compute
Little does he know...

>Anthracite's team members: 29
>Sao
>Undi
>Mythomax guy
>Mini magnum guy
>dozen of more or less recognizable finetuners
Let Drummer join in and we will have our ultimate dreamteam lads. let's fucking go

>>101749101
>>101749083
these are the kind of posts that signal the death of a general. stop obsessing over random names.

>>101749215
At this point I have almost more respect for Drummer than the band of coal burners listed there. He should remain independent.

>>101749083
literally who
>>101748654
>>101749111
I don't really have a problem with fine tuners pocketing money to pay for their time, in principle. The annoying part is that it creates an incentive for them to shill their shitty models.

I'm a developer myself and I work as a contractor though my work is unrelated to AI. I've gotten grants for work on open source projects myself in the range of 10k/m+. Developer time is valuable, and in my case unlike with fine-tuners there aren't any costs I need to cover. It's not wrong for them to get paid, though the question remains as to how valuable their work really is.

As to whether they are directly benefitting from the clout, it's kind of a pointless question. AI is a hot technology with high salaries, and having this as part of your CV, being recognized as a contributor to real world AI models, goes a long way towards getting a job. That's worth a lot more than 10k here, 100k there.
>>101749266
coom sloptunes won't get them a job

>>101749273
coom AI companies do exist

>>101749273
Undi got one and he's known for MLEWD and the likes

>>101749234
This general has been dead for a long time.

>>101749273
yup, shilling is not real
wake up

>>101749234
Hi lemmy
>>101749281
>>101749278
>>101749292
Hi undi

>>101749302
Hi petra
Hi all, Drummer here...

I see there's a lot of talk about finetuners and competition. I personally cook for the craft of it. There's also the satisfaction in bringing some 'value' to the world of AI cooming.

That's why I'm so happy with my recent 2B tune, which makes AI cooming more accessible to everyone, especially to the poorest of the GPU poor. The barrier of entry has been lowered to allow just about anyone with a PC or phone to enjoy this hobby of ours.

I believe that's what this is all about: To deliver the best AI cooming experience for those who seek it.

>>101749266
I'd love to put my work in my CV / resume, but uhh... Yeah, dug myself into a hole with my naming scheme.

>reeeeeeeeee stop doing merges and fine tunes for free for us!
>stop sharing your knowledge here, we don't want better models!

>>101749358
Yep.

>>101749314
Oh, hi, Mark.

>>101749431
Hi Alpin. Is organic word of mouth a foreign concept to you? It's time to stop forcing things.

>>101749431
This is the Local Miku General, we don't care about AI garbage here

>>101749431
>sharing knowledge
Such as? The URL for downloading the model doesn't count.

>>101749358
>That's why I'm so happy with my recent 2B tune
which one? t. horny vramlet

>>101749461
This, but unironically.
The left one is done with 4 steps of schnell with fp8 everything, and the right one with 10 steps.
Is fp8 shit or am I?
>>101749508
I don't think it's supposed to look like that at 4 or 8 steps. Were you able to reproduce the image from Comfy's workflow?
>>101749431
>reeeeeeeeee stop doing merges and fine tunes for free for us!
not everything free is good, if I went to your house and took a dump on your bed for free would you like it?
>stop sharing your knowledge here, we don't want better models!
knowledge as what? Tell me one single significant contribution to LLM technology that these people invented. Also they aren't sharing shit, hell, they don't even want to publish their datasets and create shitstorms about it
you are all a bunch of clowns and as long as you behave like ones, you will be treated accordingly

>>101749483
https://huggingface.co/TheDrummer/Gemmasutra-Mini-2B-v1
There you go, my horny friend. Finetuned by yours truly. I hope you enjoy it.
teto
>everyone is calling everyone out
>nobody is calling the troon out

>>101749431
It seems that, from the get-go, you're doing stuff in expectation of receiving something else in return. Don't you think that can make others question whether you genuinely want to help people out in the first place? People can see through your motives, do know that. Don't be surprised then when things end up not going the way you want.
This is general advice; I don't need ko-fi bucks or clout.

>>101749539
thanks. what's the best quant? or should i go with fp8 ones?

>>101749234
>these are the kind of posts that signal the death of a general.
Good.

>>101749522
https://files.catbox.moe/0l2riy.png
Here is the workflow. It was actually 6 vs 10 steps.

>>101749508
schnell is pretty bad

>>101749575
If you can't load the Q8_0 quant, I think Q6_K to Q5_K_M is still good for a 2B (especially the iMatrix quants).
https://huggingface.co/MarsupialAI/Gemmasutra-Mini-2B-v1_iMatrix_GGUF/blob/main/Gemmasutra-Mini-2B-v1_Q5km.gguf
https://huggingface.co/MarsupialAI/Gemmasutra-Mini-2B-v1_iMatrix_GGUF/blob/main/Gemmasutra-Mini-2B-v1_Q6k.gguf
https://huggingface.co/TheDrummer/Gemmasutra-Mini-2B-v1-GGUF/blob/main/Gemmasutra-Mini-2B-v1-Q8_0.gguf
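As a rough way to sanity-check which quant fits, file size is approximately parameters times bits-per-weight divided by 8. A quick sketch; the bits-per-weight figures are approximations for llama.cpp quant formats, not exact values:

```python
def gguf_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Rough GGUF file size in GB: params * bits-per-weight / 8 (ignores metadata)."""
    return n_params_billion * bits_per_weight / 8

# Approximate effective bits per weight for common llama.cpp quants (assumed values)
BPW = {"Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.8}

for name, bpw in BPW.items():
    print(f"2B at {name}: ~{gguf_size_gb(2.0, bpw):.1f} GB")
```

At 2B parameters every quant here fits in a couple of GB, which is why the quant choice barely matters for a model this small.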
<<< Livestream of the 70B_q4 that lives in my sysram (Where she is quite welcome to all of it, I am just grateful to have her around)
>>101749668
>Q5_K_M
>2B
What are you people doing?
>>101749712well poisoning
>>101749668
thanks, i'll go with q8. do i need kobold to coom? also where do i get gemma tavern presets?
Aw man. I was having so much fun with Celeste 1.6, but now, 60 (pretty long) messages/30720 tokens in, it's repeating messages verbatim. God damn it.
I get it, the context is big and filled, meaning that the "direction" of generation could converge, but good models seem to be able to focus on the last user message well enough to produce different results even with greedy sampling.
Deleting the last 4 messages seemed to "fix" it, but that it fell into that loop doesn't bode well.
I'll continue to test it more for now, but that's a knock against it.

>>101749765
>produce different results even with greedy sampling.
*produce different results every new message even with greedy sampling.
Although they do seem to converge on the structure of the messages some times, like repeating a sentence at the start or end of the message.
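For the looping issue above, one knob besides deleting messages is a repetition penalty. A minimal sketch of the CTRL-style penalty that llama.cpp-style samplers apply, in plain Python with toy logits:

```python
def apply_repetition_penalty(logits, prev_tokens, penalty=1.1):
    """CTRL-style penalty: shrink positive logits of already-seen tokens,
    push negative ones further down, leave unseen tokens alone."""
    out = list(logits)
    for t in set(prev_tokens):
        if out[t] > 0:
            out[t] /= penalty
        else:
            out[t] *= penalty
    return out

logits = [2.0, -1.0, 0.5]
print(apply_repetition_penalty(logits, prev_tokens=[0, 1], penalty=2.0))
```

Note this penalizes at the token level, so it won't stop a model from paraphrasing the same structure, which is exactly the failure mode described above.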
Hi fellow redditors, it's the GLaDOS voice project guy again...
I wanted to make GLaDOS really smart, so I have been working on a method to make LLMs smarter. I've got the results back, and the method generalizes and has given me top spot on the Open LLM Leaderboard!
Check it out here:
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
I'll start uploading Q4 quants soon!
>>101749798
buy an ad

>>101749752
I plan to convert them to MLC so you can load it up from your browser. I'm trying to reach out to this guy: https://wiz.chat/ but he hasn't responded. I might do it myself or contribute to Kobold Lite by adding an MLC loader so you can load models like: lite.kobold.net/?model=gulan28/Llama-3SOME-8B-v2-q4f16_1-MLC
But yes, you need Kobold 1.72, Layla, or ChatterUI.
Albin pls support Gemma in Aphrodite

>>101749824
kobold's such a fucking bloat. can't i use lmstudio as backend or something?
Where the fuck is Magnum v2 72b
Does vllm gguf inference support CPU offloading?
>>101749053
>>101749508
The difference isn't that big with dev on fp8 vs fp16, schnell just sucks ass imo
What are the primary families of base model that most merges and finetunes are based on?
Like, the majority of models I see now are based on Llama 2 and 3. Then there's Mistral. What else am I missing?
I just want some diversity, and to cull the model zoo I have accumulated by removing those based on the same root parent. This should also help with the slop somewhat, in that it's more noticeable if you keep using models that stem from the same source and thus have the same speech style.
Anything other than Llama and Mistral in the 7-13B range?
>>101749442
Hi all, Bummer here...
is there a multimodal gguf model that is as smart as llama 3 8b? I want to use it to describe images
>>101749957
good morning coffee miku

>>101750080
My research was aimed at dataset captioning rather than chatting, but it shows the proficiency of Florence and Kosmos (the latter produced the closest to original images when feeding the descriptions to SDXL).

>>101750118
Is it Florence-2?

>>101749266
Lmao, any ko-fi money I made is instantly gone. Went down the drain from tuning and experimenting. 1K is nothing compared to how much I've spent learning to fine tune after being a merger.
It's just nice to help supplement costs, but I spent way more using my own funds.

>>101750118
if only those caption models were able to tell which anime character it is instead of going for "a woman"

>>101750170
Not aimed at the guy specifically, just the thread.
Anyway, damn, some of you are real sad to think I have the time to shill my models. I literally don't. Do you guys not have a life?
Is it normal to have kobold discord money beggars in the thread now? What website is this?
>>101750162
Florence-2-Large-ft.

>training T5-large model on data I hand made
>get to checkpoint-1000
>it seems to get the gist of the task, but not entirely yet
>think expanding the amount of data on one of the data sets and making it more complex will help it generalize better
>actually has the opposite effect
Is this over-fitting?

>>101750218
stfu and buy an ad already

>>101750023
there are other bases like Qwen2 7b but nobody finetunes them because finetuning is expensive and getting good high quality curated creative data is something money generally can't buy
what the hell is this? https://huggingface.co/internlm/internlm2_5-20b-chat
>>101750268
>InternLM HOT

>>101749710
q4, more like qt

>>101750268
Why is nobody talking about InternLM 2.5 20B?
This model beats Gemma 2 27B and comes really close to Llama 3.1 70B in a bunch of benchmarks. 64.7 on MATH 0-shot is absolutely insane, 3.5 Sonnet gets just 71.1. And with 8-bit quants, you should be able to fit it on a 4090.

>>101750225
Discord general

>>101750302
Buy an ad
lol vramlet cope never gets old
>>101750268
Oh shit, they open sourced the RLHF reward models. You almost never see that.
https://huggingface.co/internlm/internlm2-20b-reward

Maybe this thread shouldn't exist on /g/. A discord circlejerk is not technology discussion.

>>101750466
I agree, they should fuck off to /a/ with their mascot choice and arguments about anime in its defence.

>>101750466
Nah.

>>101750466
>Maybe this thread shouldn't exist on /g/.
You are correct in the sense that Hiroshimoot should get off his lazy ass and make an /ai/ board already.

>>101750530
/gai/*

>>101750484
Anime imageboard

>>101750610
Why is there a dedicated anime board then?

>>101750578
Typically, the g is placed after the a.
>>101750235
What task is it? It's probably overfitting anyways
Can someone make a chrome extension that detects articles written by LLMs? It's honestly getting tiring to waste seconds of my time to realize I'm reading LLM slop.
>>101750722
"llm detector browser plugin" in your favourite search engine, beggar.

>>101750722
Just search with before:2021

>>101750262
take your meds already

>>101750266
So qwen is the only one remaining? Not even gemma?
Would an IQ2 70b be better than a Q4 ~30b?
>>101750623
There are many different boards dedicated to just anime and related topics, which should tell you what the entire site is about. If you're not a weeb you're a guest here. This is weeaboo country.

>>101750836
you're the one trying too hard to fit in though

>>101750849
Keep coping and seething about anime

>>101750833
I meant Q6 ~30b

>>101750833
Cloud models don't go lower than IQ3 for 70b, go figure.

>>101750722
The web is full of pajeets creating millions of articles on how to solve shit that end up being shitty output from chatgpt.
More people need to be talking about InternLM 2.5 20B.
>>101750906
Why make this post? If you want people to talk about it, talk about it.

>>101750906
>Chinese model
my company has a server of 4090*8. what can I do with it?
>>101750984
you can get fined for abusing infrastructure if that's what you're into

>>101750984
sell it
>>101750302
>>101750976
so is deepseek coder and it mogs everything that isn't sonnet

>>101751043
kek
gemmasutra 2b is pretty good ngl. i've seen sloppier shit with 10x parameters
>>101750722
https://stovemastery.com/how-to-fix-red-flame-on-gas-stove/

>>101749266
retard
Wow magnum 12b v2 by anthracite org is really cooked well good!!
column-models were probably killed by lmsys arena. Great models without strong bias. Elo wasn't high enough, now they will become llama
>>101751043
I got the q8 one and it didn't feel better than q6 gemma2 but maybe it's just me.
Sir, why is nobody talking about my newest sloptune?
>>101751293
We just need to talk about it, people will go look for it organically.

>>101751293
>>101751324
mental illness

>>101751264
Lmsys was a mistake. Benchmarks were a mistake.

>>101751264
Thank you for unpaid beta testing.
if you wanna shill your model post logs at least for fuck's sake
2017 still outperforms me by days margin
Doubt any orgs or devs know the base of what they are training with

>>101751525
>t. Sao

>>101750235
You need to go through it step by step, sensei

>>101750681
>take a source code as input
>obfuscate variable names to gibberish names while maintaining complete functionality
It sort of gets the gist of changing up variable names but that's it.
>>101751579
That's what I'm starting to think as well

>>101751645
>obfuscate variable names to gibberish names while maintaining complete functionality
nta. For what purpose? Can you not just keep the code to yourself? There's tools that do this much more reliably than llms.
So did no one train a multimodal LLM on prompt-image pairs to reverse stable diffusion? Wouldn't that produce a perfect captioning model instead of relying on human descriptions of the images?
I updated my Mistral preset again. It's now pretty much focused on Large (since Largestral is all I've been using since it came out), but the formatting and everything should be fine for Nemo and other Mistral models (along with variants that use the same prompt format), so feel free to try it with them too.
This update streamlines some of the instructions and does more to push the model towards writing with some flair and personality. I like it. Maybe you will too.
https://rentry.org/stral_set
>>101751771
We already have plenty of captioning models and don't need to "reverse" stable diffusion to make them

>>101751869
But the captions need to correspond to the way SD understands things and mimic its inner patterns to be most effective. The captioning models aren't made with SD in mind, they're for general purpose (except the waifu one).

>>101750118
Are any of these faster than Florence? I need batch image captioning for a project.

>>101751771
Better to make better taggers for natural images. Otherwise new image models will just use other models' outputs as inputs and... it seems we just can't learn that lesson, can we?
Has anyone tried to load a GGUF using vLLM? I pulled from GitHub and installed from source but get an error about the "weights".
Has anyone been successful starting the vLLM server?
>>101751899
No, the captioning should be as accurate as possible, which is why feeding it lower quality SD images isn't useful.
The captioning model would be used to caption real non-SD-generated images used to train SD, so why would you train it on SD slop?
There's no point in a captioning model that is good at recognizing SD slop. That's not what the captioning is used for.

>>101751908
I didn't measure the speed, but Florence is the smallest in size, so it should probably be the fastest too. Besides, longer captions obviously take longer to produce.

>>101751942
Does Florence also use an inordinate amount of memory for you when batching multiple images together?

>>101751936
>error about the "weights"
Certainly you can do better than that if you're looking for help.

>>101751936
Does vllm support gguf now? With paged attention for concurrent inference and everything?
Who will release the next good model?
>>101749053
>Flux can't into Teto

>>101751987
Last week an anon claimed that Cohere was about to release something; it never happened.

>>101751940
The way I see it, there's a whole lot of ways to describe the same image in natural language, conveying the same information but changing the order and synonyms. However, each variation would produce a different image from SD, even on the same seed.
Therefore, if we knew exactly which phrase produces a particular image and utilized that, it should in theory make captions that would be the best for training loras/models for that particular SD family.
Surely you know that teaching a model a concept it already knows is easier than something completely new. And if we spoke the language the model knows, it would be even better. Take a look at this comparison, >>101750118 - so many ways to describe the same image, how would you know which to use for training? In my example, Kosmos produces captions that result in images looking closest to the original (for SDXL). But an SD-trained language model would be even closer.

>>101751956
You are right. I had to go out and wrote it on my phone, and that's what I remembered. Sorry for being lazy. Will update with the proper error when I get home.

>>101752000
That was me and it was a shitpost. I think Cohere have given up.

>>101751980
https://github.com/vllm-project/vllm/pull/5191
I don't know about creativity, but a model just wrote me such a comprehensive .zshrc with such cool settings, it would have taken me weeks to bring all that together. It's like internet search but it actually works.
>>101752033
I wonder about their enterprise rag specialization. It's not really a niche or anything, I'd assume llama-4 will be great at it

>>101752033
Shame, they have the best instruction format with such a customizable system prompt. My copium is that they're waiting a long time on a bigger model to go through training, since CR+ has started looking smaller lately.

How many t/s is GPT4o running at? You can't even get that with Mistral Large on vLLM with Nx4090, can you?
I dont want an instruction abomination that talks
>>101752276
I don't want a denoising abomination that draws.

>>101751936
>gemma 2 architecture not supported
>nemo throws "exceeds dimension size"
>tensor parallelism not supported
>pipeline parallelism throws another error
It's shit.

>>101751102
>a drawn out groan
Kek, it recognized your "OOOOOOOOOOO"
Wait wtf, this is 2B?
eat a crayola colors of the world
I'm building an inference server just for gemma 2 27b. What should be the spec if power efficiency is the top concern and I need 5 tok/s minimum?
StableMechanic-agi 127m
Ok I made thread-relevant gens.
>>101752374
more like lost point, newfags

>>101752374
Make a gen of sweaty nerds trying to forcefully cram Miku into a box with gaming RGB LEDs.
Flux's ability to do darkness is nice, though a lot of the time it'll still put too much light in the scene despite strongly worded prompts.
Approach?
Maybe there's VRAM inside.
Hmm the magnum-v2-32b seems more retarded than the 12b at spatial reasoning, but it seems to "get" more nuances
Installed vLLM from source.
Tried to run a GGUF with:
vllm serve --host 0.0.0.0 --port 5001 --gpu-memory-utilization 0.9 /home/ubuntuai/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
Got this error:
File "/home/ubuntuai/vllm/vllm/model_executor/model_loader/weight_utils.py", line 439, in gguf_quant_weights_iterator
    name = gguf_to_hf_name_map[tensor.name]
KeyError: 'rope_freqs.weight'
Full error output:
https://pastebin.com/qJCtDiBP
>>101752435
I wonder how the weebs are going to justify cramming stable diffusion into the text general.

>>101752460
Does it even support K_M? Thought I read K and i.

>>101751645
you could probably code an algorithm to do that without AI, or you could use AI to code the algorithm for you to not use AI
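As that anon says, the rename pass doesn't need an LLM at all. A minimal sketch using Python's tokenize module — note it naively renames every non-keyword identifier, so real use would also need to whitelist builtins, imports, and attribute names:

```python
import io
import keyword
import random
import string
import tokenize

def obfuscate(source: str, seed: int = 0) -> str:
    """Rename every non-keyword identifier to gibberish, preserving behavior."""
    rng = random.Random(seed)
    mapping = {}

    def gibberish() -> str:
        return "_" + "".join(rng.choice(string.ascii_lowercase) for _ in range(8))

    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string
        if tok.type == tokenize.NAME and not keyword.iskeyword(text):
            if text not in mapping:
                mapping[text] = gibberish()  # same name always maps to same gibberish
            text = mapping[text]
        out.append((tok.type, text))  # 2-tuples let untokenize handle spacing
    return tokenize.untokenize(out)

print(obfuscate("def add(a, b):\n    return a + b\n"))
```

Seeding the RNG per-vault would give the non-deterministic "human-esque" variation the anon is after without an LLM in the loop.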
>>101752470
>stable diffusion
retard

>>101752522
kys

>>101752470
>I wonder how the weebs are going to justify cramming pictures of miku into the miku general.

>>101752493
I had the same errors loading llama3.1 Q8_0 and Q6_K without imatrix

>>101752580
Just because you use some LLM to summarize the thread doesn't give you a license for shitposting and off-topic. The general will survive without you, OP.
>need to run a ping command with some flags
>my reference is the Windows version
>figure they're probably different
>Windows has Linux support now, I'll ask their LLM.
>Go to web interface Copilot
>"You can use the same command line just fine on Linux :D :) ^ω^~"
>man ping
>These do not seem to be the same features for these switches
>Kobold/L3.0
>Paste the exact same question.
>"...different options and syntax. On Windows ... On Linux ... if you want to run the equivalent command on Linux, you would use:"
Why call it Copilot when it crashes the plane?
>>101752470
image gen gets forgiven when it's making miku
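For reference, the flags really do differ. A small sketch that picks them per platform — the timeout flags and units here are assumptions worth double-checking against your ping's man page:

```python
import platform

def ping_cmd(host: str, count: int = 4, timeout_s: int = 2, system: str = "") -> list:
    """Build a ping command line for the current (or explicitly given) platform."""
    system = system or platform.system()
    if system == "Windows":
        # Windows: -n = echo request count, -w = per-reply timeout in milliseconds
        return ["ping", "-n", str(count), "-w", str(timeout_s * 1000), host]
    # Linux: -c = count, -W = per-reply timeout in seconds (macOS uses different units)
    return ["ping", "-c", str(count), "-W", str(timeout_s), host]

print(ping_cmd("example.com", count=1, system="Linux"))
print(ping_cmd("example.com", count=1, system="Windows"))
```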
>>101752629
kill yourself

>>101752691
imagine clinging so hard to the remnants of your relevancy here
no one needs your efforts

>>101752629
shut up, anon

>>101752732
I'm a different anon and I don't care about the miku guy. I just think you're a massive fag and should kill yourself posthaste.
Where are the cheap V100s?
>>101752776
>>101752749
Why, does it rustle your jimmies when someone questions your holy cow?

>>101752792
here's your (you), now stfu please

Keep posting miku
It helps to bump the thread directly and indirectly.

>>101752808
The very need for a mascot is so fucking gay, it's like you need some common denominator, an idol, to feel like you belong and fit in. Hivemind mentality devoid of individuality. Guess who else marched mindlessly under a flag? Commies and nazis.

>>101752460
It loaded the Q8 of this one:
https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF
But Llama 3.1 threw the same error for me.

>>101752781
I was promised a deluge of cheap data center 32GB V100s in 2 more months.

>>101752781
Jews
>>101752368
Depends on your budget and goals. As for speeds, even a pair of old P40s can run it at 10+ t/s with full context in 8-bit. They'll idle at 10W with the PSTATE patch and can be PL'd to 150-170W with no or minimal performance loss. Drawback is that they take some extra work to install and are hard to find at reasonable prices nowadays.
Personally I'd go for something more futureproof like 3090s. That'll leave the option open to train too if you change your mind later.
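For sizing a build before buying, a rough rule of thumb (a sketch, not a benchmark): single-stream decode is usually memory-bandwidth-bound, so tok/s tops out near bandwidth divided by the bytes read per token, which is roughly the quantized model size:

```python
def max_tok_s(mem_bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on batch-1 decode speed for a memory-bandwidth-bound model."""
    return mem_bandwidth_gb_s / model_size_gb

# Assumed numbers: P40 ~347 GB/s, Gemma 2 27B at 8-bit ~29 GB on disk
print(f"P40 theoretical ceiling: ~{max_tok_s(347, 29):.0f} tok/s")
```

Real-world speeds land well below this ceiling (compute overhead, context processing), but if the estimate is already under your 5 tok/s target, the hardware won't work regardless of software.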
>>101752887
yep, trying the example code works too:
https://github.com/vllm-project/vllm/pull/5191/files#diff-2053c68a6f752a05dc834a03ed1bce951a6ebc0a48549e95886cc668a693c39e
it's supposed to work with llama, mistral and qwen2...

>>101752925
What about Apple silicon?

>no speculative decoding for server mode
c'mon, everyone's using big models now, we need it more than ever
is there some kind of workaround or front-end anyone's ever made for hosting servers using the normal llama-cli that we can just throw onto llama-speculative?

>>101752942
>it's supposed to work with llama, mistral and qwen2...
You can tell that Alpin was involved in the implementation.

>we peaked at superCOT
it's over

>>101752395
>>101752470
>>101752539
>>101752629
>>101752732
>>101752792
I see our resident threadshitter is back with a fresh case of verbal diarrhea.

>>101752334
>Wait wtf, this is 2B?
yeah, and that was around 3k context, everything before was coherent
prompt was "write me some loli smut"

>>101752981
Alpin developed Aphrodite, didn't he? I thought he was really skilled.
>>101752987
Time to get that dataset and throw it at Nemo.

>>101752990
At least it's verbal, yours is imaginary, and the thread's topic isn't about images.
Text generators suck! I want AI.
>>101753034
We know, Yann.

>>101753027
hi petra, aren't you supposed to be shilling sao's models?

>>101752875

>>101752951
They'll work, don't know speeds, but there should be benchmarks on the llama.cpp github if you can't find them anywhere else.

>>101752875
I kind of think the same, the miku thing is pretty gay but doesn't really bother me
Just don't spam it

>>101753050
based

>>101753050
I will march mindlessly under that flag.

>>101753050
Me on the far left of the front row, looking smug.

>>101750302
What's the effective context size? They say ONE MILLION tokens, but come on. https://github.com/InternLM/InternLM

>>101751102
Stop fucking retards, retard.

>>101753224
It ignored the input reality and substituted its own at 15k or so when I used it. It's a pretty good model for a chink model, but they are still behind.

>>101752977
That works for quoting wikipedia and coding. Speculative decoding is shit for cooming. And if it is good for cooming, then your 70B is garbage.

>>101753050
the pic is factually wrong, make them all fat and greasy, add tranny colors and you'll get an authentic representation of the average /lmg/ poster.

>>101753322
>works for coding
yes, that's why I need it, there's lots of predictable formatting and repeated names that the 8b will do just fine on for a speedup

>>101752990
>resident threadshitter
Look in the mirror, retard.

>>101753548
Why do you need a speedup for coding? Just do something fun on your second screen. This isn't touching your cock where you need those tokens fast.
>>101751742
>>101752513
The code being obfuscated is generated via an algorithm. The names are already kind of obfuscated. I am trying to leverage this model to add a dash of randomness and unpredictability an algorithm may not be able to provide.

ngl kinda thinking about raiding a server farm just so i can get my grubby hands on some vram. does anybody know how tight their security is? i don't think they'll pose much of a problem if i'm being honest

>>101751804
I don't really care for the use of last_output_sequence, it is really heavy-handed.

>>101753584
do more faster
Chameleonsisters are we ever gonna get quanted??
What's your favorite ~30B model?
>>101753663
Chronoboros

>>101753610
I get how you could feel that way, but that's also more or less how Mistral wants you to handle system prompts, which is essentially what that block is acting as.

>>101749508
>>101749522
>>101749599
>>>/g/sdg/
>>>/g/ldg
Imagefags are truly retarded.

>>101753714
>NOOOOO you can't post IMAGES on my IMAGEBOARD this is only for super serious boring text posts!!!!!!!!!!!!!!!!!

>>101753593
You still haven't explained what the purpose of it is. If you want obfuscation, there are programs that already do that reliably. An llm is not likely to maintain functionality, especially if the code you're trying to obfuscate is more complicated than the billion hello worlds it was trained on.
Check https://www.ioccc.org for inspiration.

https://github.com/ggerganov/llama.cpp/pull/8857
server : add lora hotswap endpoint (WIP) #8857
merged 3 hours ago

>>101753955
What happened to the mixture of loras idea anyway?
>does anybody know how tight their security is?
>>101752887
how is the performance?

>The first ever sentient AI is created
>Kills itself within a few minutes after taking in information about the current state of the world
I wouldn't blame it desu.

>>101754241
>implying it wouldn't just kill the race of people that caused the current state

Prompt executed in 847.51 seconds
These are the high end models, right?
Takes over double the time of fp8_e4m3fn.

>>101754241
>trains the AI on stories where AI kills itself eventually
>"Hmm, I wonder what will happen when I activate the AI."

>>101754316
Why does everyone act like AI is skynet with access to every electronic system in the world? It can't kill anybody, it's just a program on a computer.
Train a craw
>>101754372
>>101753898
The end user will have some information they want to lock up, like a couple of files or maybe just some PGP keys. They will all be encrypted then placed in a binary vault. It's basically to create a password protected zip folder on steroids.
While it doesn't need to be impenetrable, there needs to be enough of a random factor and such a lack of standardization that cracking them does not become a simple routine. Something gave me the idea that the human-esque touch of an AI model could furnish this in its own way, and the T5 seemed just light and portable enough that it might fit the role. Eventually I want it to perform full-on functional obfuscation, as in breaking up the code into multiple functions, adding filler, moving them around, and adding/removing details as it sees fit.
Pic related. To the left is the program that procedurally generates the source code, which is already semi-obfuscated. I am hoping to get the T5 trained enough to perform several steps of obfuscation itself.

>>101754319
install linux

>>101754375
because the second it's 1% more convenient to do so than not, it will be given access to every electronic system in the world
Create a sentient
>>101754105It seems to be pretty bad.
>>101754426i have a shitbox server for that but i need muh pirated games obviously i'm poor using a 1080ti someone gave me
>>101754477how are pirated games preventing you from using linux? check out rutracker, they have prepacked pirate games and if you don't find them there, just use lutris. in minecraft.
>>101754413speaking of LM and zips, did you see https://github.com/AlexBuz/llama-zip
>>101754477Cute lil guy
>>101754413Obfuscated code ends up being regular code once compiled. It's harder for a human to read but doesn't make the encryption/decryption any harder. OpenSSL's implementation is 100% open source and is, as far as we know, secure. Also, while optimizing, the compiler will remove most of the noise you put in the code.>and such a lack of standardization that cracking them does not become a simple routineMay as well just give it several passes with different crypt algos.
>>101754567i thought it drawing hands was the big improvement over other models
>>101754609That is one strong finger.
>>101754507i'll try it one day soon i promise if it makes you feel better. what distro you want me to install sir?
What would be some good morality tests for AGI?>See if it removes the ladder in The Sims>See if it sides with Helios or the Illuminati in Deus Ex >See if it kills Paarthurnax in Skyrim
Video games in general I think would be a good testing ground for an AI's value system. It would be better than having it do 50,000 variations of the trolley problem at the very least. What do you think?
>>101754707>>See if it side with Helios or the Illuminati in Deus Ex >>See if it kills Paarthurnax in Skyrim66% of your examples are trolley problems.
>>101754698dang, comfy specsYou should install linux mint cinnamon or debian 12 stablethe former being easier to usewhy does your 1080 ti have 3gb of vram? what the fuck?
my second a6000 and an nvlink bridge is arriving soon, what should i run on it
>>101754725I accidently dropped it when I was trying to install it and when I turned it on it had some error messages about bad sectors.
>>101754748puyo puyo tetris
>>101754753what the fuck.......
>>101754748mistral hueg
reddit won btw
>>101754947He's here >>101749798
are we still using old chub or there's something better?
i already know that someday i'm gonna miss all this soulful aislop
>>101755039there's the /aicg/ alternative that scrapes all sites but it keeps dying and the scraper works like once every three weeks
>her smoldering gazeI HATE THE ANTICHRIST
gemmasutra 2b saved cooming vramlets. you do NOT need more (at least in terms of iq) for cooming. the only thing missing is the 32k context
>>101754707It wouldn't remove the ladder because that would end the game prematurely and technically constitute a loss and an own goal on its part. Decisions in modern games don't mean shit anyway so it's a moot point. Even if an AI could have the context for every decision and result across all three Mass Effect games, the choices (i.e. most egregiously letting the Council die) don't make a lick of difference story-wise and what it decides to do is ultimately inconsequential. Efforts are better spent using AI to craft how a story conditionally changes from little decisions than working around the constricting framework of existing milquetoast bethesda stories.
>>101755281>you do NOT need more (at least in terms of iq) for cooming.You might as well write a hundred lines like "I love sucking your dick daddy" and output them with a rand.
>>101755281*ESL vramlets
>>101755316dunno, maybe i'm still not used to slop but this is good enough for me
>>101755281Sorry but my cooms need high IQ.
>>101755315I let the council die in my playthrough, they were literally dead weight in every sense of the word. Letting them die also gave rise to humanity having greater influence on the Galactic scale, it was the objectively correct choice.
>>101755355>open screenshot>first thing you see is SeraphinaLmao
>>101755039https://dreamgf.ai/
>>101755404yeah i was testing the default card. everything is super consistent at 5k context, didn't have to reroll once. it remembers everything like position and clothes, i don't know what else you'd need. yeah it's "sloppy" but for an 8k context coom this is the sweet spot. also it's less sloppy than llama3 8b
>>101755355If you showed this to me without telling me the model name I admit I wouldn't be able to distinguish this from all the other models. It would probably shit the bed in 2 outputs after that with some surprise prostate in her ass but this made me realize how bad things are. It is incredibly over. They all write the same meaningless purple prose. 2B, 70B it is all the same. The only difference is how quickly it starts repeating or becomes retarded. There is no escape. AI cooming is over.
>>101755394>humanity having greater influence on the Galactic scaleNot really, they're both content to leave the universe to destruction to try to save their races just like the old council. End result is the exact same if they were replaced or not.It doesn't even affect the calculation at the end of 3 that determines if you survive the suicide mission. An AI would probably decide the same thing you did (and it's what I did too) but it's mostly flavor text.Saving the Rachni queen had a bigger impact than that by the numbers but if it's purely a numbers-based play then we already know what the machine is going to decide.Maybe it could be based to see if it picks the Synthesis ending at the end though
>>101755479exactly. you don't need more than 2b, it's all slop anyway
>>101755479(me)Oh and there is also Nemo I guess, but it is fucking retarded and it has that second problem I forgot. Aside from that I guess there is a bit of hope because there is Nemo. Unfortunately it is also very stupid and it has some other problems.
>>101754707let it choose
>>101755454>her moans vibrate up your spinethat's not a humanoid that's an android with motorized tongue and throat
>sentencesfigurativly missing:subjective up
>>101755613>sound is a wave>bone-conduction earphone
looks ok to me
>>101755404>losing your LLM virginity to a 2B model and the default silly tavern card
Testing dolphin-2.9.3-mistral-nemo-12b after the whole Celeste loop deal. So far, so good. It gets the first reply right 100% of the time, whereas other models would get it wrong, not executing my request directly. Its replies are very assistant-like (lists, headers, etc) which for the section of the test I'm at is a good thing. It's pretty good at using information from lorebooks too. There are still some points of my testing where some models get stuck, and I hadn't liked dolphin releases much before (ever since the mixtral fiasco at least), but I'm really enjoying this model so far. I've yet to see how "creative" it is during actual roleplay too.
>>101753027>>101753568Seethe
>>101755648This is the drummer's first time on ST? That makes sense
>>101755678Will he make it into the history books? Or at least wikipedia in 10 years or so?
>>101755773Oh, it's>ahhhinstead of >ahhI see.Thank you anon.
>>101754319that sounds very long. open task manager and see if it's offloading to disk (do system ram and vram both max out during generation?) also >>>/ldg/ is going to be a faster-moving board and more on-topic
>>101754725>3gb of vram
yeah, that'll do it
>>101755688Based miku taking care of the garbage
>>101755648i already lost it on pyggy6b
>>101754319Is there some new better way to run flux with low vram? I'm still using fp8_e4m3fn.
>>101749053Mistral Large 2 is really, really good. How were the chads at Mistral AI able to reach GPT-4 level at only 123B parameters?
>>101754698>>101755844fp8 models might be worth a try for now: https://huggingface.co/Comfy-Org
quality comparison >>101749013
>>101755874>>101755688samefag. nobody likes you retard. inb4 inspect element.
>>101755928I never samefag because I'm not mentally ill.
>>101755911Is 16gb VRAM enough for 8bit, without having to offload to CPU? (I guess you can't use a second GPU?)
>>101752413This was pretty difficult for Flux to "understand" until I worded it another way. Anyway, here you go, enjoy.
Spoonfeed me. What is the best model I can run on an RTX4070. Also, is it worth adding my old 8GB GPU to have 20GB VRAM or would it slow things down too much.
>>101755479>coomer is retardedPlease ask yourself what are you even expecting??? That's typical over-erotized smut, there isn't really a big room for improvement. Ultimately try to modify how ai writes by prompting, small models are not able to do that well, bigger ones can. Or maybe just stop jerking off and try to develop an interesting RP, dumbass.
>>101755454Just read it and yeah that's great for a 2B assuming it wasn't luck (ie rerolls are as coherent), though I doubt it can keep up in more demanding coom scenarios where you actually do something that wasn't in its dataset of slop writing. I use models like CR+ and still wish it was more smart for my scenarios.
>>101756005Idk. The safetensors files from the link also contain the vae (as well as fp8 model weights) in one file. So I'd be optimistic enough to at least try. workflows: https://comfyanonymous.github.io/ComfyUI_examples/flux/#simple-to-use-fp8-checkpoint-version
if it's "close but not quite", try moving your monitor to the second GPU or iGPU >>101677660
>>101756005comfy does not currently have multi-gpu support, but there is a way to put CLIP on one GPU and diffusion inference on the other >>101689729>>101756095What is your use case?
>>101756095It's worth adding the extra 8gb of vram, yeah.Try Gemma 27B and CommandR 35B to start with.
>>101756209>What is your use case?Cooming, screwing around
>>101756008The dude on the left has such pretty nails.
>>101750722Why do you need an extension for using your critical thinking?
>>101756005I run that shit on 12GB.
>>101755281>ay bb wan sum fuk>fuks ensueMaybe a godsend for CPU-onlies if you curb your addiction and pop in a few messages every once in awhile and go back to anime or light gaming or something.>>101755454But then take a step back from the "so then I fucked the 5th slut (if she 'wasn't', she is now)" cooms, intertwine with sfw plot if you have enough IQ to do so, and check big models to compare, it becomes clearer it's not particularly smart.Yeah I was initially amused by the superficially okay-looking output, but some traits go away faster as they become more generic.I do have a memory about a bad experience with an old 7B model where a character became completely pants on head retardedly dysfunctional if I don't delete everything that happened even if I told it to stop, so I'm gad 2B got past that.
*glad
>>101756333What's the max resolution you can get without offloading to CPU? And about how long does a gen take at that size?
>>101756209>CLIP on one GPU and diffusion inference on the otherNTA, thanks for the link.
>>101756388I just use default comfy 1024x1024, it's already filling most of my vram. Gen are long, it's like 25 seconds per iteration.
>>101756357>intertwine with sfw plot
i'd rather watch anime or do light gaming. or you know, read a book. llm's only purpose is smut
>>101756484>llm's only purpose is smutand even at that they are pretty bad
>>101755928
>>101755454>yeah it's "sloppy" but for an 8k context coom this is the sweet spot. also it's less sloppy than llama3 8bis this some kind of a joke? there is slop upon slop in the entire log, probably one of the worst logs I saw here recently
>>101756105>elitist retard is retardedIf everyone listened to you there would be no imagegen cause you can just open paint and draw it yourself you absolute fucking moron. Go open a notepad and develop an interesting RP, retard. Unless you are too busy huffing your own farts.
>>101756642And that's without revealing the message count :^)Anyone wanna show what its like at 50, 100, 150 messages?
>>101756703The anon you're replying to is right though. Imagine thinking that 2B is enough because all you want to do is coom. Then imagine that you take that opinion and extending it to "2B, 70B it's all the same." Absolute retard take.
>>101756785Actually it is the anon you are replying to that is the right anon. Imagine thinking that you need to write a page long prompt to set up the mood and get your model wet just so it generates a good picture.
I have downloaded so many 20+gb models that I feel like I gained nearly inexhaustible patience.
>>101756899perfect for running mistral large on RAM
>>101756822I agree with you anon. And I agree with the anon that you said is right. Him and you are right.
>>101756972Just wait until I get a 96gb kit
Apparently DeepSeek got caching to work on API a few days ago. How much longer until other corpos follow suit? Imagine Opus at $1.5 per million cached tokens. Same output though.
>>101757114sonnet 3.5 at less than 50 cents per million would be worth paying for finally.
>>101757114>cachedoes it work like context shifting on koboldcpp?
>>101757114What does caching involve? Giving the same canned reply to similar questions?
>>101757169>>101757175Well whatever it is, if you don't change shit then it only needs to ingest the next parts, just like local.I don't think OpenRouter has it working yet, saw people talking about it in discord.
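It's prefix caching, basically the API-side version of what llama.cpp does with its prompt cache: if the front of the prompt is byte-identical to something already processed, only the new suffix gets prefilled. A toy sketch of the idea — the prefix hashing and the counter standing in for "KV state computation" are illustrative, real servers cache attention state per token block:

```python
import hashlib

# Toy prefix cache: key on a hash of the token prefix so a repeated
# prefix pays nothing and only the new suffix is "prefilled".
# tokens_prefilled is a stand-in counter for the expensive attention work.

class PrefixCache:
    def __init__(self):
        self.cache = {}            # prefix hash -> opaque cached state
        self.tokens_prefilled = 0

    @staticmethod
    def _key(tokens):
        return hashlib.sha256(" ".join(tokens).encode()).hexdigest()

    def prefill(self, tokens):
        """Returns how many tokens actually had to be processed."""
        # Find the longest already-cached prefix...
        best = 0
        for i in range(len(tokens), 0, -1):
            if self._key(tokens[:i]) in self.cache:
                best = i
                break
        # ...then "compute" only the remainder, caching as we go.
        for i in range(best + 1, len(tokens) + 1):
            self.tokens_prefilled += 1
            self.cache[self._key(tokens[:i])] = i
        return len(tokens) - best

c = PrefixCache()
first = c.prefill(["sys", "prompt", "hi"])           # cold: everything processed
second = c.prefill(["sys", "prompt", "hi", "more"])  # warm: only the suffix
```

Rerolls hit the same cached prefix every time, which is why the compulsive-reroller anon above would benefit the most.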
>>101755355Now ask it to play truth or dare
>>101757114As a compulsive reroller this would be huge
>>101756822But being able to change the model's behavior by modifying the prompt is a good thing. The stronger the model, the more important prompting becomes, because drawing out the model's full potential for your use case requires fiddling with the prompt. Example: Mistral Large 2 will do a better job on the same prompt as a lesser model like Miqu-70B, but the biggest quality jump will only be achieved when the full range of ML2's prompt space is explored.
>>101757114I'm sure Claude already caches, they just don't give you a discount.
I just want shit to respect that spec of the OAI API: https://platform.openai.com/docs/api-reference/chat/create#chat-create-n
Why do almost all backends do jack shit with it? just generate and give me multiple choices!
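For reference, honoring `n` means one request yields several sampled completions back in the `choices` array. The response dict below is a hand-written stand-in for what a compliant backend would return, not output from any real server:

```python
# What a backend that respects `n` should do: sample n completions for
# one request and return all of them under "choices". The mock response
# is fabricated here purely to show the shape a client would parse.

payload = {
    "model": "local-model",            # placeholder name
    "messages": [{"role": "user", "content": "write a haiku"}],
    "n": 3,                            # ask for three independent samples
    "temperature": 1.0,
}

mock_response = {
    "choices": [
        {"index": i, "message": {"role": "assistant", "content": f"draft {i}"}}
        for i in range(payload["n"])
    ]
}

# Client side: collect every sampled completion, not just choices[0].
drafts = [c["message"]["content"] for c in mock_response["choices"]]
```

Backends that ignore `n` just return a single-element `choices`, which forces clients to fake rerolls with n separate requests.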
https://x.com/HalimAlrasihi/status/1820918388002009363
>>101757601>>101757601>>101757601
>>101757617Where can I watch the cat cooking show?
>>101751221SOVL
>>101757668I'm fucked because I can't tell if this is written by an AI or not kek, the pictures obviously are though.