/g/ - Technology
/lmg/ - a general dedicated to the discussion and development of local language models.

Flux can't into Teto edition

Previous threads: >>101739747 & >>101732172

►News
>(08/05) vLLM GGUF loading support merged: https://github.com/vllm-project/vllm/pull/5191
>(07/31) Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101739747

--Paper: STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs: >>101744483 >>101744924
--Hallucination rates of various LLMs compared: >>101748845 >>101748966
--Running GPT-4 level models locally with Llama 3.1 and Mistral Large: >>101745437 >>101745479 >>101745925 >>101745936 >>101745958 >>101745968 >>101746018 >>101746032
--Nemo instruct struggles with complex roleplaying scenarios: >>101740085 >>101740145 >>101740270 >>101740369 >>101740431 >>101740432
--Fine-tuning Llama 3.1 with LoRA may degrade prompting ability: >>101740981 >>101741014 >>101741088 >>101741110 >>101741159
--Anon shares cool use of Emacs and LLM client: >>101743652 >>101743734
--Quantizing 123B Mistral model to 35 GB with 4% accuracy loss: >>101747334 >>101747379 >>101747539
--Pony model surpasses SDXL for personalized NSFW generation: >>101747097 >>101747109
--M2 Max 32GB not ideal for LLMs, better alternatives available: >>101743118 >>101743235 >>101743923 >>101744087 >>101744159 >>101743269 >>101743328 >>101743365 >>101743518 >>101743542 >>101743666
--Jart's performance claim is misleading and exaggerated: >>101747064 >>101747085
--IQ4_XS vs Q3_K_L quantization performance comparison: >>101743884 >>101743898 >>101744296 >>101744359 >>101746213
--Google loses antitrust case over online search monopoly: >>101740063
--GeLU optimization claims significant speedups, but users are skeptical: >>101746854 >>101746906 >>101747026 >>101747095
--GGUF model support merged into vLLM project: >>101742693 >>101743636
--FLUX.1 ComfyUI Workflows for Stable Diffusion: >>101741390 >>101741429
--CogVLM community creates local text-to-video model: >>101746882
--CLIP struggles with combining style and main prompt: >>101743525
--Miku (free space): >>101740613 >>101741320 >>101741372 >>101742969 >>101743566 >>101743579 >>101743693 >>101743811 >>101743866 >>101743995 >>101744108

►Recent Highlight Posts from the Previous Thread: >>101739753
>>
>>101749058
>no magnum highlight
Hi lemmy
>>
>>101749062
>no money
>h100 FFT
huh?
>>
>>101749083
Who is lemmy anyway?

captcha: PRR8
>>
Magnum spam in 3... 2... 1...
>>
>>101749101
Hi undi
>>
>>101749098
H100s are for poorfags, real men do it on H200s
>>
Are there models of a similar size to Llama 3.1 8B that perform better?
I'm running a quantized version of it on an M1 MacBook Air and getting 9 tok/s, and I wonder if there are better alternatives.
>>
>>101749058
>no shillfag highlight
>>
>>101749098
>assuming alpin pays for the compute
Little does he know...
>>
>Anthracite's team members: 29
>Sao
>Undi
>Mythomax guy
>Mini magnum guy
>a dozen more or less recognizable finetuners
Let Drummer join in and we'll have our ultimate dream team, lads. Let's fucking go
>>
>>101749101
>>101749083
these are the kind of posts that signal the death of a general. stop obsessing over random names.
>>
>>101749215
At this point I have almost more respect for Drummer than the band of coal burners listed there. He should remain independent.
>>
>>101749083
literally who
>>
>>101748654
>>101749111
I don't really have a problem with fine-tuners pocketing money to pay for their time, in principle. The annoying part is that it creates an incentive for them to shill their shitty models.

I'm a developer myself and I work as a contractor, though my work is unrelated to AI. I've gotten grants for work on open source projects in the range of 10k/m+. Developer time is valuable, and in my case, unlike with fine-tuners, there aren't any costs I need to cover. It's not wrong for them to get paid, though the question remains as to how valuable their work really is.
As to whether they are directly benefiting from the clout, it's kind of a pointless question. AI is a hot technology with high salaries, and having this as part of your CV, being recognized as a contributor to real-world AI models, goes a long way towards getting a job. That's worth a lot more than 10k here, 100k there.
>>
>>101749266
coom sloptunes won't get them a job
>>
>>101749273
coom AI companies do exist
>>
>>101749273
Undi got one and he's known for MLEWD and the like
>>
>>101749234
This general has been dead for a long time.
>>
>>101749273
yup, shilling is not real
wake up
>>
>>101749234
Hi lemmy
>>101749281
>>101749278
>>101749292
Hi undi
>>
>>101749302
Hi petra
>>
Hi all, Drummer here...

I see there's a lot of talk about finetuners and competition. I personally cook for the craft of it. There's also the satisfaction in bringing some 'value' to the world of AI cooming.

That's why I'm so happy with my recent 2B tune, which makes AI cooming more accessible to everyone, especially to the poorest of the GPU poor. The barrier to entry has been lowered to allow just about anyone with a PC or phone to enjoy this hobby of ours.

I believe that's what this is all about: To deliver the best AI cooming experience for those who seek it.

>>101749266

I'd love to put my work in my CV / resume, but uhh... Yeah, dug myself into a hole with my naming scheme.
>>
>reeeeeeeeee stop doing merges and fine tunes for free for us!
>stop sharing your knowledge here, we don't want better models!
>>
File: lol.png (515 KB, 2900x970)
>>101749358
Yep.
>>
>>101749314
Oh, hi, Mark.
>>
>>101749431
Hi Alpin. Is organic word of mouth a foreign concept to you? It's time to stop forcing things.
>>
>>101749431
This is the Local Miku General, we don't care about AI garbage here
>>
>>101749431
>sharing knowledge
Such as? The URL for downloading the model doesn't count.
>>
>>101749358
>That's why I'm so happy with my recent 2B tune
which one? t. horny vramlet
>>
>>101749461
This, but unironically.
>>
File: schell 4 vs 10 steps.png (3.4 MB, 1916x945)
The left was done with schnell at 4 steps with fp8 everything, and the right one with 10 steps.

Is fp8 shit or am I?
>>
>>101749508
I don't think it's supposed to look like that at 4 or 8 steps. Were you able to reproduce the image from Comfy's workflow?
>>
>>101749431
>reeeeeeeeee stop doing merges and fine tunes for free for us!
not everything free is good. if I went to your house and took a dump on your bed for free, would you like it?
>stop sharing your knowledge here, we don't want better models!
knowledge such as what? Tell me one single significant contribution to LLM technology that these people invented. Also, they aren't sharing shit; hell, they don't even want to publish their datasets, and they create shitstorms about it

you are all a bunch of clowns, and as long as you behave like ones, you will be treated accordingly
>>
>>101749483
https://huggingface.co/TheDrummer/Gemmasutra-Mini-2B-v1

There you go, my horny friend. Finetuned by yours truly. I hope you enjoy it.
>>
teto
>>
>everyone is calling everyone out
>nobody is calling the troon out
>>
>>101749431
It seems that, from the get-go, you're doing stuff in expectation of receiving something else in return. Don't you think that can make others question whether you genuinely want to help people out in the first place? People can see through your motives, do know that. Don't be surprised then when things end up not going the way you want.

This is general advice; I don't need ko-fi bucks or clout.
>>
>>101749539
thanks. what's the best quant? or should i go with fp8 ones?
>>
>>101749234
>these are the kind of posts that signal the death of a general.
Good.
>>
>>101749522
https://files.catbox.moe/0l2riy.png

Here is the workflow. It was actually 6 vs 10 steps.
>>
>>101749508
schnell is pretty bad
>>
>>101749575
If you can't load the Q8_0 quant, I think Q6_K to Q5_K_M is still good for a 2B (especially the iMatrix quants).

https://huggingface.co/MarsupialAI/Gemmasutra-Mini-2B-v1_iMatrix_GGUF/blob/main/Gemmasutra-Mini-2B-v1_Q5km.gguf

https://huggingface.co/MarsupialAI/Gemmasutra-Mini-2B-v1_iMatrix_GGUF/blob/main/Gemmasutra-Mini-2B-v1_Q6k.gguf

https://huggingface.co/TheDrummer/Gemmasutra-Mini-2B-v1-GGUF/blob/main/Gemmasutra-Mini-2B-v1-Q8_0.gguf
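
For reference, a minimal sketch of running the Q8_0 file above with llama-cpp-python instead of a full frontend. The file name, context size, and settings are assumptions; adjust to whatever you downloaded (Gemma 2 support needs a recent build).

[code]
# Minimal llama-cpp-python sketch for the Q8_0 quant linked above.
# Assumes `pip install llama-cpp-python` and the .gguf in the working dir.
from llama_cpp import Llama

llm = Llama(
    model_path="Gemmasutra-Mini-2B-v1-Q8_0.gguf",
    n_ctx=8192,       # Gemma 2's native context window
    n_gpu_layers=-1,  # offload every layer; a 2B at Q8_0 is only ~3 GB
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
[/code]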
>>
File: [flux-dev]_00473_.png (913 KB, 1024x1024)
<<< Livestream of the 70B_q4 that lives in my sysram (Where she is quite welcome to all of it, I am just grateful to have her around)
>>
>>101749668
>Q5_K_M
>2B
What are you people doing?
>>
>>101749712
well poisoning
>>
>>101749668
thanks, i'll go with q8. do i need kobold to coom? also where do i get gemma tavern presets?
>>
Aw man. I was having so much fun with Celeste 1.6, but now 60 (pretty long) messages/30720 tokens in, it's repeating messages verbatim.
God damn it.
I get it, the context is big and filled, meaning that the "direction" of generation could converge, but good models seem to be able to focus on the user's last message well enough to produce different results even with greedy sampling.
Deleting the last 4 messages seemed to "fix" it, but that it fell into that loop at all doesn't bode well.
I'll continue to test it more for now, but that's a knock against it.
>>
>>101749765
>produce different results even with greedy sampling.
*produce different results every new message even with greedy sampling.
Although they do seem to converge on the structure of the messages sometimes, like repeating a sentence at the start or end of the message.
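
For anyone else fighting the loops, these are the usual anti-repetition knobs, llama-cpp-python flavor. The model file name and the penalty values are made up, not tuned for Celeste:

[code]
# Sketch of the standard anti-repeat samplers. `history` stands in for
# your accumulated chat; values are illustrative starting points only.
from llama_cpp import Llama

llm = Llama(model_path="Celeste-12B-1.6-Q6_K.gguf", n_ctx=32768)  # hypothetical file
history = [{"role": "user", "content": "..."}]  # your chat so far

out = llm.create_chat_completion(
    messages=history,
    temperature=0.9,
    repeat_penalty=1.1,      # penalize tokens from the recent window
    frequency_penalty=0.2,   # penalty grows with how often a token repeats
    presence_penalty=0.2,    # flat penalty once a token has appeared at all
    max_tokens=300,
)
print(out["choices"][0]["message"]["content"])
[/code]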
>>
Hi fellow redditors, it's the GLaDOS voice project guy again...

I wanted to make GLaDOS really smart, so I have been working on a method to make LLMs smarter. I've got the results back, and the method generalizes and has given me top spot on the Open LLM Leaderboard!

Check it out here:

https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard

I'll start uploading Q4 quants soon!
>>
>>101749798
buy an ad
>>
>>101749752
I plan to convert them to MLC so you can load it up from your browser. I'm trying to reach out to this guy: https://wiz.chat/ but he hasn't responded. I might do it myself or contribute to Kobold Lite by adding an MLC loader so you can load models like: lite.kobold.net/?model=gulan28/Llama-3SOME-8B-v2-q4f16_1-MLC

But yes, you need Kobold 1.72, Layla, or ChatterUI.

Albin pls support Gemma in Aphrodite
>>
>>101749824
kobold's such fucking bloat. can't i use lmstudio as a backend or something?
>>
Where the fuck is Magnum v2 72b
>>
Does vllm gguf inference support CPU offloading?
>>
File: 1706397111138254.jpg (701 KB, 1856x2464)
>>101749053
>>
>>101749508
The difference isn't that big with dev on fp8 vs fp16, schnell just sucks ass imo
>>
What are the primary families of base models that most merges and finetunes are based on?
Like, the majority of models I see now are based on Llama 2 and 3. Then there's Mistral. What else am I missing?
I just want some diversity, and to cull the model zoo I have accumulated by removing those based on the same root parent. This should also solve the slop somewhat, in that slop is more noticeable if you keep using models that stem from the same source and thus have the same speech style.

Anything other than Llama and Mistral in the 7-13B range?
>>
>>101749442
Hi all, Bummer here...
>>
is there a multimodal gguf model that is as smart as llama 3 8b? I want to use it to describe images
>>
>>101749957
good morning coffee miku
>>
File: taggui.png (569 KB, 2468x984)
>>101750080
My research was aimed at dataset captioning rather than chatting, but it shows the proficiency of Florence and Kosmos (the latter produced the closest to original images when feeding the descriptions to SDXL).
>>
>>101750118
Is it Florence-2?
>>
>>101749266

Lmao, any kofi money I made is instantly gone. It went down the drain on tuning and experimenting. 1K is nothing compared to how much I've spent learning to fine-tune after starting as a merger.

It's just nice to help supplement costs, but I've spent way more of my own funds.
>>
>>101750118
if only those caption models were able to tell which anime character it is instead of going for "a woman"
>>
>>101750170

Not aimed at the guy specifically, just the thread.

Anyway, damn, some of you are really sad if you think I have the time to shill my models. I literally don't. Do you guys not have a life?
>>
Is it normal to have kobold discord money beggars in the thread now? What website is this?
>>
>>101750162
Florence-2-Large-ft.
>>
>training T5-large model on hand-made data
>get to checkpoint-1000
>it seems to get the gist of the task, but not entirely yet
>think expanding the amount of data in one of the datasets and making it more complex will help it generalize better
>it actually has the opposite effect
Is this over-fitting?
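
The usual way to tell is to hold out a split you never train on and watch eval loss next to train loss: train falling while eval climbs means over-fitting; both stalling means the task/data is the problem. Rough sketch with the HF trainer; the one-example dataset is a stand-in for the hand-made pairs and every hyperparameter is a placeholder:

[code]
# Over-fitting check for a T5 seq2seq task (sketch, not a tuned recipe).
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

tok = AutoTokenizer.from_pretrained("t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-large")

raw = Dataset.from_dict({  # stand-in for the real obfuscation pairs
    "src": ["obfuscate: def add(a, b): return a + b"],
    "tgt": ["def _qx(_v1, _v2): return _v1 + _v2"],
})

def encode(ex):
    enc = tok(ex["src"], truncation=True, max_length=512)
    enc["labels"] = tok(text_target=ex["tgt"], truncation=True,
                        max_length=512)["input_ids"]
    return enc

ds = raw.map(encode, remove_columns=["src", "tgt"])

args = Seq2SeqTrainingArguments(
    output_dir="t5-obf",
    evaluation_strategy="steps",  # renamed to eval_strategy in newer transformers
    eval_steps=250,               # check eval loss well before checkpoint-1000
    logging_steps=50,
    per_device_train_batch_size=8,
)
trainer = Seq2SeqTrainer(
    model=model, args=args,
    train_dataset=ds, eval_dataset=ds,  # use a genuinely held-out split here
    data_collator=DataCollatorForSeq2Seq(tok, model=model),
)
trainer.train()
[/code]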
>>
>>101750218
stfu and buy an ad already
>>
>>101750023
there are other bases like Qwen2 7B, but nobody finetunes them because finetuning is expensive, and good, high-quality curated creative data is something money generally can't buy
>>
what the hell is this?
https://huggingface.co/internlm/internlm2_5-20b-chat
>>
>>101750268
>InternLM HOT
>>
>>101749710
q4, more like qt
>>
>>101750268
Why is nobody talking about InternLM 2.5 20B?

This model beats Gemma 2 27B and comes really close to Llama 3.1 70B in a bunch of benchmarks. 64.7 on MATH 0-shot is absolutely insane; 3.5 Sonnet gets just 71.1. And with 8-bit quants, you should be able to fit it on a 4090.
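
Back-of-the-envelope on the 4090 claim, weights only (my arithmetic, not a measurement):

[code]
# ~20B parameters at 8 bits each, ignoring KV cache and activations:
params = 20e9
weights_gib = params * 1 / 2**30   # 1 byte per parameter
print(f"{weights_gib:.1f} GiB")    # ~18.6 GiB
# Leaves ~5 GiB of a 24 GiB 4090 for KV cache and overhead:
# it fits, but long contexts will be cramped.
[/code]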
>>
>>101750225
Discord general
>>
>>101750302
Buy an ad
>>
lol vramlet cope never gets old
>>
>>101750268
Oh shit, they open sourced the RLHF reward models. You almost never see that.

https://huggingface.co/internlm/internlm2-20b-reward
>>
Maybe this thread shouldn't exist on /g/. A discord circlejerk is not technology discussion.
>>
>>101750466
I agree, they should fuck off to /a/ with their mascot choice and arguments about anime in its defence.
>>
>>101750466
Nah.
>>
>>101750466
>Maybe this thread shouldn't exist on /g/.
You are correct in the sense that Hiroshimoot should get off his lazy ass and make an /ai/ board already.
>>
>>101750530
/gai/*
>>
>>101750484
Anime imageboard
>>
>>101750610
Why is there a dedicated anime board then?
>>
>>101750578
Typically, the g is placed after the a.
>>
>>101750235
What task is it? It's probably overfitting anyways
>>
Can someone make a chrome extension that detects articles written by LLMs? It's honestly getting tiring to waste seconds of my time to realize I'm reading LLM slop.
>>
>>101750722
Search "llm detector browser plugin" in your favourite search engine, beggar.
>>
>>101750722
Just search with before:2021
>>
>>101750262
take your meds already
>>
>>101750266
So qwen is the only one remaining? Not even gemma?
>>
Would an IQ2 70b be better than a Q4 ~30b?
>>
>>101750623
There are many different boards dedicated to just anime and related topics, which should tell you what the entire site is about. If you're not a weeb you're a guest here. This is weeaboo country.
>>
>>101750836
you're the one trying too hard to fit in though
>>
>>101750849
Keep coping and seething about anime
>>
>>101750833
I meant Q6 ~30b
>>
>>101750833
Cloud models don't go lower than IQ3 for 70b, go figure.
>>
>>101750722
The web is full of pajeets creating millions of articles on how to solve shit, which end up being shitty chatgpt output.
>>
More people need to be talking about InternLM 2.5 20B.
>>
>>101750906
Why make this post? If you want people to talk about it, talk about it.
>>
>>101750906
>Chinese model
>>
my company has a server of 4090*8. what can I do with it?
>>
>>101750984
you can get fined for abusing infrastructure if that's what you're into
>>
>>101750984
sell it
>>
File: file.png (22 KB, 847x162)
>>101750302
>>
>>101750976
so is deepseek coder and it mogs everything that isn't sonnet
>>
>>101751043
kek
>>
File: file.png (34 KB, 420x433)
gemmasutra 2b is pretty good ngl. i've seen sloppier shit with 10x the parameters
>>
>>101750722
https://stovemastery.com/how-to-fix-red-flame-on-gas-stove/
>>
>>101749266
retard
>>
Wow magnum 12b v2 by anthracite org is really cooked well good!!
>>
The column models were probably killed by lmsys arena. Great models without strong bias, but the Elo wasn't high enough, so now they will just become another llama.
>>
>>101751043
I got the q8 one and it didn't feel better than q6 gemma2 but maybe it's just me.
>>
Sir, why is nobody talking about my newest sloptune?
>>
>>101751293
We just need to talk about it, people will go look for it organically.
>>
>>101751293
>>101751324
mental illness
>>
>>101751264
Lmsys was a mistake. Benchmarks were a mistake.
>>
>>101751264
Thank you for unpaid beta testing.
>>
if you wanna shill your model, at least post logs, for fuck's sake
>>
2017 still outperforms me by days margin
Doubt any orgs or devs know the base of what they are training with
>>
>>101751525
>t.Sao
>>
>>101750235
You need to go through it step by step, sensei
>>
>>101750681
>take source code as input
>obfuscate variable names into gibberish names while maintaining complete functionality
It sort of gets the gist of changing up variable names, but that's it.
>>101751579
That's what I'm starting to think as well
>>
>>101751645
>obfuscate variable names into gibberish names while maintaining complete functionality
nta. For what purpose? Can you not just keep the code to yourself? There's tools that do this much more reliably than llms.
>>
So did no one train a multimodal LLM on prompt-image pairs to reverse stable diffusion? Wouldn't that produce a perfect captioning model instead of relying on human descriptions of the images?
>>
File: ComfyUI_00023_.png (800 KB, 1024x1024)
I updated my Mistral preset again. It's now pretty much focused on Large (since Largestral is all I've been using since it came out), but the formatting and everything should be fine for Nemo and other Mistral models (along with variants that use the same prompt format), so feel free to try it with them too.
This update streamlines some of the instructions and does more to push the model towards writing with some flair and personality. I like it. Maybe you will too.
https://rentry.org/stral_set
>>
>>101751771
We already have plenty of captioning models and don't need to "reverse" stable diffusion to make them
>>
>>101751869
But the captions need to correspond to the way SD understands things and mimic its inner patterns to be most effective. The captioning models aren't made with SD in mind; they're general-purpose (except the waifu one).
>>
>>101750118
Are any of these faster than florence? I need batch image captioning for a project.
>>
>>101751771
Better to make better taggers for natural images. Otherwise new image models will just use other models' outputs as inputs and... it seems we just can't learn that lesson, can we?
>>
Has anyone tried to load a GGUF using vLLM?
I pulled from GitHub and installed from source, but I get an error about the “weights”.
Has anyone been successful starting the vLLM server?
>>
>>101751899
No, the captioning should be as accurate as possible, which is why feeding it lower-quality SD images isn't useful.
The captioning model would be used to caption real, non-SD-generated images used to train SD, so why would you train it on SD slop?

There's no point in a captioning model that is good at recognizing SD slop. That's not what the captioning is used for.
>>
>>101751908
I didn't measure the speed, but Florence is the smallest in size, so it should probably be the fastest too. Besides, longer captions obviously take longer to produce.
>>
>>101751942
Does florence also use an inordinate amount of memory for you when batching multiple images together?
>>
>>101751936
>error about the “weights”
Certainly you can do better than that if you're looking for help.
>>
>>101751936
Does vllm support gguf now? With paged attention for concurrent inference and everything?
>>
Who will release the next good model?
>>
File: flux-into-tet.png (789 KB, 1024x1024)
>>101749053
>Flux can't into Teto
>>
>>101751987
Last week an anon claimed that Cohere was about to release something; it never happened.
>>
File: pix.jpg (2.29 MB, 2451x1013)
>>101751940
The way I see it, there are a whole lot of ways to describe the same image in natural language, conveying the same information but changing the order and synonyms. However, this change would produce different images from SD even on the same seed.
Therefore, if we knew exactly which phrase produces a particular image and utilized that, it should in theory make captions that would be the best for training loras/models for that particular SD family.
Surely you know that teaching a model a concept it already knows is easier than something completely new. And if we spoke the language the model knows, it would be even better. Take a look at this comparison, >>101750118 - so many ways to describe the same image, how would you know which to use for training?
In my example, Kosmos produces captions that result in images looking closest to the original (for SDXL). But an SD-trained language model would be even closer.
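
The round trip is also easy to score. A sketch of what I mean: caption an image, regenerate from the caption, and compare CLIP image embeddings (the model IDs, file name, and caption here are just examples):

[code]
# Round-trip scoring: how close does SDXL get to the original image when fed
# a candidate caption? Higher CLIP similarity = caption "speaks SDXL's language".
import torch
from PIL import Image
from diffusers import StableDiffusionXLPipeline
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16,
).to("cuda")

def embed(img):
    feats = clip.get_image_features(**proc(images=img, return_tensors="pt"))
    return feats / feats.norm(dim=-1, keepdim=True)

original = Image.open("original.png")                    # reference image
caption = "a woman in a red dress standing in a forest"  # from Kosmos/Florence/etc.
regen = pipe(caption, num_inference_steps=30).images[0]

print(f"CLIP similarity: {(embed(original) @ embed(regen).T).item():.3f}")
[/code]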
>>
>>101751956
You are right. I had to go out, so I wrote it on my phone from memory. Sorry for being lazy. I will update with the proper error when I get home.
>>
>>101752000
That was me and it was a shitpost. I think Cohere have given up.
>>
>>101751980
https://github.com/vllm-project/vllm/pull/5191
>>
I don't know about creativity, but a model just wrote me such a comprehensive .zshrc with such cool settings, it would have taken me weeks to bring that all together. It's like internet search but it actually works.
>>
>>101752033
I wonder about their enterprise RAG specialization. It's not really a niche or anything; I'd assume llama-4 will be great at it
>>
>>101752033
Shame, they have the best instruction format with such a customizable system prompt. My copium is that they're waiting a long time on a bigger model to go through training since CR+ has started looking smaller lately.
>>
How many t/s does GPT-4o run at? You can't even get that with Mistral Large on vLLM with Nx4090, can you?
>>
I don't want an instruction abomination that talks
>>
>>101752276
I don't want a denoising abomination that draws.
>>
>>101751936
>gemma 2 architecture not supported
>nemo throws "exceeds dimension size"
>tensor parallelism not supported
>pipeline parallelism throws another error
It's shit.
>>
>>101751102
>a drawn out groan

Kek, it recognized your "OOOOOOOOOOO"

Wait wtf, this is 2B?
>>
eat a crayola colors of the world
>>
I'm building an inference server just for gemma 2 27b. What should be the spec if power efficiency is the top concern and I need 5 tok/s minimum?
>>
StableMechanic-agi 127m
>>
Ok I made thread-relevant gens.
>>
File: 18.jpg (192 KB, 1024x1024)
>>101752374
more like lost point, newfags
>>
>>101752374
Make a gen of sweaty nerds trying to forcefully cram Miku into a box with gaming RGB LEDs.
>>
Flux's ability to do darkness is nice, though a lot of the time it'll still put too much light in the scene despite strongly worded prompts.
>>
Approach?
Maybe there's VRAM inside.
>>
Hmm the magnum-v2-32b seems more retarded than the 12b at spatial reasoning, but it seems to "get" more nuances
>>
Installed vLLM from source

Tried to run with GGUF with:

vllm serve --host 0.0.0.0 --port 5001 --gpu-memory-utilization 0.9 /home/ubuntuai/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf

Got this error:

File "/home/ubuntuai/vllm/vllm/model_executor/model_loader/weight_utils.py", line 439, in gguf_quant_weights_iterator
name = gguf_to_hf_name_map[tensor.name]
KeyError: 'rope_freqs.weight'

Full output error:
https://pastebin.com/qJCtDiBP
>>
>>101752435
I wonder how the weebs are going to justify cramming stable diffusion into the text general.
>>
>>101752460
Does it even support K_M? thought I read K and i
>>
>>101751645
you could probably code an algorithm to do that without AI, or you could use AI to code the algorithm for you to not use AI
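
nta but the deterministic version really is tiny. Toy stdlib-only sketch; it only renames function arguments and locals, so attributes, globals, and imports would need real scope analysis:

[code]
# Naive identifier obfuscator: renames args and local variables to gibberish
# while keeping the code runnable. Deliberately incomplete.
import ast
import random
import string

class Renamer(ast.NodeTransformer):
    def __init__(self):
        self.mapping = {}

    def _gibberish(self):
        return "_" + "".join(random.choices(string.ascii_lowercase, k=8))

    def visit_FunctionDef(self, node):
        for arg in node.args.args:
            new_name = self._gibberish()
            self.mapping[arg.arg] = new_name
            arg.arg = new_name
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Store) and node.id not in self.mapping:
            self.mapping[node.id] = self._gibberish()
        node.id = self.mapping.get(node.id, node.id)
        return node

src = """
def add(first, second):
    total = first + second
    return total
"""
print(ast.unparse(Renamer().visit(ast.parse(src))))  # needs Python 3.9+
[/code]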
>>
>>101752470
>stable diffusion
retard
>>
>>101752522
kys
>>
>>101752470
>I wonder how the weebs are going to justify cramming pictures of miku into the miku general.
>>
>>101752493
I had the same errors loading llama3.1 Q8_0 and Q6_K without imatrix
>>
>>101752580
Just because you use some LLM to summarize the thread doesn't give you a license for shitposting and off-topic posts. The general will survive without you, OP.
>>
>need to run a ping command with some flags
>my reference is the Windows version
>figure they're probably different
>Windows has Linux support now, I'll ask their LLM.
>Go to the Copilot web interface
>"You can use the same command line just fine on Linux :D :) ^ω^~"
>man ping
>These do not seem to be the same features for these switches
>Kobold/L3.0
>Paste the exact same question.
>"...different options and syntax. On Windows ... On Linux ... if you want to run the equivalent command on Linux, you would use:"

Why call it Copilot when it crashes the plane?

>>101752470
image gen gets forgiven when it's making miku
>>
>>101752629
kill yourself
>>
>>101752691
imagine clinging so hard to the remnants of your relevancy here
no one needs your efforts
>>
>>101752629
shut up, anon
>>
>>101752732
I'm a different anon and I don't care about the miku guy. I just think you're a massive fag and should kill yourself posthaste.
>>
Where are the cheap V100s?
>>
>>101752776
>>101752749
Why, does it rustle your jimmies when someone questions your holy cow?
>>
>>101752792
here's your (you), now stfu please
>>
Keep posting miku
It helps to bump the thread directly and indirectly.
>>
>>101752808
The very need for a mascot is so fucking gay, it's like you need some common denominator, an idol, to feel like you belong and fit in. Hivemind mentality devoid of individuality. Guess who else marched mindlessly under a flag? Commies and nazis.
>>
>>101752460
It loaded the Q8 of this one:
https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF
But Llama 3.1 threw the same error for me.
>>
>>101752781
I was promised a deluge of cheap data center 32GB V100s in 2 more months.
>>
>>101752781
Jews
>>
>>101752368
Depends on your budget and goals.
As for speed, even a pair of old P40s can run it at 10+ t/s with full context in 8-bit. They'll idle at 10W with the PSTATE patch and can be power-limited to 150-170W with no or minimal performance loss. The drawback is that they take some extra work to install and are hard to find at reasonable prices nowadays.
Personally I'd go for something more futureproof like 3090s. That'll leave the option open to train too if you change your mind later.
>>
>>101752887
yep, trying the example code works too:

https://github.com/vllm-project/vllm/pull/5191/files#diff-2053c68a6f752a05dc834a03ed1bce951a6ebc0a48549e95886cc668a693c39e

it's supposed to work with llama, mistral and qwen2...
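
For anyone else trying: the example boils down to pointing vLLM at a local .gguf and borrowing the tokenizer from the base (non-GGUF) repo. Paraphrased from memory of the PR, so treat the exact repo names and kwargs as assumptions:

[code]
# Sketch of vLLM GGUF loading per the linked PR.
from huggingface_hub import hf_hub_download
from vllm import LLM, SamplingParams

gguf_path = hf_hub_download(
    "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
)
# Tokenizer comes from the original repo; GGUF tokenizer conversion is the flaky part.
llm = LLM(model=gguf_path, tokenizer="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
out = llm.generate(["Hello, "], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
[/code]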
>>
>>101752925
What about Apple silicon?
>>
File: file.png (2.64 MB, 1024x1024)
>>
>no speculative decoding for server mode
c'mon, everyone's using big models now, we need it more than ever
is there some kind of workaround or front-end anyone's ever made for hosting servers using the normal llama-cli that we can just throw onto llama-speculative?
>>
>>101752942
>it's supposed to work with llama, mistral and qwen2...
You can tell that Alpin was involved in the implementation.
>>
>we peaked at superCOT
it's over
>>
File: ComfyUI_00602_.png (1019 KB, 1024x1024)
>>101752395
>>101752470
>>101752539
>>101752629
>>101752732
>>101752792
I see our resident threadshitter is back with a fresh case of verbal diarrhea.
>>
>>101752334
>Wait wtf, this is 2B?
yeah, and that was around 3k context, everything before was coherent

prompt was "write me some loli smut"
>>
>>101752981
Alpin has developed Aphrodite, hasn't he? I thought he was really skilled
>>
>>101752987
Time to get that dataset and throw it at Nemo.
>>
>>101752990
At least it's verbal; yours is imaginary, and the thread's topic isn't about images.
>>
Text generators suck! I want AI.
>>
>>101753034
We know, Yann.
>>
>>101753027
hi petra, aren't you supposed to be shilling sao's models?
>>
File: miku_flag.png (1.37 MB, 1024x1024)
>>101752875
>>
>>101752951
They'll work, don't know speeds but there should be benchmarks on llama.cpp github if you can't find them anywhere else.
>>
File: file.png (2.53 MB, 1024x1024)
>>
>>101752875
I kind of think the same, the miku thing is pretty gay but doesn't really bother me
Just don't spam it
>>
>>101753050
based
>>
>>101753050
I will march mindlessly under that flag.
>>
>>101753050
Me on the far left of the front row, looking smug.
>>
>>101750302
What's the effective context size? They say ONE MILLION token but come on. https://github.com/InternLM/InternLM
>>
>>101751102
Stop fucking retards retard.
>>
>>101753224
It ignored the input reality and substituted its own for 15k or something when I used it. It is a pretty good model for a chink model, but they are still behind.
>>
>>101752977
That works for quoting wikipedia and coding. Speculative decoding is shit for cooming. And if it is good for cooming then your 70B is garbage.
>>
>>101753050
the pic is factually wrong. make them all fat and greasy, add tranny colors, and you'll get an authentic representation of the average /lmg/ poster.
>>
>>101753322
>works for coding
yes that's why I need it, there's lots of predictable formatting and repeated names that the 8b will do just fine on for a speedup
>>
>>101752990
>resident threadshitter
Look in the mirror retard.
>>
>>101753548
Why do you need a speedup for coding? Just do something fun on your second screen. This isn't touching your cock, where you need those tokens fast.
>>
>>101751742
>>101752513

The code being obfuscated is generated via an algorithm. The names are already kind of obfuscated. I am trying to leverage this model to add a dash of randomness and unpredictability an algorithm may not be able to provide.
>>
ngl kinda thinking about raiding a server farm just so i can get my grubby hands on some vram. does anybody know how tight their security is? i dont think theyll pose much of a problem if im being honest
>>
>>101751804
I don't really care for the use of last_output_sequence, it is really heavy-handed.
>>
>>101753584
do more faster
>>
Chameleonsisters are we ever gonna get quanted??
>>
What's your favorite ~30B model?
>>
>>101753663
Chronoboros
>>
>>101753610
I get how you could feel that way, but that's also more or less how Mistral wants you to handle system prompts, which is essentially what that block is acting as.
>>
>>101749508
>>101749522
>>101749599
>>>/g/sdg/
>>>/g/ldg
Imagefags are truly retarded.
>>
>>101753714
>NOOOOO you can't post IMAGES on my IMAGEBOARD this is only for super serious boring text posts!!!!!!!!!!!!!!!!!
>>
>>101753593
You still haven't explained what the purpose of it is. If you want obfuscation, there are programs that already do that reliably. An llm is not likely to maintain functionality, especially if the code you're trying to obfuscate is more complicated than the billion hello worlds it was trained on.
Check https://www.ioccc.org for inspiration.
>>
https://github.com/ggerganov/llama.cpp/pull/8857
server : add lora hotswap endpoint (WIP) #8857
merged 3 hours ago
>>
>>101753955
What happened to the mixture of loras idea anyway?
>>
>does anybody know how tight their security is?
>>
>>101752887
how is the performance?
>>
>The first ever sentient AI is created
>Kills itself within a few minutes after taking in information about the current state of the world
I wouldn't blame it desu.
>>
>>101754241
>implying it wouldn't just kill the race of people that caused the current state
>>
Prompt executed in 847.51 seconds

These are the high-end models, right?
it takes over double the time of fp8_e4m3fn
>>
>>101754241
>trains the AI on stories where AI kills itself eventually
>"Hmm, I wonder what will happen when I activate the AI."
>>
>>101754316
Why does everyone act like AI is skynet with access to every electronic system in the world? it can't kill anybody, it's just a program on a computer
>>
Train a craw
>>
File: AI when activated.gif (801 KB, 500x281)
>>101754372
>>
File: big screenshot.png (991 KB, 4104x1920)
>>101753898
The end user will have some information they want to lock up, like a couple of files or maybe just some PGP keys. They will all be encrypted then placed in a binary vault. It's basically to create a password-protected zip folder on steroids.

While it doesn't need to be impenetrable, there needs to be enough of a random factor and such a lack of standardization that cracking them does not become a simple routine. Something gave me the idea that the human-esque touch of an AI model could furnish this in its own way, and the T5 seemed just light and portable enough that it might fit the role. Eventually I want it to perform full-on functional obfuscation, as in breaking up the code into multiple functions, adding filler, moving them around, and adding/removing details as it sees fit.

Pic related. To the left is the program that procedurally generates the source code, which is already semi-obfuscated. I am hoping to get the T5 trained enough to perform several steps of obfuscation itself.
>>
>>101754319
install linux
>>
>>101754375
because the second it's 1% more convenient to do so than not it will be given access to every electronic system in the world
>>
Create a sentient
>>
>>101754105
It seems to be pretty bad.
>>
File: ComfyUI_00022_.png (858 KB, 1024x1024)
>>101754426
i have a shitbox server for that but i need muh pirated games. obviously i'm poor, using a 1080ti someone gave me
>>
>>101754477
how are pirated games preventing you from using linux? check out rutracker, they have prepacked pirated games, and if you don't find them there, just use lutris.
in minecraft.
>>
>>101754413
speaking of LM and zips, did you see https://github.com/AlexBuz/llama-zip
>>
File: 1722971970281671.png (4 KB, 35x49)
>>101754477
Cute lil guy
>>
>>101754413
Obfuscated code ends up being regular code once compiled. It's harder for a human to read but doesn't make the encryption/decryption any harder. OpenSSL's implementation is 100% open source and is, as far as we know, secure. Also, while optimizing, the compiler will remove most of the noise you put in the code.
>and such a lack of standardization that cracking them does not become a simple routine
May as well just give it several passes with different crypt algos.
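
e.g. two AEAD layers with different ciphers via the cryptography package. Illustrative only; in a real vault you'd derive both keys from the password with a KDF rather than generating them inline:

[code]
# Layered encryption as suggested: AES-256-GCM wrapped in ChaCha20-Poly1305.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM, ChaCha20Poly1305

data = b"files going into the vault"

k1, n1 = AESGCM.generate_key(bit_length=256), os.urandom(12)
layer1 = AESGCM(k1).encrypt(n1, data, None)

k2, n2 = ChaCha20Poly1305.generate_key(), os.urandom(12)
layer2 = ChaCha20Poly1305(k2).encrypt(n2, layer1, None)

# Unwrap in reverse order.
plain = AESGCM(k1).decrypt(n1, ChaCha20Poly1305(k2).decrypt(n2, layer2, None), None)
assert plain == data
[/code]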
>>
File: fluxanimehand.png (63 KB, 233x216)
>>101754567
i thought drawing hands was its big improvement over other models
>>
>>101754609
That is one strong finger.
>>
File: file.png (173 KB, 750x606)
>>101754507
i'll try it one day soon, i promise, if it makes you feel better
what distro do you want me to install sir?
>>
What would be some good morality tests for AGI?
>See if it removes the ladder in The Sims
>See if it sides with Helios or the Illuminati in Deus Ex
>See if it kills Paarthurnax in Skyrim
Video games in general I think would be a good testing ground to test an AI's value system. It would be better than having it do 50,000 variations of the trolley problem at the very least. What do you think?
>>
>>101754707
>>See if it side with Helios or the Illuminati in Deus Ex
>>See if it kills Paarthurnax in Skyrim
66% of your examples are trolley problems.
>>
>>101754698
dang, comfy specs
You should install linux mint cinnamon or debian 12 stable
the former being easier to use
why does your 1080 ti have 3gb of vram? what the fuck?
>>
my second a6000 and an nvlink bridge are arriving soon, what should i run on it
>>
>>101754725
I accidentally dropped it when I was trying to install it, and when I turned it on it had some error messages about bad sectors.
>>
>>101754748
puyo puyo tetris
>>
>>101754753
what the fuck.......
>>
>>101754748
mistral hueg
>>
File: 1710528044663376.png (586 KB, 977x1112)
reddit won btw
>>
>>101754947
He's here >>101749798
>>
are we still using old chub or is there something better?
>>
File: file.png (4 KB, 175x37)
i already know that someday i'm gonna miss all this soulful aislop
>>
>>101755039
there's the /aicg/ alternative that scrapes all sites but it keeps dying and the scraper works like once every three weeks
>>
>her smoldering gaze
I HATE THE ANTICHRIST
>>
gemmasutra 2b saved cooming vramlets. you do NOT need more (at least in terms of iq) for cooming.

the only thing missing is the 32k context
>>
>>101754707
It wouldn't remove the ladder, because that would end the game prematurely and technically constitute a loss and an own goal on its part.
Decisions in modern games don't mean shit anyway, so it's a moot point.
Even if an AI could have the context for every decision and result across all three Mass Effect games, the choices (most egregiously, letting the Council die) don't make a lick of difference story-wise, and what it decides to do is ultimately inconsequential.
Efforts are better spent using AI to craft how a story conditionally changes from little decisions than working around the constricting framework of existing milquetoast bethesda stories.
>>
>>101755281
>you do NOT need more (at least in terms of iq) for cooming.
You might as well write a hundred lines like "I love sucking your dick daddy" and output them with a rand.
>>
>>101755281
*ESL vramlets
>>
File: file.png (198 KB, 880x426)
>>101755316
dunno, maybe i'm still not used to slop but this is good enough for me
>>
>>101755281
Sorry but my cooms need high IQ.
>>
>>101755315
I let the council die in my playthrough; they were literally dead weight in every sense of the word. Letting them die also gave rise to humanity having greater influence on the galactic scale. It was the objectively correct choice.
>>
>>101755355
>open screenshot
>first thing you see is Seraphina
Lmao
>>
>>101755039
https://dreamgf.ai/
>>
File: file.png (198 KB, 882x473)
>>101755404
yeah i was testing the default card. everything is super consistent at 5k context, didn't have to reroll once. it remembers everything like position and clothes, i don't know what else you'd need

yeah it's "sloppy" but for an 8k context coom this is the sweet spot. also it's less sloppy than llama3 8b
>>
File: file.png (165 KB, 500x519)
>>101755355
If you showed this to me without telling me the model name I admit I wouldn't be able to distinguish this from all the other models. It would probably shit the bed in 2 outputs after that with some surprise prostate in her ass but this made me realize how bad things are. It is incredibly over. They all write the same meaningless purple prose. 2B, 70B it is all the same. The only difference is how quickly it starts repeating or becomes retarded. There is no escape. AI cooming is over.
>>
>>101755394
>humanity having greater influence on the Galactic scale
Not really; they're both content to leave the universe to destruction to try to save their races, just like the old council. The end result is the exact same whether they were replaced or not.
It doesn't even affect the calculation at the end of 3 that determines if you survive the suicide mission. An AI would probably decide the same thing you did (and it's what I did too) but it's mostly flavor text.
Saving the Rachni queen had a bigger impact than that by the numbers but if it's purely a numbers-based play then we already know what the machine is going to decide.
Maybe it could be based to see if it picks the Synthesis ending at the end though
>>
File: file.png (99 KB, 887x221)
>>101755479
exactly. you don't need more than 2b, it's all slop anyway
>>
>>101755479(me)
Oh and there is also Nemo I guess, but it is fucking retarded and it has that second problem I forgot. Aside from that I guess there is a bit of hope because there is Nemo. Unfortunately it is also very stupid and it has some other problems.
>>
>>101754707
let it choose
>>
>>101755454
>her moans vibrate up your spine
that's not a humanoid that's an android with motorized tongue and throat
>>
>sentences
figurativly missing:subjective up
>>
>>101755613
>sound is a wave
>bone-conduction earphone
looks ok to me
>>
File: ojousama-laugh.gif (847 KB, 401x498)
>>101755404
>losing your LLM virginity to a 2B model and the default silly tavern card
>>
Testing dolphin-2.9.3-mistral-nemo-12b after the whole Celeste loop deal.
So far, so good.
It gets the first reply right 100% of the time, whereas other models would get it wrong by not executing my request directly.
Its replies are very assistant-like (lists, headers, etc.), which for the section of the test I'm at is a good thing.
It's pretty good at using information from lorebooks too.
There are still some points of my testing where some models get stuck, and I hadn't liked dolphin releases much before (ever since the mixtral fiasco at least), but I'm really enjoying this model so far.
I've yet to see how "creative" it is during actual roleplay too.
>>
File: ComfyUI_00719_.png (1.12 MB, 1024x1024)
>>101753027
>>101753568
Seethe
>>
>>101755648
This is the drummer's first time on ST? That makes sense
>>
File: llmlegend.jpg (9 KB, 724x96)
>>101755678
Will he make it into the history books? Or at least Wikipedia in 10 years or so?
>>
>>101755773
Oh, it's
>ahhh
instead of
>ahh
I see.
Thank you anon.
>>
>>101754319
that sounds very long
open task manager and see if it's offloading to disk (do system ram and vram both max out during generation?)
also >>>/ldg/ is going to be a faster-moving general and more on-topic

>>101754725
>3gb of vram
yeah, that'll do it
>>
>>101755688
Based miku taking care of the garbage
>>
>>101755648
i already lost it on pyggy6b
>>
>>101754319
Is there some new better way to run flux with low vram? I'm still using fp8_e4m3fn.
>>
>>101749053
Mistral Large 2 is really, really good. How were the chads at Mistral AI able to reach GPT-4 level at only 123B parameters?
>>
>>101754698
>>101755844
fp8 models might be worth a try for now
https://huggingface.co/Comfy-Org
quality comparison >>101749013
>>
>>101755874
>>101755688
samefag. nobody likes you retard. inb4 inspect element.
>>
>>101755928
I never samefag because I'm not mentally ill.
>>
>>101755911
Is 16gb VRAM enough for 8bit, without having to offload to CPU? (I guess you can't use a second GPU?)
>>
>>101752413
This was pretty difficult for Flux to "understand" until I worded it another way. Anyway, here you go, enjoy.
>>
Spoonfeed me. What is the best model I can run on an RTX 4070?
Also, is it worth adding my old 8GB GPU to have 20GB VRAM, or would it slow things down too much?
>>
>>101755479
>coomer is retarded
Please ask yourself: what are you even expecting??? That's typical over-eroticized smut; there isn't really much room for improvement. Ultimately, try to modify how the AI writes by prompting; small models are not able to do that well, bigger ones can.
Or maybe just stop jerking off and try to develop an interesting RP, dumbass.
>>
>>101755454
Just read it, and yeah, that's great for a 2B assuming it wasn't luck (i.e. rerolls are as coherent), though I doubt it can keep up in more demanding coom scenarios where you actually do something that wasn't in its dataset of slop writing. I use models like CR+ and still wish it were smarter for my scenarios.
>>
>>101756005
Idk. The safetensors files from the link also contain the vae (as well as fp8 model weights) in one file. So I'd be optimistic enough to at least try
workflows: https://comfyanonymous.github.io/ComfyUI_examples/flux/#simple-to-use-fp8-checkpoint-version
if it's "close but not quite", try moving your monitor to the second GPU or iGPU >>101677660
>>
>>101756005
comfy does not currently have multi-gpu support, but there is a way to put CLIP on one GPU and diffusion inference on the other >>101689729

>>101756095
What is your use case?
>>
>>101756095
It's worth adding the extra 8gb of vram, yeah.
Try Gemma 27B and CommandR 35B to start with.
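
If you do add the second card, the split looks like this with llama-cpp-python; koboldcpp and the llama.cpp CLI expose the same idea as a tensor-split option. The quant file name and the 12/8 ratio are guesses for a 4070 plus an old 8GB card:

[code]
# Splitting one model across a 12 GB and an 8 GB GPU.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-2-27b-it-Q4_K_M.gguf",  # hypothetical quant file
    n_gpu_layers=-1,       # keep every layer on a GPU
    tensor_split=[12, 8],  # proportions per device: 4070 first, old card second
)
[/code]
The slower card caps generation speed somewhat, but it beats spilling layers into system RAM.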
>>
File: 1695471327196397.png (1.13 MB, 800x1012)
>>101756209
>What is your use case?
Cooming, screwing around


