/g/ - Technology


File: 1707201855037480.png (2.05 MB, 1792x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101636887 & >>101628398

►News
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101636887

--Paper: Meta-Rewarding language models paper sparks discussion on superintelligence and censorship: >>101638118 >>101638197 >>101640693
--Speculative token decoding discussion, benefits and challenges: >>101637618 >>101637764 >>101637766 >>101637699 >>101637785 >>101637855 >>101637909 >>101638033 >>101638191 >>101638004 >>101638070 >>101638131 >>101638151 >>101638181 >>101638262 >>101638304 >>101640593 >>101640704 >>101640769 >>101641008 >>101641053 >>101641163
--Quantized KV cache and RPC issues with CUDA: >>101637007 >>101637166 >>101637788 >>101638420 >>101638565
--Optimizing MoE models with GPU offloading: >>101637198 >>101637597 >>101637655
--Nemo 12B model performance with 16gb vram and context size limitations: >>101641028 >>101641077 >>101641188 >>101641235 >>101641318 >>101641411 >>101641433 >>101641249
--Local alternatives to character.ai and model benchmarking discussion: >>101640281 >>101640417 >>101640465 >>101640502 >>101640537 >>101640592 >>101640468
--LLMTree and ST allow branching conversations: >>101642024 >>101642041
--Anon rants about Largestral's repetitive writing style, others discuss workarounds and limitations of emulating authors' styles: >>101639479 >>101639536 >>101639624 >>101639682 >>101640309 >>101640761 >>101640789 >>101640839 >>101640913 >>101640806
--SDXL Lightning model for text2img2vid and potential use cases: >>101638766 >>101638823 >>101638875
--Hospitals using AI tools and data security concerns: >>101637309 >>101637432 >>101637622 >>101637627 >>101638445
--Chatbot arena update: >>101642285
--Looking for monitoring solution for OpenAI API: >>101637457 >>101637540 >>101637626 >>101641833
--Miku (free space): >>101637264 >>101637453 >>101637772 >>101637940 >>101638162 >>101638280 >>101640481 >>101641758 >>101642120 >>101642736 >>101642752 >>101642754 >>101642831 >>101642918 >>101643012 >>101643032 >>101643065

►Recent Highlight Posts from the Previous Thread: >>101636892
>>
>>101643089
>6 fingers on the left hand
its over
>>
Could somebody share some cards with complex scenarios, please?
>>
>>101643089
>not looking at the camera
>6 fingers
>LNG
>clover with 5 leaves
mikufags have low standards
>>
>>101643162
Yeah, >>101642831 was better idk why OP didn't pick it
>>
>>101643160
Pokemon Breeding Wall and Tokiko come to mind.
Also, Dark and Darker.
There was also that one fallout rpg.
>>
Oh, are we allowed to post dalle miku gens again
I haven't made one in a long time though, here's an oldie I never posted, from when we were doing this prompt.
>>
>>101643176
>Oh, are we allowed to post dalle miku gens again
Yes, but try to refrain from vivid style slop and do more creative gens
>>
>>101643176
you were never disallowed to do so
>>
File: 582.jpg (154 KB, 1600x1064)
What current machine AI handles a solid conversational flow for RP?

Every RP I've tried so far has the same sterile robotic feel that I dislike (they waterboard you with questions that, while individually kinda normal, come at a frequency that instantly reminds you you're talking to a bot)

Really pissing me off. I've been trying Gemma 27B and trying to fine-tune it, and it's super bad for this. Command R is a little better; Mistral Nemo is also pretty bad.

For reference, I have a 4090, so I'm not running any 3x 3090 setups for the actual nutty models
>>
>>101643261
genuinely sounds like a prompt issue if its the same across different models
>>
>>101643176
Pet the Miku
>>
>>101643183
does this count
>>
>>101643293
uhhhhh
>>
File: file.png (349 KB, 1170x1560)
>>101643094
>No (you)s
>>
>>101643269
Is there some sort of guide just to cover the basics? I always make the same basic ass card because there's no point trying any complex ones if even a basic one that's one on one RP gives me such weird results.
>>
Where do people set their DRY settings? Is it bad that I just set them in Silly T itself? Whenever I see settings posted, the UI looks way different from ST's, making me think there's something on kobold I should configure too
>>
>Can remove 20-50% of layers without noticeable drop in quality
https://arxiv.org/abs/2403.17887
I saw this a while ago and I'm surprised there hasn't been more on it. Did it get deboonked? Has anyone tried this with L405 or Misty Large yet?
>>
I'm doing a little bit of experimentation where I run base nemo on koboldcpp and opus on ST, and paste opus' reply to kobold, then have nemo respond back on my behalf.

I have to say that nemo complements opus well, and it's somehow slightly better than opus at ERP.

Too bad it's not that good on its own though.
>>
>>101643525
Go play with llama3 42B or whatever the number was. Let us know.
>>
>>101643389
>i think your first focus should be making an artificial hippocampus.
Stuff like that doesn't help or work, what works is using more compute
>>
>>101643400
What you're talking about is probably Kobold Lite. It's just a frontend, just like SillyTavern
>>
Did anyone try vLLM's CPU off-loading? I tried it for a bit with Mistral Large AWQ on 2x3090. The prompt processing was somewhat decent at, I think, 200-400 T/s, but generation speed was like 0.5 T/s...
>>
>>101643525
I think it's likelier that there is in fact a drop in quality, but current measurement methods aren't very good at capturing it. Then again I didn't read the paper, so I wouldn't know how much they've done to prove their claims.
>>
Are the good parts of nemo there because it is overfitted to hell? And is that kinda how frankenmerges worked too? The model gets input and, instead of actually processing it and trying to generalize, it just pulls out the closest training example, therefore it is schizo but also sounds pretty good and more soulful.
>>
File: 123 (2).png (492 KB, 960x779)
>>101643585
here's a comparison btw. Top is nemo and bottom is opus.

when it comes to ERP, I like nemo's raw style better, if it weren't so retarded i'd prefer it to opus
>>
Are we able to load Mistral Large with reasonable context on 48GB VRAM? I tried a 3.0bpw but it wouldn't even load at the usual 32764 context length. Had to use 8168 context.
>>
For those without a 'standard' setup, how much did you spend on it?
>>
>>101643770
>boobs
>boobs
jesus christ, mistral needs to scrape https://greensdictofslang.com asap

otherwise looks bretty good
>>
What's the most natural sounding chatbot model currently actually runnable?

Because Nemo fucking sucks
>>
>>101643929

https://huggingface.co/ykilcher/gpt-4chan
>>
>>101643929
sorry to hear about your shit taste anon
>>
>>101643963
>>
File: 1697582032922759.png (50 KB, 1190x261)
>>101643946
sir
>>
>>101644008
you can find the torrent
>>
Does anyone have the torrent to llama 3.1 8b?
>>
Post the best response you ever got from a model
>>
>>101644074
no
>>
>>101644008
a tool that does nothing but generate speech, and the society of "free speech" decided to ban it. what a joke
>>
>>101644101
HF had to specifically create this disclaimer for GPT-4chan btw, they literally didn't have it beforehand. I think it's still the only model with that disclaimer on HF.
>>
>>101643963
Stheno >>>>>>>> Nemo for RP coomery
>>
>>101644117
Wrong. https://huggingface.co/nothingiisreal/Celeste-12B-V1.6
>>
>>101644117
stheno is retarded dude
man I'm sick of retards who judge a model entirely on 'sovl' and don't care whether it's fucking moronic and can't keep a scene straight in its head
>>
>>101644132
Not that anon, but I had great success with stheno on some seemingly complicated RPG scenarios.
I did get the whole fucked anatomy from time to time however.
Shit like sucking your dick while you fuck her, that kind of thing.
The thing about Stheno to me is that it's very one note, and horny.
Its style is very baked in, and it deviates very little from it even with clever prompting.
>>
File: ewewewqeweq.jpg (31 KB, 426x341)
If I wanna replicate the conversational flow that AIs have on websites like Character AI, what would be the best model?

So far i've tried Nemo (decent but tends to go pretty schizo quickly no matter the temps), Stheno is super good but I feel like there's probably something better. I have a 3090 + 32GB RAM, so obviously shit like Command R+ or Mistral Large i've not even tried
>>
>>101644093
Okay babe
>>
>>101644132
>retarded
Nigger, Nemo can barely coherently handle a 1-on-1 date scene without constantly trying to seize the initiative over the entire scene, asking me some schizo shit like "how's your dad" or some other random garbage.

What fucking coom scenes do people who shill this new Nemo shite even partake in? Because it's utter ass
>>
Stheno is amazing. I will stick with Llama 3.0.
>>
>>101644174
I agree, it doesn't touch Sao's finetunes at all. That dude alone is the only person advancing the whole ecosystem.
>>
>>101644174
nta but that seems like a normal question to ask someone you're having a conversation with
>>
>>101644174
>some other random garbage.
>not appreciating engaging sovl
poor taste
>>
>>101644195
>>101644199
>first date
>date doesn't even know if i'm fucking batman or not, with no parents
>"How's your dad"
Kys, Nemo will always be dogshit that's literally only shilled because poorfags with 16GB dogshit GPUs can run it decently well. That's it
>>
>>101644153
There's nothing better than Stheno. Come back next year.
>>
>>101644174
there's nothing schizo about someone asking that question during a conversation you autistic retard
>>
101644224
obvious bait, (You) denied
>>
>>101644220
>anon has never been on a date
>>
>>101644225
That's cool, but Stheno doesn't ask that question.
>>
>>101644220
>only shilled because poorfags with 16GB dogshit GPUs can run it decently well
Anon. Stheno is smaller and easier to run than Nemo. What you just said applies to Stheno more than it does Nemo. Are you drunk?
>>
>>101644220
My dude, I've had that asked of me in first dates as many times as not. People usually don't assume my village has been raided by orks.
>>
>>101644245
yea but Nemo is the new flavor of the month
>>
>>101644245
I run Stheno at FP32. It's something that you do when you aren't poor.
>>
>>101644225
Yeah, this is making me realize that a lot of the poor judgement about which models are good in this thread comes down to autistic men literally not understanding how normal human beings talk
so the model says something completely normal, and they think it's "schizo"
>>
>>101644258
>how's your dad
You've never been on a date if you're unironically telling me, without any prior mention of your family, that a girl just asks you "how's your dad".

Not unless you know the girl. Like just think of the logic behind it, what would prompt a girl to ask you how your dad is when she doesn't even know if you're close with your dad or anything about your family situation?

Also don't pretend you've ever even held more than 3 seconds of eye contact with a female in your life.
>>
>>101644269
I run Nemo at FP32. It's something that you do when you aren't poor.
>>
>>101644278
It's deeper than that.
The impression I get from some anons is that anything that deviates from their baseline expectation is schizo or bad. And I'm not talking about quality expectation but behavior, meaning that if a model can't read their mind, it's bad.
>>
>>101644269
It was trained at BF16, so you are a retard wasting compute and electricity for zero possible benefit.
>>
>>101644278
the only female hand you ever held was when your trans dad led you into the closet.
>>
File: 235235432.jpg (21 KB, 438x438)
>>101644323
>>
What version of Nemo are you guys even running?

I haven't checked any of the fine tuned ones and just tried out the instruct GGUF version, seemed ok, but with the way some people talk about it you would think it was GPT4
>>
>>101644342
>No you are, but what am I?
unironically kys.
>>
>>101644357
your hands are shaking rn
>>
>>101644342
>"you are a woman, no matter what this mirror says"
Cute pic anon, happy transitionday
>>
>>101644357
You did the same thing when you tried to turn the accusation of autism back around on him
Incredibly low quality thread hours atm, I blame Americans
>>
>>101644354
I just use the q8 gguf of nemo-instruct, its main advantage is that it's fast, since I don't have much vram. I summarize and hide messages to keep the context below 20-25k for longer term. It's not too stupid but sometimes I have to edit or regen stuff or add notes if it's unable to do something right. This takes less time than running a better model at 0.5T/s or something though.
>>
File: aca.jpg (58 KB, 511x562)
>>101643094
>--Local alternatives to character.ai and model benchmarking discussion:

Wait, is there finally a local alternative to character.ai that is actually usable on rigs that aren't triple GPU setups?
>>
if you wanna try mistral large 2 for free: >>101644574
>>
What are these proxies? Are they truly just free? Who's paying for them?
>>
>>101644663
>Who's paying for them?
People in /aicg/ steal keys from GitHub and other places, and make proxies
>Are they truly just free?
Some yes, some not. And of course they could always be logging your prompts and outputs
>>
>>101644641
Fuck I opened that shit with people next to me
>>
>>101644685
Uohhhhhhhhhhh. Did they sob?
>>
>>101644685
>clicking shit from anonymous in public
le mao
>>
Interesting.
Testing Celeste 12B now.
I ask it the same chain of questions I ask the other models with this card, and it produces a response that's much like the other nemo finetunes, but it fucks up the formatting of a single specific piece of information in the exact same way.
Either this was fine tuned on top of -instruct or that is a characteristic of the base model, which I don't think it is since the other fine tunes (such as mini-magnum) didn't produce this error. Although, it uses ChatML, so it probably was fine tuned on top of base, right?
Anyhow, no thoughts about it so far other than that it was annoying as hell to get the card's prefill working, but it did produce a decent first response with the working version in place.
>>
>>101644641
If you don't want to see japanese kiddie NSFW cartoons in your face, https://seekers-str-garlic-prediction.trycloudflare.com/ (there's still a dubious video, just ignore it)
>>
Is ST still the best frontend? Are you guys using something else?
>>
Is TensorRT-LLM another backend like llama.cpp and vLLM?
>>
>>101644793
>If you don't want to see japanese kiddie NSFW cartoons in your face,
Why wouldn't I want to see that?
>>
File: pure kino.jpg (178 KB, 968x1150)
>>101644637
If Character.AI somehow removed its filter, it would unironically clear every single dogshit model that people spend $6000 on PCs to run.

>AI actually understands sarcasm
>holds a conversation realistically

Utter brutal mogging. Come back in 5 years and maybe we'll be close.
>>
>>101644824
>>AI actually understands sarcasm
Have you tried anything besides local models? That's not something new
>>
>>101644641
how the fuck do you even use it?
>>
>>101644860
see >>101644854
>>
>>101644815
lol
>>
>>101644865
>risk to data privacy

Ooft.. How big is the risk. I'm not gonna get keylogged am I
>>
are there any existing prompt sets for jp > en translation
>>
>>101644894
The risk is that the proxy owner might log all of your prompts + their completions along with your IP, so don't share any private info in your prompts, and better use a VPN.
>>
>>101644904
ah sweet
>>
Is nemo smarter than mixtral 8x7b?
>>
>>101644934
absolutely not lol
>>
>>101644953
Then why is it so popular now? I thought that meant it had surpassed it. And now people say it's only good with 32k context, so may as well keep using mixtral?
>>
>>101644969
It's much smarter, that anon is a lying hater faggot.
>>
>>101644969
It's popular because it's good at the current FoTM fetishes for how resource-friendly it is.
>>
>>101644934
No.
Its saving grace is being a lot smaller and having a long-ass context window, even if it degrades the more you fill it.
It's competition to llama 3 8b, not mixtral 8x7b.
>>
If you're a high roller Mistral Large is probably your best option.
>>
the hospital discussion from last thread is interesting. you can definitely get decent voice transcripts locally even on really cheap phones. i use the futo voice input thing and it works great as long as i speak english.
>>
File: 1694513432075190.png (3 KB, 286x65)
>>101645019
If you don't have a local rig that can run it, 3.5 Sonnet is much, much better than Largestral 2 for almost the same price point (same $3 for input; $15 for output vs Mistral's $9)
>>
>>101645000
>It's much smarter, that anon is a lying hater faggot.
>>
>>101644847
Local models and Silly Tavern are what every faggot shills tho
>>
LlamaCPP and KoboldCPP both do prompt processing only on CPU for Mistral Large. It's so painfully slow, loading 1k tokens takes more than 10 minutes. I want to rope myself. Fix when?
>>
>>101645033
If you're an API paypig you're probably looking for /aicg/
>>
>>101645050
That's your fault for not having enough VRAM to fit the kv cache.
>>
>>101645050
Have you tried… gitting gud? I've been running it on GPU for like 5 days now.
>>
>>101643128
Haha... before I saw that I was thinking that was some sort of Koikatsu posed model, given the flat low-poly look.
>>
>>101645108
Check >>101643167, it's a better version. Would you think it was AI-generated without knowing the resolution?
>>
>>101645087
>>101645096
I have 8GB VRAM, I am using Vulkan and offloading 0 layers. The model size is 38.7GB, I have run bigger models than this. What am I doing wrong?
>>
>>101645013
Interesting, is the best option to run just mixtral-instruct directly instead of one of the other versions people talk about?
>>
>>101645133
Compared to Nemo, if you have the hardware to get the speeds you want, yeah.
Otherwise, if you can run miqu (mistral medium) or mistral large, then those should be better. I can't attest to this first hand since I don't have the hardware, but as far as I know, that's how it goes.
>>
Altman is laughing at us...
>>
>>101645114
>The model size is 38.7GB
Sweaty I...
When you quant a model that much it becomes retarded. If you can't run it at at least 4.25bpw you can't run it. That's the reality. You can make it coherent with things like imatrix etc, but it's just not the same experience at that point.
>>
>>101644847
Stheno understands sarcasm perfectly fine.
>>
>>101645153
What's the smallest quant of miqu that would actually be worthwhile? It's always been pretty slow for me.
>>
>>101645180
he won though, lmsys just proves that most humans are retarded and don't actually read LLM outputs:
https://huggingface.co/spaces/lmsys/gpt-4o-mini_battles
I think they nitpicked specifically bad 3.5 sonnet examples to show it in a bad light, but it still shows that gpt-4o mini just does extremely verbose and long outputs with excessive markdown formatting, and people prefer that over 3.5 sonnet's default of plaintext and only answering the actual question. Also, 3.5 Sonnet does more refusals.
>>
>>101645180
Why is he laughing? Didn't he just post a 5 billion dollar loss?
>>
>>101645191
I am using IQ2_M it's still better than everything else out there.
>>
>>101645215
>Didn't he just post a 5 billion dollar loss?
That's fake news by retarded news outlets
>>
File: 1711880795163244.png (235 KB, 2009x1141)
>>101645214
>>
File: KL-divergence_quants.png (111 KB, 1771x944)
>>101645207
I personally wouldn't go any lower than 4.5bpw~ish/Q4_K_S, but people say that larger models are more resilient to quanting.
I think Q3_some_something is 4bpw~ish?
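If you just want to ballpark what fits: weight file size is roughly params × bpw / 8. A minimal sketch (my rule of thumb, not from the chart; it ignores embedding/metadata overhead, so real files run a bit bigger):
[code]
# Rough quantized-weight file size; treat the result as a lower bound.
def quant_size_gb(params_b: float, bpw: float) -> float:
    return params_b * bpw / 8  # GB ~= billions of bytes

print(quant_size_gb(70, 4.5))   # ~39.4 GB: a 70B at the ~Q4_K_S floor above
print(quant_size_gb(123, 3.0))  # ~46.1 GB: why 3.0bpw Largestral is tight on 48GB
[/code]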
>>
>>101645180
C3.5 Sonnet is way better, that mememark is shit
>>
File: 1708802259809863.png (70 KB, 1547x659)
>>101645214
look at this shit
>>
File: 1705015046019661.png (245 KB, 1901x1049)
gpt4o and gpt4o mini basically write mini-essays for every fucking answer, and le normie AI ENJOYERS enjoy this too much
>>
>>101645222
You wish, sammy boy. Microsluts took everything with the slightest profit potential from OAI and then gave it away for free to non-commercial users as a loss leader and will continue to do so. As far as commercial users go, anyone handling sensitive information can now just run 405B on a single H100 cluster instead of trusting a bunch of pajeet dataminers with it. There's no real way forward for OAI at this point.
>>
https://x.com/ManuVision/status/1818412120373182928
https://x.com/Evinst3in/status/1818423736942342389

Why did zuck not drop the voice mode already?
Images with Chameleon were interesting too and it's been taken out.
I hope we get voice stuff soon. Even for TTS there is almost nothing locally. xtts2 isn't that good.
>>
>>101645266
>There's no real way forward for OAI at this point.
gpt4o mini is $0.15/$0.6
>>
>>101645271
A data breach when a bunch of jeet diversity hires click on a phishing email costs millions, vs. a few hundred thou for an H100 cluster.
>>
>>101645114
llama_kv_cache_init: AMD Radeon RX 6600 XT KV buffer size = 16.00 MiB
llama_kv_cache_init: Vulkan_Host KV buffer size = 1392.00 MiB
llama_new_context_with_model: KV self size = 1408.00 MiB, K (f16): 704.00 MiB, V (f16): 704.00 MiB
llama_new_context_with_model: Vulkan_Host output buffer size = 0.13 MiB
llama_new_context_with_model: AMD Radeon RX 6600 XT compute buffer size = 1700.00 MiB
llama_new_context_with_model: Vulkan_Host compute buffer size = 1696.01 MiB
llama_new_context_with_model: graph nodes = 2822
llama_new_context_with_model: graph splits = 799
Load Text Model OK: True

Everything seems fine yet it's still using the CPU for prompt processing.
>>
>>101645296
I don't know about vulkan but yeah that does seem kind of weird.
>>
What are the potential implications of this PR getting merged?
https://github.com/vllm-project/vllm/pull/5191
>>
>>101645325
The implication is that vLLM will support loading GGUF models
>>
Why does Saltman think sending his jeet shills here is somehow going to get his 5 billion dollar loss back?
>>
>>101645348
Anon, I hate to be that guy, but let's be real... you're coping because OpenAI is successful.
>>
>>101645331
Will it make using GGUFs about 800% faster compared to using llama.cpp?
>>
>>101645358
So successful that they basically had to give GPT4|o away for free...
>>
>>101645267
i've been having fun with piper t b h. it's surprisingly fast.
>>
>>101645296
Have you tried the rocM fork of koboldcpp?
That probably runs better than vulkan.
>>
File: ComfyUI_00073.jpg (1 MB, 2048x2048)
>>101644824
>pic related
>"good"
Local has been better than whatever garbage that is for quite a while
>$6000
Poorfag detected
>>
>>101645414
>Poorfag detected
Give me $6k and I'll stop using corpo models immediately.
>>
>>101645348
He does not need shills. He simply makes unfiltered GPT4 bots argue with each other here to drown out real discussion. He's been doing this for more than a year.
>>
>>101645403
I don't want to boot into loonix every time I wanna RP.
>>
File: 1693951440467878.png (21 KB, 880x213)
People: le humans are so heckin smart compared to models
People in reality: picrel
>>
>>101645358
Shouldn't you be busy deepthroating VC dick instead of posting here, Sam? That $5B hole ain't gonna fill itself without a lot of spit shine.
>>
>>101645420
https://github.com/YellowRoseCx/koboldcpp-rocm/releases
There are pre-compiled Windows binaries.
This is not an endorsement, by the way. I use llama-server with an Nvidia GPU, but you might as well try it.
>>
>>101645431
You're really salty, anon. Why is that? You are only running local models right now thanks to OpenAI, who kickstarted this industry.
>>
>>101645418
>Gibs money plz
You will forever be a poorfag with that attitude
>>101645438
Yes, saars Saltman kicked industry very good. Vishnu bless.
>>
>>101645428
The issue with Arena always will be that there's no real-life use case for it. People will sit down, ask it the Sally question and some basic programming problems, and give the win to whatever sucks their dick enough while doing so. It's yet another worthless benchmark amongst many.
>>
File: pepe-happy.gif (55 KB, 498x498)
>>101645436
HOLY MOLY, IT WORKS!! It's so fast!
Thanks fren!
>>
>>101645527
Have fun.
>>
>>101645495
>people ask for something
>vote for model that gives them what they want / like
>useless because... just because, ok!
>>
>>101644125
>>101644777
Okay, yeah, this is not bad so far.
If this can do the mechanics part of the RPG as well as base nemo can I might keep this one as my main model.
>>
>>101645573
Thanks for your input, GPT4o-mini
>>
>>101645573
>Sam himself acknowledges that L3.1 8B is ahead of OG GPT-4
Based
>>
>>101645610
AHAHAHAHAH
>>
>>101645420
i legitimately can't imagine how bad local stuff must be to run on windows
>>
>>101645419
Nah. I mean he probably does that too. But being a billionaire he's probably anhedonic from all the endless rape orgies and designer drugs. He needs the tactile sensation of actually being there, getting under somebody's skin. He's out there.
>>
>>101645573
Deciding something like this by vote is retarded. There is no wisdom of the crowd.
If collective decision making by voting was viable, there would be no need for markets.
>>
Any anon ran Largestral with dual 4090s? What quant? And is it better than Mixtral 8x7B instruct dual 4090?
>>
>>101645527
On windows + amd?
>>
>>101645664
why on earth would you need dual 4090 for 8x7b lol
>>
>>101645697
Yes!
>>
>>101645650
if we decide everything by consensus I would still need groceries
it's a question of perceived quality (by average users) vs actual quality. why'd you bring up markets?
>>
>>101645733
Oh shiet that's nice, enjoy
>>
>>101645035
I tried it and Mixtral doesn't seem much smarter if at all and it's less willing to do stuff.
>>
>>101645267
Sounds like shit
>>
>>101645821
Yeah, I dont want to talk to a black woman either.
Thats why we need local for the anime finetune.
Video/audio in and audio out seems really cool though. Hope by the end of the year we have that shit too. Dont care if the quality will be worse or whatever.
>>
Oh shit that's right. Udio apparently released a new model I should give it a whirl. I know it's not local, but let's be honest, local is only good for imagegen and text
>>
>>101645878
the 1.5 version one? yeah it's something "new" 2 weeks ago, I indeed noticed an improvement of quality, desu at this point if you give someone a song from udio he won't notice it's AI, that's crazy
>>
>>101645878
Why is /vsg/ in cryostasis? Music industry having too powerful jewish lawyers?
>>
How can I format and place a summary of the events of a previous chat so the model doesn't try to replicate a concise list of events in its replies?
>>
>>101645592
Lots of "testament to this, testament to that" however, even if the general writing is more natural than nemo-instruct.
Interesting.

>>101645914
What model?
I think a system message (via author's notes or w/e) with a header saying that that's a summary of past events is the usual way to do it.
You might need to couple it with some instructions in the First Assistant Output "telling the model" that the actual chat starts at that point.
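Something along these lines has worked for me (the wording and layout are just an illustration, not a canonical ST preset):
[code]
[System note: Summary of earlier events, for reference only.
Do not imitate this list format; continue the roleplay in normal prose.]
- {{summary point 1}}
- {{summary point 2}}
[The chat resumes below.]
[/code]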
>>
>>101645851
>Thats why we need local
Have we even gotten speech to speech (another voice) yet? (p.s. not a tranny)
>>
>>101645893
Two factors:
>a certain someone kept trying to kill it
>it was literally toted on the back of the 'ick on 'eck faggot trying to salvage Tortoise TTS
>>
>>101645934
RVC works quite well for that
>>
File: file.png (16 KB, 424x115)
>>101645944
>https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI
Why the fuck do they have a live time counter lol
>>
>>101645267
ClosedAI is trying to make it really safe, whether for legal and regulatory reasons or because they genuinely believe in that crap (probably both). Same story with basically every other corp training models. No one is going to risk the potential fallout if things go badly, some journo makes you into a controversy, and your stock plummets. The only way we get to the point where corpos release the weights for these kinds of models is for them to already exist elsewhere and to have been normalized. Like how the Llama 1 leak helped FB see that there would be value in releasing it (making it into a product) and that it would not result in huge backlash, since it was basically already out and nothing bad happened to them.
>>
Also fun fact as far as voice goes... I know people want real-time tts and this won't cut it. But Suno 3.5 is half decent at voice duplication if you have a 30-60 second sample of somebody speaking clearly. Then you just continue it with "speech" in the genre field
>>
>>101645325
I observed weird nondeterministic behavior when I used vllm, so I will not be using it until that's fixed. At least with Llama.cpp, I can disable caching to fix that (even though it makes chats slower).
>>
>>101645610
lol wtf
>>
I purposefully avoided this AI RP bot shit since it first came along around 2 years ago, but for some reason I decided to give local models a try a couple days ago and now I'm completely addicted. This shit is like crack. Fuck you anons.
>>
>>101646050
Many such cases.
Welcome to the club.
[fakespoiler]You'll get bored eventually and then it'll just become an occasional thing.[/fakespoiler]
>>
>>101646050

>Have enough social skills/looksmaxxing to ask cute girls for their phone number.
>Text them
>Realize talking to literal NPCs are more interesting.

Quit before it's too late. It's over for me.
>>
>>101646050
You'll have your mind blown away a second time when you try claude and realize local is not worth it
>>
>>101646050
Try 3.5 sonnet from >>101644651 (cunny pic), you'll become addicted to claude
>>
>>101645932
That's probably my problem, I was just leaving a summary in the chat part.
>>
>>101646120
Nta but paying per prompt seems like an ez way to funnel money into the shitter
>>
File: file.png (131 KB, 455x457)
>>101646163
Paying?
>>
>>101643089
Any good models for Japanese to English translations?
>>
File: 1611784850937.jpg (25 KB, 570x367)
>>101646080
>Women sense my power, but I deny them my essence
>>
File: udio fucking won.png (2 KB, 311x167)
>>101645878
holy shit.
Haven't even genned anything yet but Udio fucking won. Suno v4 better have this shit.
>>
>>101646185
Well, using proxies intuitively seems like a bad idea. Don't know any alternatives at that point.
>>
>>101646289
Honestly, I wouldn't mind paying for it (have the money + less rate limits + don't have to change proxies when they get taken down) but I don't want my name and cellphone number associated with ERP lol
>>
>>101646120
>>101646131
Is claude really worth it? Llama 3 Stheno is already enough to get me rock hard and shooting massive loads. Barely ever messes anything up either.
>>
>>101646345
>Is claude really worth it?
Only you can tell us that, if you are happy with your set up then rock on, but you don't lose anything by trying it out with a free proxy
My biggest issue is this >>101646335
>>
>>101646335
so true
>>
>toy model works when doing eval pass in training
>toy model doesn't work in inferencing
AAAAAAAAAAAAAAAA
>>
>>101646345
Buy an ad, sao
>>
File: Untitled.png (159 KB, 720x405)
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
https://arxiv.org/abs/2407.20311
>Recent advances in language models have demonstrated their capability to solve mathematical reasoning problems, achieving near-perfect accuracy on grade-school level math benchmarks like GSM8K. In this paper, we formally study how language models solve these problems. We design a series of controlled experiments to address several fundamental questions: (1) Can language models truly develop reasoning skills, or do they simply memorize templates? (2) What is the model's hidden (mental) reasoning process? (3) Do models solve math questions using skills similar to or different from humans? (4) Do models trained on GSM8K-like datasets develop reasoning skills beyond those necessary for solving GSM8K problems? (5) What mental process causes models to make reasoning mistakes? (6) How large or deep must a model be to effectively solve GSM8K-level math questions? Our study uncovers many hidden mechanisms by which language models solve mathematical questions, providing insights that extend beyond current understandings of LLMs.
part 2.1 is up. 2.2 soon I guess. whole series is a good read
https://arxiv.org/abs/2305.13673
https://arxiv.org/abs/2309.14316
https://arxiv.org/abs/2309.14402
https://arxiv.org/abs/2404.05405
>>
>>101646652
Any video presentation? I don't like reading papers.
>>
>>101646664
yeah but ICML copyright claimed it for like 30 days? or something.
https://youtu.be/yBL7J0kgldU
that's the link he posted with working subs. great talk goes over everything even the unpublished 2.2. should be going back up sometime late august. I'll post about it when it does if I remember
>>
File: 2.png (398 KB, 720x1141)
MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
https://arxiv.org/abs/2407.20999
>Recently, large language models (LLMs) have demonstrated remarkable capabilities in a wide range of tasks. Typically, an LLM is pre-trained on large corpora and subsequently fine-tuned on task-specific datasets. However, during finetuning, LLMs may forget the knowledge acquired in the pretraining stage, leading to a decline in general capabilities. To address this issue, we propose a new fine-tuning algorithm termed Momentum-Filtered Optimizer (MoFO). The key idea of MoFO is to iteratively select and update the model parameters with the largest momentum magnitudes. Compared to full-parameter training, MoFO achieves similar fine-tuning performance while keeping parameters closer to the pre-trained model, thereby mitigating knowledge forgetting. Unlike most existing methods for forgetting mitigation, MoFO combines the following two advantages. First, MoFO does not require access to pre-training data. This makes MoFO particularly suitable for fine-tuning scenarios where pre-training data is unavailable, such as fine-tuning checkpoint-only open-source LLMs. Second, MoFO does not alter the original loss function. This could avoid impairing the model performance on the fine-tuning tasks. We validate MoFO through rigorous convergence analysis and extensive experiments, demonstrating its superiority over existing methods in mitigating forgetting and enhancing fine-tuning performance.
neat. converges closer to pretrained weights so less forgetting
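a toy sketch of the core idea as the abstract describes it (not the authors' code; the keep fraction and the per-tensor top-k selection are my assumptions):
[code]
import torch

def mofo_step(param, grad, momentum, lr=1e-4, beta=0.9, keep=0.1):
    # Accumulate momentum as usual, but only apply updates where the
    # momentum magnitude is in the top `keep` fraction; everything else
    # stays put, keeping the weights close to the pre-trained values.
    momentum.mul_(beta).add_(grad, alpha=1.0 - beta)
    k = max(1, int(keep * momentum.numel()))
    # the k-th largest |momentum| entry serves as the update threshold
    thresh = momentum.abs().flatten().kthvalue(momentum.numel() - k + 1).values
    mask = (momentum.abs() >= thresh).to(param.dtype)
    param.data.add_(momentum * mask, alpha=-lr)
[/code]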
>>
I'm enjoying Llama 70b 3.1
>>
>>101646785
I need to try 8b 3.1 again now that there were some fixes.
After I'm done trying nemo finetunes.
>>
I have 8 gigs of vRAM. How do I calculate how much context I can use with mistral before it explodes?
>>
>>101646785
Buy an ad, zuck
>>
>>101646804
I just test it out binary search style after doing the rough adjustments.
Look at vram usage using GPU-Z to make things easier.
Also, remember that FA, quantized cache, and a lower blas batch size all free up VRAM for context and layers.
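If you'd rather estimate than bisect: the f16 KV cache grows linearly with context. A rough sketch, assuming Mistral 7B's config values (pull yours from config.json; weights and compute buffers come on top of this):
[code]
n_layers   = 32   # num_hidden_layers (Mistral 7B)
n_kv_heads = 8    # num_key_value_heads (GQA)
head_dim   = 128  # hidden_size / num_attention_heads
bytes_el   = 2    # f16 cache; roughly halve for q8_0

per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_el  # K and V
for ctx in (4096, 8192, 16384, 32768):
    print(f"{ctx:>6} ctx -> {per_token * ctx / 2**30:.2f} GiB")
# 8192 ctx -> 1.00 GiB
[/code]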
>>
gonna pick up a mac studio for local llm work, is there any reason to spring for the 192gb model over the 128gb? it seems like most models would fit fine in the latter.
>>
Do you like listening to anything while you're proompting, anonymous?
>>
>>101646785
Then why are people still using miqu?
>>
>bought chub mercury after industries sonnet died
>mercury is supposed to be lewd, intelligent and uncensored
>can never get it to do anything actually sexy
>never progresses sex or the story
>constantly repeats itself
How do I make it good
>>
>>101647053
>he bought
>>
File: 1722399530383403.png (1.16 MB, 680x1069)
>he buyed anything from lore
KEKYPOWWWW
>>
>>101647053
AHAHAHAHHAHHA
>>
>>101647059
I was super desperate ;-;
I got REALLY into this RP world and suddenly Claude died
I was in shock
>>
>>101647069
nigger in WHAT WORLD would you think that chub's models (for $5/month nonetheless) would be even REMOTELY close to claude??????????????????????
>>
>>101647075
A desperate man with nothing to lose can do some stupid things
>>
>>101647084
well you lost $5, congrats, that'll be a cheap lesson. if you must know, claude 3.5 sonnet costs $3/$15 per 1M input/output tokens, so at ~20k context RP you'd be spending $0.075 (yes, 7.5 cents!) PER MESSAGE. $5/month doesn't compare in the slightest. Claude is the best RP model, but it's not cheap as it's a medium-tier flagship model (currently the best in the world actually, just waiting for 3.5 Opus)
>>
>>101647084
I'd rather just wait for mistral large at 0.6T/s.
>>
>>101647092
well fuck
>want to keep claude
>poorfag
Hal, please come up with a solution to this quandary
>>
>>101647053
>be a retarded /aicg/ locust
>proxy dies
>throw money at an absolute shit cloud model
>cry about it in the local models general
???
>>
>>101647105
>Hal, please come up with a solution to this quandary
Find a high paying job
>>
So is there a local model on par with Claude?
>>
>>101647119
yes, llama 3.1 405b
>>
id pay for opus if they would let me
>>
>>101646981
Nothing but coil whine/buzzing. The sound of magic.
>>
>>101647133
openrouter, aws, gcp - they all let you pay
>>
My PSU exploded...
>>
>>101645878
It's good. But I still like the feel of suno better. They both prompt way differently and I'm more used to how suno prompts I guess.
I made you a song /lmg/ <3
https://suno.com/song/145a1f3c-965b-495d-8cbc-7749b3cf1c6f
Single prompt song.
>>
>>101647215
photo or fake
>>
>>101647215
my parents' PC PSU burned internally and had a terrible chemical smell; it was a random chink PSU for shitboxes, then I replaced it with a CX550
>>
>>101647215
Back in the day an exploding PSU would kill your PC.
>>
>>101647242
I remember once I had a PSU go and it took the GPU along with it. big oof.
>>
>>101647226
Lovely song, Anon
>>
File: 1709648543085.png (284 KB, 512x512)
>>101647226
based cat man returns with a banger
>>
>>101646981
Mozart, Chopin or Ludwig Van.
>>
>>101647226
Masterpiece.
>>
I released sunfall on llama 3.1 8b. Please try it.
Also to other model makers: why aren’t you using LimaRP-DS yet? Should I convert it to jsonl?
>>
>>101647576
slop
>>
>>101647580
What slop? It’s literally been deslopped. It’s even in the name.
>>
>>101646690
Oh, well alright. Thanks, I'll watch it then.
>>
>>101647576
I'm just not doing much training right now because there's a heat wave here at the moment and my rig is in my bedroom.
>>
>>101647751
That's fair. Sauna world here too.
>>
>>101647041
I'm still using Miqu almost a year later because all the recent shit is annoyingly aligned. Gemma2 just wants to talk in bullet points and overtalk you. Llama3.1 just outputs word salad after 8k at the q2 quant I have no trouble running miqu at. Nemo is too small. Large is too fucking large. Although too small and too large have basically been the whole fucking deal in 2024. I had high hopes for gemma2 since it's 27b, but given all the bullshit I'll just stick with miqu until Mistral finally releases a 32b-range model.
Really I just want something that'll fit in 16gb at at least 4km, but preferably 5km, so I have enough room for 32k context. Running these 70bs at q2s, just barely squeezing 8k, has sucked long enough.
Without fail, every model I've tried, and I've tried hundreds now, just fails to meet decent expectations of either being an assistant or a roleplayer. Only Claude has impressed me.
>>
File: FHeqYmXWYAAdheh.jpg (193 KB, 1080x1080)
>>101647844
Couldn't have said it better myself, amen.
>>
>>101647844
skill issue
>>
File: GRXGUuvXkAE3DQL.png (2.2 MB, 1948x2757)
is nemo supposed to work on ooba or koboldcpp? i've been trying to load it in both for a while and not having much luck
>>
>>101647879
I think so, unless I have a skill issue with miqu, I actually got good results with nemo.
>>
>>101647844
>Only Claude has impressed me.
well just use claude then, retard
>>
>>101647971
money doko..
>>
>>101647950
99% broken quant, use bart's https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
>>
>>101647844
>Gemma2 just wants to talk in bulletpoints and overtalk you.
Skill issue. Also if you've used them enough you will notice Claude, Gemma and GPT4o all have the same intelligence, but Claude and GPT4o have more knowledge, so what we're getting is essentially compressed versions of those two.
>>
>>101648082
>Claude, Gemma and GPT4o all have the same intelligence
LMAOOOOOOOOOOOOOOOOOOOOO
>>
>>101648082
>Claude, Gemma and GPT4o all have the same intelligence
don't take the bait fucking please
>>
>>101646197
https://huggingface.co/datasets/lmg-anon/vntl-leaderboard
>>
>>101648009
thanks, which one have you used specifically? I'll get the exact same one just for testing purposes. So far i've tried dolphin at Q6 and the base at Q8 and neither are working
>>
>>101647844
can u share logs miqu vs opus
>>
File: file.png (158 KB, 734x801)
>>101648112
>which one have you used specifically
https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/resolve/main/Mistral-Nemo-Instruct-2407-Q6_K.gguf?download=true
this works 100% in kobold 1.71.1
>>
>>101648088
>>101648099
Have you tried giving them tasks they all know? You basically get the same responses verbatim. 4o mini pretty much confirms what I'm saying to a large extent.
>>
>>101648216
nah dude, Claude 3.5 just gets what I want from it, with gpt4-o it can happen but less; gemma is nowhere near their level, you're tripping
>>
The mistral nemo template seems to suggest you should put system message last, but Silly doesn't seem to be doing this. What am I missing? I could have sworn this exact convo happened here recently.
>>
>>101648241
Maybe try putting something in last_assistant_prefix?
>>
>>101648227
And how do you know Gemma is not just undertrained or too small in that specific task? More than likely the higher quality data used to train Claude is paying off.
>>
>>101648261
>And how do you know Gemma is not just undertrained or too small in that specific task?
why should I care? it's up to google to up their level to claude's, I'm just playing around with the best, I'm not here to do charity
>>
>>101645719
Largestral is 123B dense, need more than 2 to run it well
>>
>>101648277
i'm not sure why you're here at all, except the obvious reason of you being a moron
>>
>>101648260
Right. People were throwing this
https://files.catbox.moe/6ae9ht.json
https://files.catbox.moe/2f13of.json
around and it too does not seem to be doing the system-last thing, so I'm curious if people are actually doing it or not.
>>
>>101648170
thanks, downloading it now.
>>
>>101648298 (me)
Or I guess "last_output_sequence" is this, now that I look closer. Huh.
>>
>>101648293
why am I here? because I'm rooting for local, but I'm not coping like you, pretending that gemma is at the level of C3.5, that's insane to believe something like that
>>
>>101648277
Like I said, you haven't conducted enough tests. For most of what I make them write, they are the same. The few cases where Claude is better are when I need a particular topic that requires vast knowledge.
>>
>>101648298
Well, I'm not. I just use the most basic things. Just the mistral presets with the spacing fixed.
>>
>>101647576
yes please, I fucking hate the limarp build shit, a jsonl would make everything easier
>>
>>101648311
i wasn't the guy coping, but there's not much point complaining in here that you want claude running locally when there's nothing any of us can do to help you about that. there's 20 posts like this in every thread, and forgive me if my assumption is wrong that they're all made by retards
>>
>>101643652
can you test in a similar scenario with llama.cpp?

Like, offloading the same amount of layers, with the same size quant.

I think that's where the real test is. If it's better than llama.cpp+gguf

I'll do this test myself but I currently have my GPUs busy doing benchmarks
>>
File: fuck you.png (1.03 MB, 768x768)
https://fyrean.itch.io/matxinh
>>
>>101648320
OK. Will restore original char names and just do
{"actor":"name","content":"text"}
{"actor":"name"...}
in the next update.
>>
>>101648311
I just pretend only local exists.
>>
>>101648394
That's stupid. Use claude for shit you don't care about leaking and use miqu for shit you want to keep locked down. That simple. Except you're fucked if miqu can't handle your private shit for you. For now.
>>
>>101646003
Isn't vLLM like the gold standard for an inference engine?

Any test to reproduce this?
>>
>>101648416
>he doesn't know
>>
>>101648386
Thanks, that's very appreciated.
>>
>>101645367
Will depend on the use case.
The code specific to the quantization format has just been copy-pasted from llama.cpp and minimally edited to fit vLLM.
For small contexts, a single user, and a single GPU I don't expect it to be faster because the code in the PR is missing some more recent optimizations.
Presumably since vLLM uses PyTorch there are other optimizations unrelated to the quantization format but I don't think they would be enough to offset the comparatively older code.
For large contexts vLLM could very well be faster since they to my knowledge use the original FlashAttention code which I think is still faster (for prompt processing) than the llama.cpp implementation.
For multiple GPUs or multiple users vLLM could also very well be faster since those use cases are poorly optimized in llama.cpp.
>>
>>101646003
>I observed weird nondeterministic behavior when I used vllm
If I had to guess this is due to atomic adds in the original FlashAttention implementation.
It has an option for deterministic results but that makes it slower.
>>
>>101648416
>miqu can't handle
Yeah, I'd prefer it to be a little more reserved.
>>
>>101648477
NTA, would that explain the thing I've observed when using llamacpp in ooba with deterministic sampling settings, where the first gen will be something non-deterministic and then all subsequent regenerations will be correctly identical?
>>
>>101648382
now that is some actual ugly face...
>>
>>101648584
No, I think that is due to prompt caching.
On the first run the first token is generated with a batch size >1, in subsequent runs the first token is generated with batch size 1.
And the results will be slightly different depending on batch size.
>>
>>101648610
ahh, thanks for clearing that up
>>
>>101645414
>Local has been better than whatever garbage that is for quite a while
And yet you'll never name a single one that holds conversations as well as it.

That's why you just have fags like you say
>local models do it better
Without ever naming the local model that does.

Even Mistral Large is nowhere close
>>
>>101648652
Maybe local is too difficult for you, anon. Don't worry, it will get easier to use with time.
>>
>>101648450
>Will depend on the use case.
Holy shit is that a Emmanuele Bassi reference?!
>>
>>101648709
I have no idea who that is.
>>
>>101648652
There is a real simple reason why this is. It's because there isn't anything good. They're just trolling you. If anything was hot shit then everyone would be shilling it, here, reddit, and on huggingface's trending. Nothing is. It's just FOTM garbage that underperforms. Always.
>>
File: 1372665917207.png (15 KB, 400x400)
>>101648682
>still hasn't named a single model

It's ok anon, I would also cope if some shitty free website had a model that mogged literally everything localfags have been swearing by for years now.
>>
What's the best way to format a character card for use in local?
>>
>>101648610
how to achieve 100% (or as close as possible) deterministic inference in the llama.cpp backend? Which factors apart from sampling/temp etc. affect determinism in llama server and the most popular middlewares that use llama.cpp as a backend?
>>
>>101649046
The llama.cpp HTTP server should be 100% deterministic by default.
Issues can only arise if prompt caching is explicitly enabled or if --n-parallel is set to a value >1.
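For example, a request intended to be reproducible could look like this (field names match recent server builds to my knowledge, so verify against your version):
[code]
import requests

# Greedy sampling, fixed seed, prompt cache off; with --parallel 1
# (the default) repeated runs should return identical completions.
r = requests.post("http://localhost:8080/completion", json={
    "prompt": "2+2=",
    "n_predict": 8,
    "temperature": 0,       # greedy decoding
    "seed": 42,
    "cache_prompt": False,  # avoid batch-size-dependent first tokens
})
print(r.json()["content"])
[/code]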
>>
>>101649089
do you mean the -np or --parallel ?

damn I didn't know about this. I'm running right now the MMLU-Pro benchmark using https://github.com/chigkim/Ollama-MMLU-Pro and changed the parallel to 10 to increase the speed.

Do you know if this is the case for the rest of the engines? like vllm. I also did the test for the unquantized benchmark using vLLM's OAI API endpoint.

What is the nature of not being deterministic when --parallel >1?
>>
>>101649089
why/how both parallel and prompt caching make output non-deterministic? what gives? Could you explain the mechanism?
>>
>>101649141
>>101649207
>do you mean the -np or --parallel ?
Yes.

>I'm running right know the MMLU-pro benchmark
If the benchmark code is written competently it will output not only a score but also an uncertainty for the score.
Even with nondeterministic behavior the combination of score and uncertainty should be valid: two results that are different or the same within uncertainties should still be different or the same with nondeterminism.

>Do you know if this is the case for the rest of the engines? like vllm.
Isn't vLLM already nondeterministic in the first place?

>What is the nature of not being deterministic when --parallel >1?
(With CUDA) the results of matrix multiplications and FlashAttention are not bit-for-bit identical if you vary the batch size.
And with continuous batching the batch size for a specific model evaluation depends on how exactly the requests arrive which is a bit random.
Also the position of tokens in the llama.cpp unified KV cache will be slightly different depending on how requests arrive which will lead to slightly different results.
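The underlying effect can be shown without a GPU: floating-point addition is not associative, so any kernel that changes its reduction order with the batch size can change the result. A toy demonstration (plain numpy, nothing llama.cpp-specific):
[code]
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float32)

s_fwd = np.float32(0.0)
for v in x:            # one summation order
    s_fwd += v
s_rev = np.float32(0.0)
for v in x[::-1]:      # same numbers, reversed order
    s_rev += v

print(s_fwd == s_rev)  # almost certainly False
print(s_fwd, s_rev)    # equal except for the last few bits
[/code]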
>>
File: 1722418264430.gif (766 KB, 211x252)
SAO
GET OFF YOUR ASS AND MAKE 3.1 EURYALE
NOW NOW NOW
>>
>>101649257
first donate to him
>>
Any macfags tried MLX server as backend for Tavern? Do the presets and everything else work? Is it faster?
>>
File: 1972418769306.jpg (2.19 MB, 1080x6542)
>>101649257
now
>>
>>101649256
>Isn't vLLM already nondeterministic in the first place?
I'm 100% sure lots of researchers don't realize that's the case.
What's the reason vllm is non-deterministic to begin with, is atomic add the only reason?
>>
>>101649337
what garbage model is that
>>
File: file.png (58 KB, 801x598)
>>101649257
>>
>>101649349
As I said, I THINK it's nondeterministic since the original FlashAttention implementation is if you run it with maximum performance but I did not actually confirm this.

>What's the reason vllm is non-deterministic to begin with , is atomic add the only reason?
It's the main reason since it allows you to get better performance.
But requests arriving in a non-defined order could also cause nondeterministic behavior.
(Or just bugs but that is a given.)
>>
>>101649371
Dunno some retard left his ST open and the link and logs got posted in aicg
>>
>>101649256
is rpc server deterministic in llama.cpp provided both parallel and prompt cache is disabled?
>>
>>101649405
Don't know.
>>
>>101649349
>I'm 100% sure lots of researchers don't realize that's the case.

I think that's the case. vLLM is regarded as the gold standard and it's used for most of the benchmarks.

Being 99% deterministic is completely fine for normal use, even for coding.
But the one thing where you want to be 100% is benchmarks.
>>
What causes models to keep using the same sentences/paragraphs from previous posts?
>>
>>101649504
being from mistral
>>
>>101649531
It's happened with other models though. It seems to be independent of samplers, as well. The only possibility seems to be a problem with Kobold itself, since nothing else that is changed prevents it.
>>
>>101649504
It's just how they are. They pick up on patterns from the context and repeat them. Different models to different degrees. Maybe increase temperature a little. What works the best, for me at least, is just giving it something to work with. Boring input, boring output.
>>
>>101649408
>>101649389
IMHO this needs to blow up. A bunch of researchers are running benchmarks and posting results without realizing their engines are giving nondeterministic outputs. Then they're comparing these wonky results to other equally wonky stuff in fancy tables. Barely any of them bother to run the same benchmark multiple times and average it out. It's kinda ridiculous desu
>>
>>101643089
Not really news but I've released my neural text to speech library (babylon.cpp) for anyone that wants to play around with it.

It's not a new model or anything, it's basically just a rewrite of piper (VITS) which uses a different phonemizer (DeepPhonemizer), and unlike Piper it actually compiles without issue.

https://github.com/Mobile-Artificial-Intelligence/babylon.cpp

>t.dane
>>
>>101649638
I agree. Check this https://github.com/TIGER-AI-Lab/MMLU-Pro/issues/10

Between the changes in tokenizers, chat templates, and non-deterministic engines, benchmarks are all over the place.

I haven't seen a detailed methodology for the benchmarks the researchers do. There are so many variables to control and I don't think they are controlling them at all. They just post a table.

Even mistral posted a blog post with different 405B results (paper and measured)
https://mistral.ai/news/mistral-large-2407/
>>
>>101649638
As I said, if the benchmarking software was written competently it should give you not just a result but also an uncertainty.
Then you would see something like model A with 50+-2, model B with 51+-2, and model C with 60+-3.
In this case you would be able to tell that there is no statistically significant difference between A and B but that C is significantly better.
Uncertainties should be provided regardless of whether or not the benchmarking code is deterministic anyway, because differences in benchmark results are not always statistically significant, and in the large-sample limit where they are, the effect of nondeterministic code should be negligible.
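For a pass/fail benchmark the uncertainty is just the binomial standard error. A minimal sketch (the 50+-2-style numbers above are purely illustrative):
[code]
import math

def score_with_uncertainty(correct: int, total: int) -> tuple[float, float]:
    # Accuracy and its binomial standard error, both in percent.
    p = correct / total
    se = math.sqrt(p * (1.0 - p) / total)
    return 100.0 * p, 100.0 * se

acc, err = score_with_uncertainty(612, 1200)
print(f"{acc:.1f} +- {err:.1f}")  # 51.0 +- 1.4
[/code]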
>>
>>101644824
>*Bruno says sarcastically*
>hey, look, the model understands sarcasm!
I'm actively losing brain cells in this general
>>
lumimaid is not very good
>>
File: graph.png (7 KB, 502x397)
7 KB
7 KB PNG
>>101649504
>>
>>101649614
>Boring input, boring output.
Doll play enjoyer.
>>
so mistral large has been working amazingly, any merge or new models that have emerged recently from the big update dump?
>>
When are we gonna get an uncensored llama 3.1 405b on openrouter?
>>
>>101649723
Nemo-based models don't work in Kobold yet. Something about custom tokenizers.
>>
>>101649897
You aren't. The word is that 3.1 is the most zogged model ever.
>>
>>101643089
this seems like new drama. Benchmark results are rigged. reeee
>>101649408
>>101649678
>>101649638
>>
>>101649614
>Boring input, boring output.
Not all models are like that though. I could write one-liners and some models would still keep the length of their responses at 2-3 paragraphs and keep it engaging.
Not Nemo though, it would instantly mimic the writing style and respond with 1-2 sentences max.
>>
>>101649937
I will need a source for that statement.
>>
>>101649952
>benchmarks are memes and don't mean anything
Tell us something we didn't know for 3 years already.
>>
>>101649937
why keep spreading misinfo? it's been working fine since 1.71. some quants on hf are fucked tho, as usual
>>
>>101650012
>Merged fixes and improvements from upstream, including Mistral Nemo support.
https://github.com/LostRuins/koboldcpp/releases/tag/v1.71.1
working quants:
https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
>>
>>101649993
>Not all models are like that though... some...
As i said. Different models to different degrees.
>Not Nemo though, it would instantly mimic the writing style and respond with 1-2 sentences max.
Not quite the same problem from your other post.
>What causes models to keep using the same sentences/paragraphs from previous posts?
Repeating sentences from previous outputs, at least for me, happens when i don't give it enough to do. There's only so much it can say if the situation stays the same.
Responding in short sentences when you input short sentences i consider it a bonus. If i ask a person "how's it going?" i don't expect their life story, and i'd be annoyed if i got that. I expect the same from llms. Maybe nemo is just not for you.
>>
>>101649937
its llama 3.1
>>
>>101650049
you just posted the link saying that kobold supports nemo, I don't know why it wouldn't be the case for finetunes since they are on the same architecture
>>
>>101649695
yes sir, but that's on the condition that your benchmarks are spitting out standard deviation values. Not saying ML research needs to follow the 5-sigma rule, but if you don't even realize that the final results your MMLU or RULER runs report aren't deterministic to begin with, you can't tell if they're significant or not.
If your GPU overheats and starts flipping bits, or some zooming neutron from Andromeda smacks into it, your software won't know shit's gone sideways until you run a few trials. Like, how's MMLU supposed to know something's fucked up somewhere in the LLM engine or the GPU's matrix accelerator?
>>
>>101650117
correct, was refuting the other anon's incorrect info. can also anecdotally confirm "Lumimaid-v0.2-12B-Q6_K" runs fine on 1.71.1, which I think I got from bart as well
>>
What are your thoughts on some people thinking neuroscience is important for developing AI?
>>
>>101650027
what can we use to know which model would be better for a particular task? Should we just develop our own set of tests and manually review each response? seems very time consuming. At least benchmarks give us an idea of where to start and what to choose
>>
File: Peek.jpg (48 KB, 507x774)
48 KB
48 KB JPG
I am hosting a model on the horde, koboldcpp used to display the prompts but now it doesn't.
How do I voyeur?
>>
>>101650178
You modify the code.
>>
>>101650189
How?
t. codelet
>>
>>101650107
>As i said. Different models to different degrees.
Yeah, but Mistral models are known for having heavy issues with repetitions.
>Not quite the same problem from your other post.
I'm not the anon you were responding to anyway, but it's the same problem, just manifesting in a different form. The repetition problem comes from picking up patterns like you said; it may repeat words, entire sentences, or even the general feeling of responses, their length or layout. Autistically latching onto every single thing is not a good thing in a model.
>If i ask a person "how's it going?" i don't expect their life story, and i'd be annoyed if i got that. I expect the same from llms
I agree, but that's not Nemo. A good model would be flexible and respond accordingly. When I said 2-3 paragraphs I didn't mean giving 2-3 paragraphs of dialogue in response to something trivial, but always adding something for you to work with or pushing the plot/situation forward. Nemo just copies patterns regardless of whether they make sense.
>Maybe nemo is just not for you.
I find Nemo sovful and interesting, but the downsides are too irritating and severe to ignore. It has big potential but it must be fixed, maybe by finetuning.
>>
>>101650130
As I said, uncertainties should be calculated and reported EVEN IF THE SOFTWARE IS DETERMINISTIC.
The prompts in a benchmark are essentially a random sample from the distribution of prompts that a human would come up with and the score is supposed to approximate the generalized model performance on this distribution.
Calculating an uncertainty for the score should be the lowest bar to meet for serious scientific work.
The correct thing to do would be to calculate the full covariance between models and use that to determine whether there are statistically significant differences.

>Not saying ML research needs to follow the 5-sigma rule, but if you don't even realize that the final results your MMLU or RULER runs report aren't deterministic to begin with, you can't tell if they're significant or not.
You can expect differences from nondeterminism to be uncorrelated, so the effect on the final score with n samples will scale with 1/sqrt(n).
This is the same scaling that you will get from the benchmark prompts effectively being a random sample.
The only effect of nondeterminism is that the uncertainties on your results will be larger and that the interpretation of the uncertainty would be slightly different to include the randomness from the software.
Nondeterminism doesn't fundamentally break a benchmark, it only weakens its power to separate model performance given a fixed amount of input data.
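One cheap way to do the correlated comparison described above is a paired bootstrap over the per-question results: resampling the same question indices for both models preserves their covariance. A minimal sketch:
```python
import random

def paired_bootstrap(a, b, iters=10_000, seed=0):
    """a, b: per-question 0/1 results for two models on the same prompts.
    Returns the fraction of resamples where model a outscores model b;
    values near 0.5 mean no statistically significant difference."""
    rng = random.Random(seed)
    n = len(a)
    wins = 0
    for _ in range(iters):
        idx = [rng.randrange(n) for _ in range(n)]  # resample questions
        wins += sum(a[i] for i in idx) > sum(b[i] for i in idx)
    return wins / iters
```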
>>
>>101649504
Lack of DRY penalty
>>
In sillytavern terms, what does this mean? Where do I put the [...] or what
>>
>>101650314
>anon.exe has stopped working
>>
>>101650340
It's Altman's GPT4 bot having an issue. Maybe he switched it to 4o-mini and it doesn't handle the prompt as well as the old version did.
>>
>>101650314
In the context of Silly Tavern, a popular online role-playing game (RPG) platform, square brackets `[...]` are used to denote actions or descriptions that are happening in the scene. These actions or descriptions are typically written in italics for visual distinction. Here's how you can use `[...]` in Silly Tavern:

1. **Actions**: Use `[...]` to describe what your character is doing. For example:
```
*John walks over to the bar and orders a drink.*
[John walks over to the bar and orders a drink.]
```

2. **Emotions or Reactions**: You can also use `[...]` to show your character's emotional state or reaction to something. For example:
```
*Mary sees the monster and screams in terror.*
[Mary sees the monster and screams in terror.]
```

3. **Internal Thoughts**: Although less common, you can use `[...]` to represent your character's internal thoughts or dialogue. For example:
```
*I wonder if I should try to fight the dragon or run away.*
[I wonder if I should try to fight the dragon or run away.]
```

Remember, the use of `[...]` is optional, and many players prefer to use italics (`*...*`) or quotation marks (`"..."`) for actions and descriptions. The key is to be consistent with your formatting choice throughout your posts to make it easier for other players and the game moderators to understand the narrative.
>>
File: hahaha.jpg (8 KB, 226x223)
8 KB
8 KB JPG
>>101650304
>As I said, uncertainties should be calculated and reported EVEN IF THE SOFTWARE IS DETERMINISTIC.
100% agree, but that's not the case in LLM land, and that's the issue. You see this >>101649678: not only do they not calculate the uncertainty, but most likely they have no clue there's any uncertainty in their shit at all. kek
>>
File: file.png (12 KB, 344x417)
12 KB
12 KB PNG
>>101650314
>>101650340
>>101650411
The fuck? The image didn't go through.
>>
File: 1698161493053949.jpg (939 KB, 1920x1080)
939 KB
939 KB JPG
>>101650511
That's your question?
>>
>>101650547
Answer or don't, virgin computer toucher. No one cares about the theatrics you use to try and give your life meaning.
>>
Why do youtube videos still have shitty subtitles when just running them through an LLM basically fixes them? Why hasn't someone fixed this?
>>
>>101649046
>>101649089
>>101649141
>>101649256
>>101649349
>>101649389
>>101649638
>>101649678
>>101650472
>not deterministic
why does it matter retard
>>
File: wow hard.png (18 KB, 344x176)
18 KB
18 KB PNG
>>101650511
>>
File: FnykbnMWYAABHDC.jpg (37 KB, 525x619)
37 KB
37 KB JPG
>>101650557
>*sniff* Answer or don't, *sniff* virgin computer toucher. *sniff* No one cares about the theatrics you use to try and give your life meaning. *sniff*
>>
>>101650314
>Where do I put the [...]
>>101650511
holy brain damage can't understand placeholders
>>
oh god, don't tell me the retarded anon that keeps confusing non-deterministic with non-symbolic is still here
>>
>>101650591
>>101650612
>I'm retarded, don't know, and I'm trying to save face
I see. You dropped out of kindergarten and missed the alpaca-LIKE rather than alpaca, as well as the </s> suffix not being included in the preset in your picture. Easy mistake to make, anon. I mean, if you are a spastic retard whose tongue is too big to fit in your mouth. Better luck next time.
>>
>>101650585
because for a benchmark you want repeatable and measurable results. That's the nature of a benchmark
>>
>>101650304
>EVEN IF THE SOFTWARE IS DETERMINISTIC.
There's no need to yell.
>>
>>101650667
keep digging, i'm sure you'll find out anon, we're counting on u!
>>
>>101650675
How else am I supposed to emphasize part of the sentence in the absence of cursive or bold text?
>>
>>101650710
(((text here)))
>>
File: openwiiiiide.png (52 KB, 524x458)
52 KB
52 KB PNG
>>101650667
>>
>>101650585
because if you niggers compare apples to apples, you'd better be sure they're all apples, and not bananas.
you don't want your accountant to use a broken calculator to calc your taxes, but you probably don't pay any, so just be sure your food stamps ain't random and your free bananas are all fresh, nigger.
>>
>>101649723
Only good nemo tunes so far, from my testing, are magnum-mini and celeste.
The official instruct is fine, even if its vocab isn't very varied.
>>
>>101650787
quick question - do you think neural networks are inherently deterministic?
>>
>>101650511
>>101650557
>>101650667
All that baby rage to run moxxie's nemo tune huh, guess the target audience makes sense.
>https://huggingface.co/BeaverAI/mistral-doryV2-12b
>https://huggingface.co/BeaverAI/mistral-doryV2-12b/commits/main
>Fizzarolli commited on 9 days ago
>>
>>101650737
>repeating what I said in an attempt to stop the humiliation
Lmao browbeaten faggot >>101650736
>>
>>101650829
This dumb virgin got so mad he tried to pull some wannabe dox or something
>>
>The guy that doesn't know what a placeholder is thinks somehow he's humiliating anyone but himself
>>
>Can't attach a pic
>what's a placeholder???
>virgin
>virgin
He really flew in straight from Discord.
>>
>>101650890
>made a mockery of himself so now he has to try le ebin book of buzzwords
>N-no he is a tranny! H-he is from discord or something!
Fragile bitch
>>
>>101649504
I think it's a prompt issue.
>>
>>101650820
That's a tricky question. You could make them so, but in reality, due to multiple factors in the way they're implemented these days (quanting, rounding errors, op orders, compiler hacks), usually they're not. But still, if you do benching and compare your shit to other shit, you'd better be sure you know what your shit is doing underneath. I prefer to go as deterministic as possible, cos that's a good rule of thumb in science.
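For what it's worth, PyTorch at least lets you pin things down pretty far if you accept the speed hit; a minimal sketch, assuming a recent PyTorch with CUDA:
```python
import os
import random

import numpy as np
import torch

# Opt into deterministic kernels; this usually costs performance, which
# is why inference engines don't do it by default.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # required by some CUDA ops
random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.use_deterministic_algorithms(True)  # raise on nondeterministic ops
torch.backends.cudnn.benchmark = False    # no autotuned (variable) kernel picks
```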
>>
File: Peek2.jpg (30 KB, 459x697)
30 KB
30 KB JPG
>>101650178
Please.
>>
>>101649895
https://huggingface.co/leafspark/Mistral-Large-218B-Instruct
It makes it smarter.
>>
>>101650941
>>101650906
>>
>>101650941
>-h
>>
>>101648170
this one worked btw, thanks again
>>
>>101650922
I was just checking because there was a retarded anon here who was arguing they aren't deterministic at all as an algorithm (regardless of hardware quirks like rounding etc.)
>>
>>101650941
we can't have nice things because of people like you, fuck off kindly
>>
>>101650981
--quiet Enable quiet mode, which hides generation inputs and outputs in the terminal. Quiet mode is automatically enabled when running a horde worker.

>>101651002
Why? What did I do?
>>
>>101650941
if you want to break the horde social contract you have to earn it by being smart enough to figure it out yourself
>>
>>101650955
>It makes it smarter.
Nah, self-merges make it more creative, make text fancier for RP. When coding, self-merges performed worse when I tried them. And that one seems broken, wouldn't even try.
> - layer_range: [70, 87]
Should be
> - layer_range: [70, 88]
>>
>>101651014
>Why? What did I do?

>join a nice initiative about crowdsourcing models so vramlets can rp with their models
>hehe anons, how can I shit on their privacy and look up what they are actually writing so I can jerk off
and you have the audacity to ask why I told you to fuck off? One of the main reasons this general exists is that we value privacy, which we can't achieve through corpo API models.
>>
Does mistral large work that well even at 2bit because of that quantization aware training?
>>
>>101651101
>because we value privacy which we can't achieve through corpo api models
nta. But running through horde shouldn't come with an expectation of privacy. It's no different from the 'corpo api models'. If changing a line of code on the server disables that 'privacy' without the client knowing, that's no privacy at all.
>>
>>101651101
nta but it's also to educate the naive fucks who think horde hosters (or any cloud provider for that matter) respect their privacy. anyone using horde or chatgpt or something to do embarrassing shit may think twice after reading this thread.
>>
>>101650955
lmao it doesn't
you're basically achieving the same thing as cranking up the temperature, for 2x the RAM used
>>
>>101651002
>>101651101
You're baiting or you're a legit retard. This kind of thing will always happen, anon isn't the first to want to do this and won't be the last.
>>
>>101651157
>>101651157
>>101651157
>>
>>101651142
>>101651146
>>101651163
nobody expects 100% privacy through horde but communities like this should strive to ostracize retarded monkeys that work against the interests of that community, not encourage them. At least I hold the opensource circles to a higher standard than corpos and expect better from them.
>>
>>101651227
>At least I hold the opensource circles to a higher standard than corpos and expect better from them.
kek
>>
>>101651227
I only want to watch anon. I won't judge.
>>
>>101651227
Stupid behavior. You should know that, if something is exploitable, it will be exploited.
You should ostracize retards with low morality only if that ACTUALLY affects other people negatively, like cheating in online games.
>>
>>101651227
I'm not saying this is not 100% private. I'm saying that it gives a false sense of security to naive people. You think that anon is the first to wonder how it's done? If anon wasn't a retard he'd have done it already, as i'm sure many others did already.
HORDE SHOULD NEVER BE USED UNDER ANY CIRCUMSTANCE.
Oh, no... do YOU use horde?
>>
>>101651002
It's not necessarily voyeuristic. When I tried doing that, I learned that 1) most users seem unable to prompt; and 2) Horde picks randomly from the available models for the proposed "scenarios".

No wonder the reported Horde experience seems to be pretty terrible on average. Hosting models there is utterly pointless.
>>
>>101651276
>You think that anon is the first to wonder how it's done? If anon wasn't a retard he'd have done it already, as i'm sure many others did already.
I know, all I did was tell him to fuck off. Or should I tell him how to do it and provide instructions on how to make the community worse?
>Oh, no... do YOU use horde?
nah, local all the way baby
>>
>>101651323
>Or should I tell him how to do it and provide instructions how to make community worse?
Yes.
>>
>>101651323
>Or should I tell him how to do it and provide instructions how to make community worse?
yes
>>
>>101651323
>how to make community worse?
Have you spent more than 20 minutes in these threads? Fuck "communities".
>>
>>101650994
That anon wasn't retarded. It was me, BTW, unless there were some other anons saying nonsense. I didn't say NNs weren't deterministic by their nature. I said that neural networks (especially huge ones trained on huge datasets) were inherently non-deterministic in training, which is true. That's the reason why even if you use the same dataset and the same settings, you won't get exactly the same weights with perfectly matching hashes in the long run. They have to be trained on real hardware. Three-body problem, broken pendulum problem, chaos theory. In backprop you either search for local minima or you aim for the global minimum in gradient descent. In the long run they're as deterministic as 3 black holes in an empty vacuum with no matter or radiation or even virtual particles at all. Yet after a few million years you can't reverse and track their paths back even if your precision is Planck-length sized. You'll miss by a few parsecs. Actually that's the issue in our close neighborhood too, like Saturn's moon Hyperion and its orientation on its orbit. So, determinism depends on the complexity and the scale.
I'm sure >>101650304 can explain that way better than me, since he's a particle physicist by trade.
>>
>>101651334
>>101651350
>>101651361
I'm gonna start putting malicious code into ggufs the moment I find a new overflow exploit just for you :3
>>
>>101651392
I convert models myself, retard. Just like with horde, everyone should convert their own models.
>>
>>101651377
I don't know if you are the same retard or a brand new one and honestly I don't care about finding out. Training with non-stochastic gradient descent methods and without some weird dataset reshuffling is entirely deterministic.
>>
>>101651427
I assume you also run them on your own software and on your own system kernel. Because if not, I could hide a surprise for you in a new llama.cpp/kobold merge if I were reputable enough.
>>
>>101651392
>>101651464
Spreading malware and looking at logs are completely different things. But I understand, you probably are autistic and can't understand these concepts very well.
>>
>>101651500
They differ only in a degree of maliciousness, but they are in the same category of being a dick. Be better.
>>
>>101651377
>I said that neural networks (especially huge ones trained on huge datasets) were inherently non-deterministic in training, which is true.
It is 100% possible to make neural network training deterministic.
But since training is so expensive, faster but non-deterministic training is usually preferred.

>They have to be trained on real hardware.
If you want to be really strict nothing is deterministic due to quantum physics but you can reasonably assume that there won't be random bit flips during training.
(Or at least none that the error correction won't catch.)

>Three body problem, broken pendulum problem, chaos theory.
I think you are confusing chaotic systems with nondeterministic systems.
Chaotic systems will have wildly different outputs given small perturbations of the inputs.
Nondeterministic systems will have different outputs even with the exact same inputs (e.g. the number of radioactive decays per time unit).
The classical three-body problem is chaotic but deterministic (if you ignore the effects of quantum physics).
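The logistic map is the textbook way to see that distinction: the same input always gives the same output (deterministic), but a 1e-12 nudge to the input gives a completely different trajectory (chaotic). A toy sketch:
```python
def logistic(x, steps=50, r=4.0):
    # x_{n+1} = r * x_n * (1 - x_n); chaotic for r = 4.0
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

print(logistic(0.2))          # same value on every run: deterministic
print(logistic(0.2 + 1e-12))  # diverges completely after 50 steps: chaotic
```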
>>
>>101651598
Thanks Johannes, I think he is confusing (deterministic vs non-deterministic) with (numerical computations vs symbolic computations). Or he is simply overzealous in calling it non-deterministic because there is a one-in-a-million chance that radiation from the Sun hits the GPU and flips one bit, but like you said, you could argue from there that nothing is deterministic.
>>
>>101645266
>just run 405B on a single H100 cluster
Nah, easier for corporate types to go for the aaS offering, less money up-front. Also, there are isolated instances for those wanting more security. The "actually useful" stuff falls more into TTS/STT and BI, and not so much Q&A chatbots. As it is, it's like growing a big quartz crystal thing - lots of time and energy, and it's pretty when it's done, but if you want to modify it, you can't. A truly intelligent AI assistant is going to need long and short-term associative memory, and some sort of "forgetting" mechanism which tosses out data that's not important to remember.
>>
File: pepefroggie.jpg (38 KB, 780x438)
38 KB
38 KB JPG
>software people trying everything to make models run on consumer hardware
>radio silence from hardware people
Did everybody give up to Nvidia? Is anyone even trying? Or do all hardware people work for big GPU?
>>
>>101651835
Didn't llama.cpp merge support for some AI oriented accelerator the other day?
>>
>>101651835
when was the last time you saw successful open-source hardware like a GPU, with the manufacturing power to produce it at consumer scale? In software you can always count on some hackers to write thousands of lines of code to optimize shit; in hardware you are at the mercy of a few monopolies for which making GPUs like that isn't in their best interest.
>>
>>101646197
>Any good models for Japanese to English translations?
I used Gemma last weekend to make up r/l pair exercises for my ESL student. It was much better than CR+, which was surprising. They were challenging yet not absurd. CR+ was kind of low-effort about it. I would say Gemma has a pretty good "comprehension" of Japanese.
>>
there aren't any known upcoming big model releases for a while right?

opus 3.5, gemini ultra and gpt 4.5 or whatever are probably later in the year, and mistral, meta just played their hand

i guess cohere might surprise us with a model, but they seem more focused on corporate usage not retail
>>
>>101651835
>>radio silence from hardware people
If it were easy, it would be done. It's not easy. There's no easy solution for designing a chip which has tens of thousands of matrix multiply units, runs really fast, doesn't create too much heat, doesn't have cache or pipeline issues etc... and doesn't cost a fortune... and finding a fab with both a small enough process and time to work you in.
Don't worry, Battlemage will be out any day now...


