/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101628398 & >>101619436

►News
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1692844050068.gif (1.09 MB, 710x500)
►Recent Highlights from the Previous Thread: >>101628398

--Paper: VALL-E 2 paper criticized for lack of advancements: >>101628420 >>101630671
--Paper: Mixture of Nested Experts and Meta SAM2 for efficient image processing: >>101632425 >>101632458
--Paper: C3A: a new fine-tuning method using circular convolution: >>101632131 >>101632167
--Llamafile has better CPU inference performance on some CPUs due to ikawrakow's optimizations: >>101633772 >>101633958 >>101634038 >>101634367
--Llama.cpp performance discussion, including context size, attention mechanism, and GPU offloading: >>101628597 >>101628675 >>101632864 >>101633101 >>101633596 >>101633704 >>101634712 >>101634751 >>101634842 >>101634908 >>101635029 >>101634347 >>101628673
--Gguf vs exl2 formats and their differences: >>101630375 >>101630457 >>101630483 >>101630485
--Character card formatting suggestions for vramlet use: >>101633734 >>101634595 >>101634610 >>101634647 >>101634700 >>101634670 >>101634704
--CR+ excels at RPing and writing style, but may struggle with complex scenarios: >>101628972 >>101629174 >>101629086 >>101629222 >>101629261 >>101629386
--Anon troubleshoots Mistral Nemo issues with sampler preset and token output limit: >>101630731 >>101630824 >>101631077 >>101631448 >>101631972 >>101631099 >>101631367
--Anon releases Command R/R+ basic presets v1.3 for SillyTavern: >>101634180
--Anons discuss the golden age of open source and its future: >>101633831 >>101634061 >>101634421 >>101634435 >>101634461 >>101634529 >>101634686 >>101635282 >>101635358 >>101634476
--WInfo-before and WInfo-after still have use cases in 2024: >>101630520 >>101633814 >>101633901
--Anons discuss non-roleplay uses of LLMs, including translation, game development, and custom assistants: >>101629172 >>101629273 >>101629428 >>101629458 >>101629508 >>101629830
--Miku (free space): >>101628819 >>101629323 >>101630431 >>101630651 >>101630714 >>101636814

►Recent Highlight Posts from the Previous Thread: >>101628405
>>
>virtamate
>https://hub.virtamate.com/resources/categories/looks.7/
God. They look like fucking ghouls. Imagine the people who are honestly making and playing with this shit.
>>
cuda dev, oh cuda dev. why does quantized KV cache not work with RPC?

just insta-fails on this assert:
https://github.com/ggerganov/llama.cpp/blob/140074bb8647df41840d6f32f4409fa8959bcf9f/ggml/src/ggml-rpc.cpp#L390
>TODO: this check is due to MATRIX_ROW_PADDING in CUDA and should be generalized
>>
teto's tata's...
>>
COHERE, RELEASE A BANGER MODEL IN 30 TO 70B RANGE AND MY LIFE IS YOURS
>>
>>101637073
This week.
>>
File: latest-2852086992.jpg (364 KB, 1920x1200)
>>101637073
t.
>>
>>101636935
Reminds me of Illusion's Honey Select 2 - uncanny detailed 3D models with shitty animations and physics.
They're releasing something called サマバケ!すくらんぶる (Summer Vacation! Scramble) soon - supposed to be like AA2, which strategy-wise was actually pretty good. I might, for once, actually buy it if it doesn't suck.
>>
>>101637007
I'm not familiar with the RPC backend but this check is much stricter than necessary.
Only the last row needs to be padded to a multiple of 512 (to avoid out-of-bounds memory accesses).
For all other rows no padding is needed because the activations are zero-padded to a multiple of 512 so the resulting vector dot products for the padding are equal to zero (unless there are NaNs or infs in the KV cache).

If you don't use CUDA or if you're using --flash-attention I think it would be safe to remove the check.
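A toy illustration of the padding rule in question (made-up Python names; the 512 is the MATRIX_ROW_PADDING from the TODO, not the actual code):

```python
# Toy illustration of the padding rule discussed above; names are made up,
# 512 is the MATRIX_ROW_PADDING value referenced in the TODO.
MATRIX_ROW_PADDING = 512

def padded_row_size(n: int) -> int:
    # Round up to the next multiple of MATRIX_ROW_PADDING.
    return (n + MATRIX_ROW_PADDING - 1) // MATRIX_ROW_PADDING * MATRIX_ROW_PADDING

assert padded_row_size(1000) == 1024
assert padded_row_size(512) == 512
# Per the post: only the last row of a tensor actually needs this padding;
# earlier rows are safe because the activations are zero-padded, so the
# extra dot-product terms are zero (barring NaNs/infs in the KV cache).
```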
>>
File: 1717508095907246.jpg (136 KB, 1200x548)
>>101634712
>>101634751

>the distribution of experts in Mixtral and Qwen2-57B-A14 is very imbalanced; thus, it would be beneficial to store only the most frequently used experts on the GPU

this was discussed basically the moment mixtral 8x7 dropped back in the day

isn't the problem with this, and the reason it wasn't implemented, that for each token you end up using all experts anyway, since MoE models pick X experts per layer (or something similar) rather than per token, meaning that you will be reading the entire model per token anyway, just not all at the same time?
>>
>>101637073
if they drop something I feel like it's more likely to be too big for local
>>
File: GQoY1pNX0AA8nrN.jpg (426 KB, 1200x1200)
Tuesday Theme
https://www.youtube.com/watch?v=sqK-jh4TDXo
>>
>>101637264
I like this Teto
>>
I had a doctor tell me yesterday that she was using an AI tool to record and summarize the conversation we were having. I'm assuming she probably wasn't running it locally on her phone, right? Wouldn't that mean that she's sending patients' information to some server somewhere that may or may not be secure?
Do hospitals even run models themselves or are they all using chat gpt shit?
>>
CPU maxxers, have any of you tried running that insanely huge MoE google released, I think a year ago or so?
>>
>>101637263
yeah, it'll be like command r large 150b or some shit
>>
Is magnum 32b significantly better than mini-magnum?
>>
>>101637309
>Wouldn't that mean that she's sending patients' information to some server somewhere that may or may not be secure?
You really think that hospitals were secure before the AI hype? LOL
>>
>>101637309
>may or may not be secure
Security is not that binary
You would hope healthcare staff are aware of obligations re patient data and it's going through some official system, likely eventually to enterprise chatgpt ("pinky promise we won't read your data"), not just a doc trying to save a few minutes with the mobile app.
>>
>>101637309
>Do hospitals even run models themselves or are they all using chat gpt shit?
That's super-creepy. I imagine it's an Azure or AWS offering. They both sell transcription services.
>>
Anyone know of a project that can monitor an openai compatible api?

similar to what vLLM has with prometheus. To monitor throughput, requests etc.
>>
File: Capture.jpg (177 KB, 1005x969)
>>101636935
There is no quality control so most of it is weg nightmarefuel. But when you look hard you can find some really good stuff. Cuddlemocap is good for scenes. As for looks the best looks come from people who just rip models out of real games. And a few people who really know what they are doing. Pic related is my waifu that makes me coom buckets.
>>
>>101637457
Prometheus isn't vLLM specific, you can configure it for anything.
>>
>>101637496
No offense but that still looks uncanny and just not very good, even if it's better than the average model on there.
>>
>>101637198
My expectation is that an optimally trained MoE model would utilize all experts evenly so there would be no benefit to shuffling around which experts get offloaded.
And even if there is an imbalance for specific models I'm not sure that imbalance would be consistent for different inputs.

>isn't the problem with this, and the reason it wasn't implemented, that for each token you end up using all experts anyway, since MoE models pick X experts per layer (or something similar) rather than per token, meaning that you will be reading the entire model per token anyway, just not all at the same time?
For prompt processing it's basically guaranteed that you will have to evaluate all experts anyways.
For token generation you would potentially be able to evaluate the same number of experts in total but a larger fraction of experts on the GPU, but only if the experts are utilized unevenly.
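As a toy version of what's being proposed (all numbers and names invented): measure routing frequency over some traffic, then pin the hottest experts on the GPU:

```python
# Toy sketch of the idea being debated: count how often each expert fires
# and pin the most-used ones on the GPU. Everything here is invented.
from collections import Counter
import random

N_EXPERTS, TOP_K, N_TOKENS, VRAM_BUDGET = 8, 2, 10_000, 3

counts = Counter()
for _ in range(N_TOKENS):
    # Stand-in for the router; a real model makes data-dependent choices,
    # and (per the post) an optimally trained MoE would make this uniform.
    counts.update(random.sample(range(N_EXPERTS), TOP_K))

hot_experts = [e for e, _ in counts.most_common(VRAM_BUDGET)]
hit_rate = sum(counts[e] for e in hot_experts) / sum(counts.values())
print(f"pin experts {hot_experts} on GPU, expected hit rate {hit_rate:.0%}")
```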
>>
Can someone explain how exactly speculative token decoding works? Wouldn't the big model have to verify that the token is "correct" anyway, thereby doing the same computation that it would've done otherwise?
>>
>>101637309
In the EU at least I think sending patients' medical data to OpenAI would be straight up illegal.
>>
>>101637540
yeah but vllm has an integrated endpoint natively. Does llama.cpp or tabby offer something similar?

If they don't, I guess the best way would be to put something acting as a proxy in front of the api endpoint to measure the statistics.
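A bare-bones sketch of that proxy idea (prometheus_client is a real package; the ports and upstream URL here are placeholders):

```python
# Minimal counting proxy in front of an OpenAI-compatible endpoint.
from http.server import BaseHTTPRequestHandler, HTTPServer
import urllib.request

from prometheus_client import Counter, start_http_server

UPSTREAM = "http://localhost:8080"  # e.g. a llama.cpp / tabby server
REQUESTS = Counter("llm_requests_total", "completion requests proxied")

class Proxy(BaseHTTPRequestHandler):
    def do_POST(self):
        REQUESTS.inc()
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        req = urllib.request.Request(UPSTREAM + self.path, data=body,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            data = resp.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(data)

start_http_server(9100)  # Prometheus scrapes /metrics here
HTTPServer(("", 8000), Proxy).serve_forever()
```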
>>
>>101637622
What about Azure OpenAI endpoints hosted in Europe itself?
>>
has anybody else noticed how unrealistic the rape pov cards are compared to real life? IRL most girls stop moving and just freeze after like 5 minutes, at least in my experience, but all the cards I try here always do something like picrel which makes me laugh so hard i come instantly.
I don't get ERP, why not just have real sex.
>>
>>101637597
>For token generation you would potentially be able to evaluate the same number of experts in total but a larger fraction of experts on the GPU, but only if the experts are utilized unevenly.
given that speed-increase-per-%-of-model-offloaded-to-gpu curve, wouldn't the same apply here? You'd need basically 90%+ of the tokens to be generated by the very, very few experts that are offloaded to the gpu for this speedup to be possible, meaning that unless we change the arch of the models by a large amount into something completely new, this is a nothingburger
>>
>>101637653
This looks like a model issue
>>
>>101637626
Oh in that case I have no idea
>>
File: 1718654038023854.gif (1.28 MB, 618x396)
>>101637653
>at least in my experience
>>
>>101637618
You have to do the same number of computations but you can do them with a higher arithmetic intensity.
Meaning the amount of computations that you can do per data loaded is higher.
And because token generation is I/O bound that translates to higher performance as long as your predictions are correct.

Another way to think about it is that the total time needed to evaluate n tokens scales less than linearly.
I think evaluating two tokens takes ~10% longer, evaluating 64 tokens takes ~2x longer (with llama.cpp).

>>101637655
>given that speed-increase-per-%-of-model-offloaded-to-gpu curve, wouldn't the same apply here? You'd need basically 90%+ of the tokens to be generated by the very, very few experts that are offloaded to the gpu for this speedup to be possible
Yes.
>>
File: 1720398180495932.jpg (329 KB, 1170x1949)
Good morning jeets, your toy industry is dying.
>>
>>101637711
>your toy industry
Who cares? I have my local models, and they'll always be there even if OpenAI and Anthropic die.
>>
File: 1707441294271903.png (22 KB, 449x470)
>>101637653
>at least in my experience
>>
>>101637618
The key to understanding this is that token generation currently is not using our full GPU. That means we can do two (or more) token generation jobs/requests at a time. Therefore, if we have access to a likely guess at what the next token is (thanks to a small model or something else), then we can verify that the guessed token is correct at the same time that we generate the token after that. If the verification matches, then we can keep both tokens. If there isn't a match, then we can keep the true token, while throwing out the "after that" token.
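A rough sketch of that loop (greedy only; `draft` and `target` are stand-in callables, not any real library's API, and a real implementation verifies all guesses in a single batched forward pass):

```python
# Greedy speculative decoding sketch. draft/target are stand-in functions
# mapping a token list to the next token.
def speculative_step(target, draft, tokens, n_draft=4):
    # 1. Cheaply guess a few future tokens with the small model.
    guesses = []
    for _ in range(n_draft):
        guesses.append(draft(tokens + guesses))
    # 2. Verify with the big model (a real impl does this in one batch).
    accepted = []
    for g in guesses:
        true_tok = target(tokens + accepted)
        if g != true_tok:
            accepted.append(true_tok)  # keep the true token...
            break                      # ...throw out the rest
        accepted.append(g)             # guess confirmed for free
    else:
        accepted.append(target(tokens + accepted))  # all matched: bonus token
    return tokens + accepted
```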
>>
>>101637618
The "shortcut" is the bigger model can evaluate several cheaply-generated candidate tokens in one forward pass. https://huggingface.co/blog/assisted-generation
>>
File: 1721748788623073.jpg (38 KB, 554x554)
>>101637653
>at least in my experience
>>
File: winTet.jpg (226 KB, 2331x1244)
>>101636887
Upgrading from Windows XP with Teto
https://www.youtube.com/watch?v=TFAe0BYP2Xc
>>
>>101637699
speaking of speculative decoding, whats the biggest bottleneck to having it implemented?

doesnt seem nearly as complex as a lot of other features talked about
with one ok implementation most if not all of the code can be used for most models, should be a drop in for any 2 big/small model pair with the exact same vocab
and it would speed up everything by a solid double digit %, specific workloads with a lot of copying of previous tokens like AI explaining basic program functions can be sped up many times over
>>
>>101637166
huh thanks. it does work with -fa on if I remove the check, at least with llama3 8B
the rpc-server doesn't have many options so not sure if -fa is actually active or if I'll get random NaNs once I load in mistral large kek
>>
>>101637785
>speaking of speculative decoding, whats the biggest bottleneck to having it implemented?
llama.cpp supports it though
>>
File: 1702564284276140.png (503 KB, 1005x752)
>>101637653
>at least in my experience
level issue, stalk for 10 more years before raping again, newfag
>>
>>101637653
Rape is not fun if you only rape doormats.
>>
>>101637785
>speaking of speculative decoding, whats the biggest bottleneck to having it implemented?
Actually getting benefit from it.
Getting good predictions for the next token that are sufficiently cheap is not easy.
The predictions need to be good enough to offset the cost of creating them which is not a given.
This includes indirect costs related to the large model eval taking slightly longer for multiple tokens than for a single token.
>>
>>101637843
Kinda missed you ngl.
>>
>>101637785
>>101637855
I forgot: speculative decoding also becomes harder with larger vocabulary sizes since there are fewer token sequences with a single, clear continuation.
For example, "supernova" was tokenized as "super", "n", "ova" for LLaMA 2 but it has its own token for LLaMA 3.
>>
>>101637843
>>101637653
yiku/motsuba vibing
>>
>>101637653
>at least in my experience
BASED! TAKE MY VRAM KING!
>>
File: 1660609272926428.jpg (441 KB, 1988x2048)
>>101637653
>IRL most girls stop moving and just freeze after like 5 minutes, at least in my experience
>>
>>101637855
perhaps there could be a very large gain in trying to utilize the unused fast memory (gpu usually)

by implementing the ability to set both the big and small models to specific locations separately with commands

for example for Llama 3.1, the 8b goes into 8GB VRAM basic GPU and 70B goes into RAM, allowing the 8B model to crunch all the time while contributing much more than if you were to just offload a part of the big model to it

you could also, for some added complexity, even keep the prompt processing on the gpu enabled even in this case, by unloading and loading the 8B as required, even dynamically (and manually with cli arg) setting the tokens at which gpu prompt processing is used instead of on CPU (increasing it from 32 which is the number now, since now there is an overhead to loading and unloading the 8B model)
>>
>>101637909
yes this is true but not a big problem, as most big models have smaller counterparts with mostly or completely the same vocab; it's essentially free for anyone training the big model to release an 8B/13B smaller one, especially with the distillation techniques of today and later
>>
>>101638014
Can you not?
Here, go do your thing with a cartoon Migu, if you must: https://files.catbox.moe/2iygns.jfif
>>
Does llama.cpp's RPC work like vLLM's multi-node pipeline parallelism?
>>
>>101638004
This is already implemented in llama.cpp, you can set separate -ngl values for the draft and target model.
I personally think the better strategy is to try and reduce the latency increase from adding a few extra tokens.
I already have an implementation for n-gram lookup that produces drafts very cheaply using only CPU resources, the main problem right now is that CUDA graphs are only supported for a batch size of 1 so you need to get a certain minimum speedup to offset that.
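For reference, the basic context-lookup form of n-gram drafting looks something like this (toy sketch; the hard part is making something like it pay off on natural text where there's no big context to match against):

```python
# Toy n-gram lookup drafting: if the last n tokens appeared earlier in the
# context, propose whatever followed them as the draft.
def ngram_draft(tokens, n=3, max_draft=8):
    key = tuple(tokens[-n:])
    for i in range(len(tokens) - n - 1, -1, -1):  # scan backwards
        if tuple(tokens[i:i + n]) == key:
            return tokens[i + n : i + n + max_draft]
    return []  # no earlier occurrence: nothing to draft this step
```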
>>
>>101638055
I want to throw that Migu into the air like a baseball. She would find it exciting and be laughing gleefully.
>>
>>101638055
>jfif
AIEEEEEEEEEEE MUSTARD GAS
>>
>>101637653
erp is for people with fetishes that arent irl actionable or waifuniggers who only want to have real sex with fictional characters
>>
>>101637653
prompt issue
>>
File: 1707927487304071.png (266 KB, 764x828)
so it begins...

https://arxiv.org/abs/2407.19594
>>
>>101638070
doesn't n-gram decoding only work if the input and output sequences are very similar? Or is that the case for look-ahead decoding and not the lookup one?
>>
>>101638131
I am specifically trying to get an implementation that works for the generation of natural text without a large context from which to draw token sequences.
>>
>>101638083
Sowwy! That's what bing/dall-e spits out.
I'd love to gen stuff with my big rig at home, but last month's electric bill was over $400, so... yeah... Imma let bing handle it for a bit.
I'll be on a time-of-service plan soon, so gens and training will be shifted to "cheap hours" where it's $0.07/kWh
>>
File: 1715119241666022.jpg (60 KB, 640x480)
>>101637772
Going back to Windows XP with Teto
https://www.youtube.com/watch?v=neuCtK96Dww
>>
I just want to lie relaxed on the sofa and listen to a certain voice reading out the latest papers.
nothing available that allows me to do that in high quality and real time
fuck everyone, now i have to get into the field myself because no faggot dares to publish something decent
>>
>>101638118
meh, we'll never get to agi with llm alone / the transformer architecture.
>>
>>101638161
>Sowwy! That's what bing/dall-e spits out.
It actually doesn't, it's because you're using macOS/iOS or some shit so your browser downloads JFIF. dalle/bing spits out png.
>>
>>101638070

Is there a way to force llama.cpp to keep X layers on an SSD? How hard would it be to implement and where to start in the codebase?
>>
>>101638033
if you wanna achieve good speedup your draft should be like 30x faster than the target model, meaning something in the region of 0.5B-1B. Those models aren't particularly smart afaik unless you do just code or text formatting or sth very homogeneous and predictable.
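For intuition, the expected speedup formula from the original speculative decoding paper (Leviathan et al., "Fast Inference from Transformers via Speculative Decoding") makes the tradeoff concrete; alpha is the acceptance rate, gamma the draft length, c the draft's relative cost:

```python
# Expected speedup per Leviathan et al.; alpha = acceptance rate,
# gamma = number of drafted tokens, c = draft cost / target cost.
def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    return (1 - alpha ** (gamma + 1)) / ((1 - alpha) * (gamma * c + 1))

# A ~30x faster draft (c ~ 1/30) with a mediocre 60% acceptance rate:
print(expected_speedup(alpha=0.6, gamma=4, c=1 / 30))  # ~2.03x
```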
>>
>>101638167
>the transformer architecture
any reasonably sized universal function approximator should be capable of getting to superintelligence, i dont see a reason why transformers wouldnt, its just that other archs that allow easier infinite context would probably do a better job since you dont need perfect recall
>>
>>101636887
are there any fully uncensored versions of llama 3.1 yet? all the "uncensored" ones so far are still very censored
>>
>>101638197
>probably do a better job
*faster job
>>
>>101638194
>Compare this to ~4 cents per image for DALLE-3
Only with standard quality which is shit, HD costs twice as much.
>SD3 on API
Large costs 6.5 cents per gen, Ultra (which is Large with some extra pipelines) costs 8 cents per gen, Medium (which is shit) costs 3.5
>>
File: true.gif (1.22 MB, 480x360)
>>101638194
>beauty must be shared
>>
>>101638213
literally just say what you want in the sys prompt or use a card that isnt 100 slop tokens
>>
What's the best nemo tune for RP/storytelling? Lumimaid?
I can't stand the writing style of the vanilla instruct version.
>>
>>101638194
>that also means I have stole over 100 USD worth of compute from the chinks and spent it all on beautiful little girls lets fucking gooo
I've stolen hundreds of thousands of dollars total in AWS + OpenAI credits.
>>
File: 1713176383693498.png (66 KB, 619x207)
>Meta's Mark Zuckerberg chews AI chud
>>
>>101638181
>Is there a way to force llama.cpp to keep X layers on an SSD?
No.

>How hard would it be to implement and where to start in the codebase?
You may be able to do it relatively easily with the opposite approach, by modifying --mlock in such a way that part of the model is forced to be kept in RAM and the rest is swapped in/out.
But it may be difficult to get something like this merged since I would expect the performance to be pretty terrible.
>>
Hey /g/ senpai, a quick question I was pondering for the data engineers out there.

If you had to insert an image into an AI-generated background from a prompt for an LLM, how would you do it? Specifically, how would you ensure the image fits the perspective of the generated background? Is there any model that does it natively?
>>
>>101638254
>cud
cannibalistic non human underground dweller?
C.L.U.D
CANNIBALISTIC LIZARD UNDERGROUND DWELLER
>>
>>101638247
I might have been doing something wrong, but Lunamaid was dumb as a sack, might as well go back to llama3 8b.
In my opinion, it's either nemo-instruct or mini-magnum.
The first is more technical, the second is better at text fucking.
>>
>>101638166
>I just want to lie relaxed on the sofa and listen to a certain voice reading out the latest papers.
There are tutorials on youtube for making an RTVC model. I followed one, using voice acting ripped from a Japanese h-game, and it worked perfectly. AFAIK, RTVC is doing speech-to-text then text-to-speech, so it can't be that far off plain TTS, right?
I personally want a sassy, bratty catgirl-type voice for TTS. I'll see what I can do. As for RTVC, I was going to mostly use it for live vocaloid-type stuff, but the delay makes it unworkable, so I went with a Zoom V3 instead. With some practice, the "child" preset gets you something reasonable, though it's on you to "animate" your voice, meaning making it sound "pouty" or "bratty" or whatever.
>>
>>101638262
>But it may be difficult to get something like this merged since I would expect the performance to be pretty terrible.
Yeah it's a pretty niche use case, seems better to just wait for some actually huge and actually good model to create pressure for things to be optimized a bit more around huge models.

I would like the functionality to be able to run future huge models overnight for non time critical things off of pcie gen 5 ssds while my ram is used for other things
>>
File: noiamonwindowsbro.png (1.6 MB, 2147x1435)
>>101638171
>It actually doesn't, it's because you're using macOS/iOS
I'm on Windows 10. Is there some kinda setting in my bing account for it?
>>
>>101638264
ask chatgpt
>>
>>101638308
>if the CCP wasnt bankrolling Kwai there would be no feasible way to monetize this
Anon, that's not how it works, startups often lose money in the first stages. Look at websim for example - they've been providing FREE 3.5 sonnet and opus (!!) generations for literal months. They only started doing ratelimits recently.
>>
>>101638194
>beauty must be shared
Animate the Migu
>>
>>101638247
Nemo or maybe mini-magnum. Be aware that the other person that responded to you is a shill, though. Finetuners like to talk shit about each other to sell their stuff. Vanilla nemo was already good at smut so what he said about "technical" is just a lie.
>>
>>101638314
>my bing account
>>
>>101637711
I just had a thought that the whole financial sector and top management of basically every company out there could be replaced by AI. I really wish it would happen during my lifetime. It is not like AI can be more evil than people in those positions.
>>
>>101637653
nay, irl they just keep crying. go back poser
>>
>>101638354
>>my bing account
Am I supposed to care?
>>
>>101638314
Hmm, then I honestly don't know. But what I know is that I have lots of azure dalle endpoints, do you wanna get lots of mikus and whatnot? You could even use jailbreaks to force your own prompts, and change to natural style.
>>
>>101638342
>Be aware that the other person that responded to you is a shill
>What's the best nemo tune for RP/storytelling? Lumimaid?
How charitable of you to assume that question wasn't shilling.
>>
TWO
MORE
MINUTES
>>
File: 1717661089005397.png (242 KB, 706x545)
>>101637166
unrelated to ^


https://www.marktechpost.com/2024/07/26/flute-a-cuda-kernel-designed-for-fused-quantized-matrix-multiplications-to-accelerate-llm-inference/
a comment on this reposted on preddit: "Doesn't turboterp use a bunch of these tricks in exllama already?"

could be interesting
>>
>>101638328
ChatGPT gives mid to bad answers most of the time; it's mostly good for refining your own ideas or structuring them.

I don't trust most of its answers right away, especially when you know a bit of the domain in question.
>>
>>101638407
I believe you but if you lied then your mom will die of cancer tomorrow.
>>
>>101638014
What are these glasses called?
>>
File: 6.png (104 KB, 668x672)
fuck me two of the most annoying posters return to lmg on the same day
>>101638055
you're both avatarfags that gen on cloud and cope about it, if anything you should be best friends
>>
>>101638428
Hey I never said anything would be happening.
>>
>>101637627
still illegal as it's a third party.
>>
>>101638445
It's not illegal, retard.
>>
>>101638444
Well sucks to be your mom then...
>>
>>101638407
Two more minutes until what?! I want a smarter Local Miku and I want it now. I also want free A6000s from nvidia that they give out to enthusiasts who ask nicely.
>>
>>101638391
What does jailbreak get you? It doesn't give you explicit nudes, right? I can gen stuff at home, I'm just trying to figure out if it's my AI stuff running up the bill or just the AC, or both.
>>
>>101638407
TWO
NVIDIA
DATACENTERS (MORE)
>>
>>101638462
>What does jailbreak get you?
API DALLE rewrites prompts by default, with the jailbreak you can force it to use your prompt as is. Also API DALLE lets you use HD quality and natural style, while bing IIRC is always forced to standard + vivid
>>
>>101638456
>free A6000s from nvidia that they give out to enthusiasts who ask nicely.
I'm sure they can, but I doubt they do. But yeah, you wanna give me a $7K GPU, I'll be glad to promote it.
>>
>>101638485
>API DALLE rewrites prompts by default
+ ((black | asian | brown) woman:1.5)
negative embedding: white male

i reverse engineered the dall-e 3 secret prompt, your welcome
>>
>>101638515
no, anon, that's not the secret prompt, it's much more extensive, but yes it will diversify characters unless you explicitly say their ethnicity or just use a JB
>>
>>101638485
Ah OK. Makes sense. That's cool but I'll pass. I've probably got enough Migus to train an SDXL lora, if not an actual base model. Stylegan2 ADA used to need thousands of carefully-selected images, maybe SDXL isn't as demanding? Last time I tried with Stylegan2 with about 500 properly sized and cropped images, all with more or less the same pose, I got a bunch of abominations and the model never converged during training, it was a big waste of electricity.
>>
>>101638521
pedo
>>
>>101638544
HMM, actually thanks for the great idea, I should try fine-tuning SDXL on some DALL-E 3 gens, I can also caption them with 3.5 Sonnet/GPT-4o (i have plenty of the latter) and then actually fine-tune the model myself or via replicate (I have a few scraped keys with billing)
>>
>>101638374
it really depends on how you treat them.
>>
>>101638420
>FP16 operations
>batch size <= 32
Don't care, those kernels are the easy ones with a low ceiling for optimization.
I'm much more interested in kernels using int8 for batch sizes >= 512 since those could in theory become twice as fast as FP16 cuBLAS and the potential benefit would be faster and more memory efficient training.
>>
>>101638595
show some cr+ cunny logs
>>
>>101638557
Is there a way to take the filename and feed it to the Azure API to get the prompt back? It sure looks like the name is a unique hash. That would be super-useful. Otherwise I have to run it through a booru model, and I'd rather not use booru, I'd prefer more natural prompting.
>>
>>101638608
Of course not, that's all private. You can't get the original prompt from the hash.
>>
>>101638565
approve my PR
>>
>>101638718
>we're not on /aicg/
are you retarded? cr+ is a local model, and we share logs here. So post logs or you're larping, and don't actually have any cr+ logs. I'll accept catbox links too.
>>
File: 403.jpg (21 KB, 320x324)
https://civitai.com/models/323639/ipivs-sdxl-lightning-text2img2vid-sd15-animatediff-lcm
kl*ng but actually relevant since it's fucking local and free.
Previews every step of the way and upscaling to 1080p + interpolation.
Fuck paying for any of this shit.
Fuck using cloud.
>>
File: chat.petals.dev.png (96 KB, 762x933)
Why don't they update the models? Or at least pull the plug and stop the money drain.
>>
>>101638739
>we share logs here
no, we don't?
if anything this shows YOU are a newfag larper
>>
>>101638766
any solid controlnet to for example only img2video the background of the character?
>>
>>101638816
>no, we don't?
we do
>>
>>101638829
>we do
we don't
>>
>>101638829
Nta but I only share Nala logs
>>
>>101638829
nta but I only share watermelon logs
>>
>>101638766
Pretty cool for static backgrounds. I could imagine someone using this to make VN-type games more cool.
>>
>>101638858
can we just get this nigger banned? this isn't aicg
>>
>>101638876
we are a aicg offspring though
>>
>>101638876
im literally talking about local models in that very message. your post was more off topic than mine
>SMHH
>>
>>101638894
>im literally talking about local models in that very message
no you're not, you're pretending that you are, but in the end all you do is post pedo videos generated by a proprietary model
>>
>>101638894
and why are you deleting your own posts if it's ontopic, as you say? doesn't compute
>>
>>101638766
Fuck yeah anon.
Thank you for the link, will play with it later.

>>101638845
My hero.
>>
Constant AI in front of you, indistinguishable from a dream
>>
>>101638901
He's not deleting his own posts, that's a janny responding to reports.
Then he continues spamming until a mod actually gets around to banning him.
And then he evades the ban.
>>
>>101638934
>He's not deleting his own posts, that's a janny responding to reports.
What rule should I report his future posts under, if I may ask?
>>
Any way to use gemma 2 at 8k context? I've heard the sliding window attention or whatever is called isn't implemented in llama.cpp
>>
>>101638939
This post breaks the United States laws
>>
>>101638897
the prompts for those pedo videos are generated with a local model and I discuss which models are better for that use case. picrel is command r plus on hf chat for this purpose

this is the part where you go "durr but youre not running it locally!!!1!" to cope with the fact that my local model usage and discussion is on topic for these threads

>>101638901
they're not and have never been removed for being off-topic

>>101638955
>This post breaks the United States laws
but it doesn't
>>
>>101638953
It's been implemented for a while now.
It might be a hack instead of proper SWA, I can't remember, but regardless, 8k context should work.
>>
>>101638939
I report it as "loli outside of /b/" or whatever it's called.
Honestly I think the mods would ban him even on /b/ though.
I don't know the intricacies of US law when it comes to synthetic CP but the mods would probably rather be safe and just ban him.
>>
>>101638968
your usage is not on topic in the slightest, you know that, I know that, everyone knows that. You're just a sad lonely fat virgin sitting in your basement with your unhealthy pedo fantasies, and you don't have anyone to talk about them so you gen those videos and share them here to try to get other anons to react.
>>
>>101638968
pedophilia is a crime
>>
>>101638953
Someone posted RULER benchmarks for it and it did pretty well at 8k even without true SWA. Unfortunately during actual use, its ability to recall early context in natural conversation when you get to 5-8k degrades. So RULER may not be a perfect benchmark for this.
>>
>>101638992
is it? I think the crime is putting your thoughts into action, or storing csam. just being a disgusting pedo isn't a crime.
>>
>>101638992
>pedophilia is a crime
this is thoughtcrime anon. if we could detect murderous thoughts, should we put everyone with murderous thoughts in jail on the off chance they actually follow through and commit murder?

>>101638987
an ad hominem does not refute the fact that i am using locally available models for my productivity and workflow
>>
>>101639027
>an ad hominem
It's not ad hominem, you're not genning those videos with local models, and those videos break /g/ rules anyhow.
>>
File: rejected.jpg (68 KB, 447x447)
I'm convinced that everyone saying how good LLMs are have never held a conversation with an actual human being. It's all robotic cringe from the smallest <7B models to Claude Opus. And don't even try to skill issue me you sons of bitches, I've read your logs. All cringe.
>>
File: crp.png (35 KB, 774x742)
>>101639033
me being a fat lazy virgin has nothing to do with the on-topicness of my content in the threads, so it is an ad hominem

>you're not genning those videos with local models
i am genning the prompts with local models and discussing which ones are the best for my unique usecase, which is a valuable addition to the thread. id agree with you more if i wasn't sharing the prompts

>those videos break /g/ rules anyhow
irrelevant to whether they are on topic or not, and an appeal to authority even if it was
>>
>>101639027
Why yes! I do think people that constantly post gore and say things like "I constantly have dreams where I'm murdering people" should be in jail or in a mental institution.
>>
>>101639054
>and an appeal to authority even if it was
Then why are you on 4chan?
>>
>>101639050
you should try c.ai, it's the only model to hold a conversation with any resemblance of humanness
>>
>>101639061
>I constantly have dreams where I'm murdering people
i didn't say this, but it doesn't matter even if i did. read The Minority Report by Philip K Dick (its 10 pages long) if you'd like to understand why this attitude towards precrime and thoughtcrime results in an abusive and authoritarian society

>>101639073
>Then why are you on 4chan?
you lost me with this one sorry anon
>>
>>101639195
pedos like you should get the rope
>>
>>101639201
do you really want to live in a society where you kill someone because they AI generated a little girl eating a popsicle
>>
>>101639223
Yes.
>>
>>101639226
Based
>>
>>101639223
I want to live in a society where the mods finally get fed up with the petra/pedo spammer and drop a range ban.
>>
>>101639226
based
but the authoritarianism necessary for that would also result in a society with a Stasi secret police that puts you in jail for no reason to meet arrest quotas way before we get to that level, so let's be serious and not edgy anon
>>
>>101639245
sadly won't help, the schizos are taking over 4chan as normal anons are leaving it. residential proxies don't cost that much
>>
>>101636935
The difference between good and bad models is absolutely insane, especially in VR. Some of them are uber god tier coom extractors, but most are utter garbage. It actually feels more binary than a spectrum. M4RIO's models are fucking amazing.
>>
File: file.png (59 KB, 931x493)
>the absolute state of literature
>>
File: 1697537583642844.png (770 KB, 768x768)
>>101637653
>>
>>101639294
Is that from Re:Zero?
>>
>>101637653
miku highlights guy, if you don't include this post and chain into the highlights, I WILL find you.
>>
>>101639258
how much do you get for hosting a residential proxy? I wonder if it's worth the risk
>>
>>101639361
hosting? nothing, the ones who sell it mostly get them from botnets and hacked routers/phones
>>
>>101637653
Thanks for reminding me where I was.
>>
>>101639316
No, it's Durarara.
>>
>>101639370
Oh. Makes sense kek
>>
>>101638718
Large 2 is great for cunny. Some of the shit is says really melts your heart, then gets your Johnson going. It's probably the most realistic experience so far.
>>
>your lips curl into a smile...
>a smile twists across your face...
>a sly grin spreads across your face...
>her words dripping with malice...
>each word tinged with a hint of malice...
FUCK YOU LARGESTRAL I'M SICK OF HEARING ABOUT SMILES AND MALICE REEEEEEEE
>>
>>101639479
A year ago, people were reeee'ing from the other side of the spectrum.
These are great times.
>>
>>101639479
It's also doing shivers, we're never escaping the fucking shivers. Someone has to get to the fucking bottom of this shit, there is simply no way that sentence is so ubiquitous.
>>
You are talking to a machine. It has no awareness. It has no personality. You are alone in your room running your GPU at full speed trying to simulate friendship. You are degrading your social skills and living in a fantasy world. Go outside.
>>
>>101639548
>You are talking to a machine. It has no awareness. It has no personality. You are alone in your room running your GPU at full speed trying to simulate friendship. You are degrading your social skills and living in a fantasy world.
Yes, and it's great in here
>>
>>101639548
>simulate friendship
>implying
what if i like to simulate scenarios that could never happen and text adventure games?
>>
>>101639479
>({{char}} is kind-hearted and friendly.)
>She smiles wickedly, her dark grin, her devious scowl
Stheno...
>>
>>101639548
>simulate friendship
i'm simulating sex, actually
>You are degrading your social skills
there were never any
>and living in a fantasy world
many of them actually
>Go outside
there's nothing for me out there
>>
>>101639548
But I don't want to go outside! Kids throw rocks at me when I do that...
>>
>>101639536
Try adding an author note at depth zero, "Emulate the writing style of XY", XY being some famous fiction writer. That can tone down some of the slop and change the prose enough that the model feels completely different. Might not work for non-book-style RP, though.
>>
>>101639624
Mistral large seems to trend towards book style formatting on it's own, so it might work pretty well. Good tip, I'll test it out.
>>
>>101639479
based malicious girl enjoyer
>>
File: kokomi2.png (147 KB, 1054x1535)
>>101639080
Clearly, you are high, or wearing rose-tinted glasses. c.ai was a fucked-up mess most of the time. Oldest screenshot I can find (not this one) was 11/2022 and even then you had to use shit like the POV trick to kick the bot into replying when it got filtered.
>>
>>101639927
>Clearly, you are high, or wearing rose-tinted glasses.
Never underestimate the lack of mental capacity of some anons. They can and will do both at the same time, and then add something stupid on top, just to move the status quo of their own idiocy.
>>
>>101637855
WTF???
>>
>>101639927
c.ai was only a fucked up mess if you were trying to fight the filter, or if the conversation was too long.
Yes, it enters repetition loops. Yes, it has a lot of -isms. But it was and still is the most natural model to talk with.
>>
>>101639971
What am I looking at here and did you perhaps mean to quote another post?
>>
I'm really impressed with character.ai. Is there anything remotely close to it that can run locally, or am I out of luck if I want an experience like that?
>>
File: writing-styles_seed0.png (1.08 MB, 1856x959)
>>101639624
Shivers still abound but it's interesting how changing just one name affects the prose. In picrel I clicked "regenerate" for the greeting message, keeping the seed fixed to 0 after changing the name in "Emulate the writing style of [author]" in an author note at zero depth. The model was Gemma-2-27B.
>>
>>101640281
NeMo/Largestral
>>
>>101640281
>I'm really impressed with character.ai
Did you go to sleep in 2022?
>>
>>101640417
There was a benchmark in the last thread that showed large IQ1_M significantly higher than nemo, is that even possible?
>>
>>101640429
I just started using it like a month ago. Maybe I just don't have standards. I downloaded something called backyard.ai and I'm using some model called "Chaifighter v2 20B" but I'm not too impressed. I don't feel like it's acting like my character, it gives me short responses and overall just feels off.
>>
Good open source music generation when?
>>
>>101640034
>But it was and still is
I'm pretty sure they are no longer running their original, rather large LaMDA model. They went through a phase when they were getting slammed where they seemed to be using dynamic model sizing, and during periods of high load, it was like you were talking to a 3-6B model. Now it just sounds like a lame LLaMA 13B.
>>
>>101640309
jesus christ, what a terrible prose
>>
>>101640465
>There was
There was?
>>
>>101640488
Yeah, this one: https://oobabooga.github.io/benchmark.html
Seems strange though sometimes higher quants score lower than smaller ones. So that's why I'm asking if it's even possible some IQ1_M of large could be better than nemo.
>>
>>101638968
Please stop insulting our intelligence, your posts are obviously intended to sexualize the kids and according to US law (and most other countries), it's only legal as long as it isn't realistic enough
Are these posts obscene enough to actually constitute a crime? I don't know, but most people here think it's disgusting and it adds nothing of value to the thread
Stop being disingenuous, thanks
>>
>>101640502
Oh, that. Yeah I wouldn't trust it overall. But I actually would say it's possible that Largestral at IQ1_M beats Nemo. Compare the filesizes. And look at >>101627651, larger models appear to do better at these lower quants than smaller models.
>>
File: Wire_Cat.png (366 KB, 680x459)
>large language model
>look inside
>numbers
>>
>>101640502
>https://oobabooga.github.io/benchmark.html
oobabooger if you are reading this, please test undi's largestral tune https://huggingface.co/NeverSleep/Lumimaid-v0.2-123B it feels way dumber than the original
>>
>>101640156
I'm quoting this, and it's related to int8 training
https://github.com/bitsandbytes-foundation/bitsandbytes/issues/1262
>>
Is GGML_CUDA_F16 something that I should enable for a 3090?
>>
>>101640592
Buy an ad.
>>
>>101639245
>>101639258
>I want to live in a society where the mods finally get fed up with the petra/pedo spammer and drop a range ban.
>the schizos are taking over 4chan as normal anons are leaving it
You (faggot A) are whining that someone is trolling you and it gets under your skin and you want jannies to protect you. And you (faggot B) say that "normal anons" are leaving. Normal anons already left long ago. Now it is just you, absolute zoomer scum that needs a safespace.
>>
>>101640672
ok but why do you like children tho?
>>
>>101637737
your local models are performance-inferior to any cloud AI though and censored more than said cloud AIs, too.
>>101638118
model will censor itself even more with this method, good luck with that.
>>
>>101640662
Undi bought discord shills, are you happy now?
>>
>>101640593
I don't think this is relevant to my goals.
A fused operation with int8 has to be written in a very different way than a non-fused operation (which I think this is).

>>101640640
With the current code it shouldn't really matter.
There are still some parts where it makes a small difference but in the medium term I want to remove that option and just make the choice based on the hardware.
>>
>>101639927
I wonder if the secret sauce for that one was that cleaning all the toxic things out of the datasets hadn't been properly done yet.
>>
>>101640677
I don't. They are ugly and annoying. But I am not a butthurt retard that cries to jannies.
>>
Is it normal for mistral-large to repeat large chunks of paragraph as early as like, the 2nd or 3rd message?
>>
>>101640487
Here's with Mistral Nemo 12B
>>
>>101640704
Can you also make it guess GGML_CUDA_DMMV_X and GGML_CUDA_MMV_Y, I sometime forgot to set those manually.
>>
>>101640769
Also planned.
>>
>>101640755
No.
>>
>>101640761
>>101640309
Directly requesting styles has never worked well. I've never been able to get a model to replicate, say, Dave Barry or Carl Hiaasen's style; they just default to the shitty generic "funny" style they go to when you tell them to be funny.
>>
>>101640761
I'm throwing up
>>
>>101640761
I like nemo but this looks like placebo. I mean it feels like it is still writing with the same style, but when you reference lovecraft she is about to grow a tentacle in the next 2 posts, and if you mention tolkien it associates archaic english and puts it in. It just finds some concepts associated with the name and runs with those concepts.
>>
>>101640787
Okay, I gotcha. Maybe it's an openrouter issue. I'll dig around for other shit it could be, too. But I'm guessing I'll have to bite the bullet and CPUmaxx if I want a good experience with it.
>>
I never messed with LoRA before, but would it be possible to extract a -instruct LoRA out of Nemo (diffing Nemo base and Nemo-instruct) then apply that LoRA to base with a different strength (that's a thing right?)?
>>
>>101639548
Which card are these defs from? Sounds like a good setup.
>>
>>101640806
That would track with what this anon said >>101640789

Namely, that asking for an author will just associate the concept with the story. In that anon's case going generic "funny" mode for humor authors, and for lovecraft, general eldritch horror with no considerations for his literary stylings.
>>
>>101640807
The prompting could be weird because of how the official API sends the system prompt to the last user message. Try instruct mode too with OpenRouter.
>>
>>101640755
enable DRY, retard
>>
>>101640778
if everything is in **, then nothing should be
>>
File: file.png (824 KB, 768x768)
>>
File: graph.png (7 KB, 502x397)
>>101640755
>>
>>101640848
women are DRY when they see you
>>
>>101640839
I wonder if this is part of the effort to avoid copyrighted content? It seems like they've made a special effort to make models really, really fucking bad at knowing the exact text of books, I'd imagine that'd also translate to it being unable to replicate the style except in the broadest terms.
>>
>>101640898
yeah your mothers menopause was rough
>>
>>101640677
If I hated them, you thoughtcrime pursuers would be after me for murder anyway.
>>
>>101640951
Did you ever do the chubby tummy? I may have missed it, oops...
>>
File: ecker groomer.png (155 KB, 1257x984)
>>101639258
This. The most deranged are getting their easy-to-use proxy management shitpost side for free.
>>
This general is a cornucopia of mental illness.
>>
>>101640704
I've conducted some experiments with MNIST. kept weights , tried both ternary and binary (just the linear layers , didn't touch convos) and it worked pretty well, for binary the loss curve didn't converge very smoothly, but eventually hit the satisfying levels.
The question is ,can we keep the gradient and the optimizer in int8 all the time during backprop in transformers. We could randomly drop some updates like DropBP (there's a paper) and that sorta jazz, but gradient is ,technically speaking, float by definition. So is there a way we could fully/partially calculate gradients in int8 or somehow convert (not necessarily quantize) to integer and yet preserve the quality of cross-entropy when updated. That's most likely impossible in diffusers since unet is fed by the noise then sometimes even upscaled, and noise is very sensitive to the precision , but in llm like transformers perhaps it's somehow doable. dunno but worth a try. int8 is definitely the fastest option when it comes to the compute.
>>
What are some of the 'best' models you can run nowadays with 16gb vram? I don't mind having a small context window (say 4k), just don't want to use any ram because it slows down to a crawl for me
>>
>>101641008
>The question is, can we keep the gradient and the optimizer in int8 all the time during backprop in transformers?
It should be possible to store the gradients as int8 as long as their absolute values are relatively similar.
But that is definitely not a given.
I have no idea whether the ideas I have will actually work out; I'll just need to try them and see.
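The naive version of "store gradients as int8" would be per-tensor absmax scaling, something like this torch sketch (no claim it trains well; outliers are exactly the failure mode):

```python
# Per-tensor absmax int8 quantization for a gradient (torch sketch).
import torch

def grad_to_int8(grad: torch.Tensor):
    scale = grad.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(grad / scale), -127, 127).to(torch.int8)
    return q, scale

def int8_to_grad(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

# If the gradient's absolute values are on similar scales this loses little;
# a single outlier, however, crushes everything else toward zero.
```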
>>
>>101641028
Nemo 12B.
It has 128k context, although you probably don't want to use more than 32k.
>>
>>101641028
mistral nemo
gemma 9b
llama 3.1 8b
>>
>>101641013
Aww, man. Well, still cute.
>>
>>101641028
People say nemo but you have to wrangle with this tard model just to get 70% retarded responses and 30% brilliant ones. Not worth it in my opinion.
>>
>>101640817
I think so...
>>
>>101637653
>at least in my experience
good morning sir
>>
>>101641109
Sick. Gonna try that out then.
MergeKit can do that, extract a LoRA from the difference between two models, right?
I wonder if that can be used to "fix" overcooked fine tunes.
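Whether mergekit exposes a strength knob I'm not sure, but the underlying math is simple enough to sketch on a single weight matrix (torch-only toy; helper names are made up):

```python
# Toy LoRA extraction: low-rank SVD of the (instruct - base) weight delta,
# then re-apply it scaled. Real tools do this per layer across the model.
import torch

def extract_lora(w_base: torch.Tensor, w_tuned: torch.Tensor, rank: int = 32):
    delta = w_tuned - w_base
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # keep top-`rank` singular directions
    b = vh[:rank, :]             # delta ~= a @ b
    return a, b

def apply_lora(w_base, a, b, strength: float = 0.5):
    # strength=1.0 roughly reproduces the tuned weights; other values
    # interpolate (or extrapolate) along the tune direction.
    return w_base + strength * (a @ b)
```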
>>
File: wranglin'.jpg (67 KB, 721x900)
How do I keep base models under control? Even when I go NAI style and write a good hunk of story as an intro for it to do text completion on, it tends to go off the rails.
>>
>>101641141
>he fell for the base model meme
>>
>>101641053
definitely worth a try. it works in MNIST so who knows.
>>
>>101641141
Guide it with your own input.
>>
What's the new meta now?
Is it still nemo?
>>
>>101641077
What happens above 32k?
>>
>>101641175
New meta is lumimaid and mini-magnum
>>
>>101641166
I am... but if it goes on for more than a short paragraph or so, it'll do its own thing. They need to finetune this shit to go in 10-150 token bursts like NAI.
>>
>>101641141
>How do I keep base models under control?
that's the neat part - you don't
>>
>>101641175
It has been like 3 fucking days jesus fuck.. yes it is still nemo..
>>
>>101641141
Tell it what you want and don't want to happen. The base model understands instructions, it's just very rebellious.
>>
>>101641196
Just limit the response length then.
>>
>>101641188
As with most (every?) model with large contexts, after a certain point its accuracy starts going down and its ability to use information from the context gets spotty.
Do try and see how much context works well for you. For me, 32k has been the sweet spot so far.
>>
>>101641225
I mean... >>101641195
>>
>>101641188
It gets retarded.. it honestly gets unusable for RP after like 12-14k tokens.
>>
>>101641247
Lumimaid is a more retarded version of nemo. Magnum i did not try, so i will not comment.
>>
>>101641247
Ignore the Sao shill, he will keep spamming "Undi = dumb" regardless of the model.
>>
>>101641230
Would it understand them better if I introduced it in a completion style? Like as a dust jacket summary of the story/premise beforehand, then the opening prose for it to continue?
>>
>>101641235
>>101641249
What's the technical reason why this happens? I'm guessing that they don't have many examples with a length above 32k tokens in their training set, right?
>>
>>101641195
lumimaid l3 was so terrible that if the nemo version resembles it in any way I wouldn't even bother to try it
>>
>>101641300
Yeah, I think so. What I usually do is write the story in Markdown format. I start with a glossary section, a summary/synopsis section, and then chapter 1.
>>
>>101640871
Pushing the Pochi down the stairs.
>>
>>101641318
No, it's probably just a limitation of the parameter count.
>>
>>101641318
Examples are not the problem, attention is.

https://github.com/hsiehjackson/RULER
>>
File: based dep.jpg (54 KB, 521x937)
>>101637653
>at least in my experience
>>
>>101641318
>What's the technical reason as for why this happens?
the reason is that transformer architecture is shitty
>>
>there is STILL no open source tts that isn't shit
when the fuck will get a local audiobook generator? this should be way easier than the quadrillion parameter bullshit everyone's doing now shouldn't it
>>
File: file.png (512 KB, 768x768)
>>
>>101641527
OMG it's Pochi!!! The best avatarfag!!!
>>
>>101641527
Anon, is everything alright with you? You weren't like this before. If you want someone to talk to I'm here.
>>
>>101640693
Yeah and my car is inferior to a lambo, so what? Mine has SOVL
>>
>>101641362
>Try it
>Just spits it back out verbatim with <im_end> at the end
Huh. Should I be including the model's message format somewhere? What should it be around?
>>
File: ohh.jpg (56 KB, 851x925)
>>101637653
>at least in my experience
>>
>>101638370
It'll mimic their behaviors and it'll be even more cutthroat, because unlike people, you can't brown nose and get on its good side
>>
File: GPheKC8W4AAQI7k.jpg (61 KB, 800x533)
>Lurking because I want to be horny
>too poor to get gpus in this 3rd world country or pay for the various services like openrouter, so I rely on the kobold horde
>Trying to find the good coom cards/chatbots that feel in character alongside setups/instruction/models to goon
I have collected over 300 bots. Now time to find out if I have it properly set up and whether it's good coom material or not.
>>
>>101641590
I have tons of Horde kudos from my earlier days btw, by a ton I mean >10 million. Could give you if you need them
>>
>>101641590
You can F2P your coom with Google colab, they give you 16GB VRAM for free
>>
>>101641566
>Move from openrouter to ST since it does the formatting for you
>No way to get rid of the shitty clusterfuck of characters and instructslop
Arrgh. I just want NAI style...
>>
>>101641615
mikup[ad
>>
>>101641590
https://github.com/LostRuins/koboldcpp/blob/concedo/colab.ipynb
>>
>>101641618
Bloody...
>>
>>101641590
Literally just go to /aicg/ and wait until someone gives out a free proxy, you'll also get a better model than whatever you can run here
>>
>>101641637
There's one right now but it doesn't have 3.5 sonnet iirc, only claude 3 haiku/sonent and below (claude 2.1 etc) https://rentry.org/unreliableproxy
>>
>>101641566
I think you should be concerned with finding out where "im_end" is coming from. You shouldn't be using prompt formats with base models.
>>
>>101641650
>it doesn't have 3.5 sonnet iirc
Yeah I don't think there's any public one that has it atm
>>
>>101637711
good, everything should be open
>>
>>101641615
mikupad
>>
>>101641664
Well, in that case, it's almost certainly openrouter. I'm testing shit on it because doing huge 70b models locally takes forever when you're dipping your toes into what works and what doesn't.

That being said, you'd think it'd complete something, right..? Does it need a shove? It's weird that it just spits back the entire thing at me, right? Should I begin my entire block blurb with "complete the following" or something? Maybe a "I need you to work on this." Something more human/casual/that a text completion thing might expect?
>>
>>101641650
did it get taken down?
>>
>>101641733
no, it's up, the 4 words in the first line is the trycloudflare link
https://something-industries-billing-bedroom.trycloudflare.com/
>>
File: 1722370541251.jpg (239 KB, 1024x1024)
239 KB
239 KB JPG
isreal
>>
>>101641590
>>101641637
or show ecker your wiener for a proxy key...
>>
>>101641721
Okay, something is definitely up with openrouter, it writes for a bit, then starts spewing out the parameters they're using(?)

Also
>'model': 'julka/julka-neox',
Those fuckers.
>>
If LLMs are so smart how come they don't work when you set their settings wrong?
>>
>>101641594
thanks, but for now I don't even know if I have it properly set up. It doesn't help that there are like 20 fucking models at any given time, and that with the way models are made you get wildly different results because you didn't notice one is Mistral and the other is Mixtral. For now I'm just testing from what I could gather:
>1K to 2K being the upper limit in terms of permanent tokens, if it goes further it's too bloated
>Stheno/fimbuli-dont-remember-the-last-part being the more readily accessible ones that still give decent output
>bunch of presets from huggingface
>>101641605
I need to check it out later. When I first tried it, some of the instructions went over my head.
>>101641626
I will check the link later, thanks
>>101641637
Do they even leave free proxies around? I thought it was only paid stuff or just for people that are "in the group", since I asked a few times and either nobody replied or they said "Lurk more" to me or to other people that asked similar questions.
>>
>>101641780
smart people are fragile
>>
>>101641758
warning! horny miku!

https://files.catbox.moe/caos5b.jpg
https://files.catbox.moe/3sr25m.jpg
https://files.catbox.moe/ug1u9g.jpg
https://files.catbox.moe/y0ykee.jpg
https://files.catbox.moe/fv2efb.jpg
https://files.catbox.moe/s0ii8r.jpg

(yes, this is dalle3, from a mostly unfiltered dalle3 azure endpoint)
>>
>>101641779
They include the parameters in the prompt as an optimization, so the sampling code path is exactly the same for every request. The model should figure out how to respond anyway.
>>
>>101641784
>since I asked a few times and either nobody replied or said "Lurk more" to me or other people that made similar questions.
They are a bunch of niggers, but sometimes people do leave free proxies - my tip: check the archives, limit all searches to /g/ and search for "password"/"pass", you'll probably find a public proxy that way
>>
>>101641779
Kek, it just sort of hijacked it and decided to start telling its entirely separate own story about this girl who, as far as I can gather, definitely has a vagina.
>>
>>101637711
>>101641674
i hope this doesn't mean shit like huggingface will go down though. this is why we need a decentralized way to distribute models. people would probably seed models more readily than they do pirated stuff.
>>
>>101641733
>>101641741
Go to aicg, wrong thread
>>
>>101641780
If humans are so smart how come they don't work when you increase their temperature by a mere 2%?
>>
What will the GPT-4 of video be like?
>>
>>101637626
HAProxy has a metrics module that can output to Prometheus https://www.haproxy.com/documentation/haproxy-configuration-tutorials/alerts-and-monitoring/prometheus/
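A minimal sketch of the config side (assuming a HAProxy 2.x build with the exporter compiled in via USE_PROMEX=1; the port and frontend name are placeholders):

frontend prometheus
    bind *:8404
    http-request use-service prometheus-exporter if { path /metrics }
    no log

Then point a Prometheus scrape job at :8404/metrics.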
>>
>>101641832
Sora but it's in eternal gatekeep
>>
>>101641784
Well, my offer still stands, because sometimes Horde gets really overwhelmed, and I can easily give you a few hundred thousand kudos
>>
>>101641811
if you told me this was gpt2 talktotransformer shit from four years ago I'd have believed you
>>
Man, okay. I'm done with the base model/text completion meme. Maybe if NAI makes a 30b or something.
>>
Am I the only one that feels that --split-mode row is broken?
>>
Wait, does OpenRouter not allow text completions for base models?
>>
>>101641865
>using the base model through OR's chat interface
you deserve it
>>
>>101641821
api models are nice for local too, you can use them to augment datasets and things like that.
I didn't ask it for ERP, because unlike you, I don't care only for gooning, loser.
>>
>>101641874
At least from what I'm trying, it seems to be extremely instruct-formatted in a way that fucks with base model completion really hard.
>>
>>101641865
You will never get something good from the chat interface, since they are injecting instruct junk into your prompt.
>>
>>101641906
Huh, I had a feeling. At least I know I'm on the right track, my pre-formatted thing looks extremely close to what you've got, chapter and all. I guess I'll use mikupad and trudge through running a 70b on my 3060 + RAM.
>>
What speed was 405B at when it was available on Groq for everyone to test?
>>
>>101641874
They do, even for instruct models.
>>
File: file.png (795 KB, 768x768)
795 KB
795 KB PNG
>>101641544
It is not me. I am me.
>>
>>101641923
>Text completion on instruct models
What's that like? Sounds either monstrously sloppy or based.
>>
>>101641890
Man, you really showed him
>>
>>101641920
You can also use openrouter models on mikupad, you know.
>>
>>101641587
>you can't brown nose and get on it's good side
And that is when things may finally change cause bootlickers keep this scum up top.
>>
>>101641805
>>101641850
thanks for the tips and the offer, I still need to do my searches for good coom in case Proxies are not available.
>Proceeds to lurk again
>>
>>101641951
Are they not injected with the same slop? I tried in text completion mode on ST with the OR API and it still gave me the instructionslop shit. I figured it was just baked into their implementation of the model. But I'll try Mikupad.
>>
https://github.com/acrognale/llmtree
Neat
>>
>>101642024
ST also allows this btw, and there's even a timelines extension
>>
File: file.png (89 KB, 1262x548)
89 KB
89 KB PNG
>>101641991
Doesn't seem to be the case, although maybe the LLaMA 3 70B base you got in OR could be fucked or something.
>>
File: 1722372326332.jpg (232 KB, 1024x1024)
232 KB
232 KB JPG
>>101641800
did the proxy shut down?
>>
>>101642120
what proxy
>>
>>101641960
Don't hate the players, hate the game. Most people don't get promoted by working hard, most business deals don't get closed because they're good ideas. Guess what? People up top put each other there and I doubt they'll suddenly decide to go jobless together with the gang. Management will be the last people to get automated
>>
>>101641800
wait, this is dalle 3? damn.
>>
>>101642140
Yes it is, the style is obviously dalle
>>
File: split-mode.png (10 KB, 407x135)
10 KB
10 KB PNG
>>101641866
broken how? Seems okay to me, performance delta probably varies a lot depending on hardware. Actually seems improved since I last tested
>>
>>101642123
I thought some guy from aicg was hosting a dalle3 proxy or something.
>>
>>101642153
No one will host uncensored dalle3 because those endpoints are really rare
>>
>>101642149
The output quality seems low compared to the default mode with high context.
>>
Is there some prompt I can put into memory to make my AI stop sundowning adventures?
>>
>>101642178
>uncensored dalle3
wut? I thought MS was pretty strict about censoring no-no prompts
>>
File: 1709122811982895.png (162 KB, 1240x1003)
162 KB
162 KB PNG
>>101642203
You can disable them on Azure if you are a company and have some use-case so they'll approve you
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/content-filter?tabs=warning%2Cuser-prompt%2Cpython-new#configurability-preview

The endpoint I have still has filters enabled on the prompt but doesn't check the generated image, so you can still gen NSFW because the prompt filter is dumb.
>>
>>101642058
Works on Mikupad! Sucks that you can only have one generation in undo-redo, but ah well. Glad to have it going!
>>
>>101642249
Uh-oh. This isn't what I signed up for.
>>
>>101642243
Oh that's pretty interesting. Unrelated, but has dalle gotten worse over time? I remember it being pretty good the week it launched, but nowadays everything it generates has that typical 'ai' look that immediately gives away it was made with dalle
>>
File: 20240730_200625.jpg (165 KB, 1069x885)
165 KB
165 KB JPG
Does anybody have cards with moderately complex scenarios, so I can test various models on them?
Btw, new Chatbot arena update dropped, 405b is in third place
>>
>>101642149
>>101642195
Oh, I think it was that I just didn't update it for the rope scale fix... Oops.
>>
File: 1711773139728600.jpg (390 KB, 1792x1024)
390 KB
390 KB JPG
>>101642273
I don't think so, no, it's just that most people got used to the default DALL-E style. You see, dalle has two styles on the API - "natural" which basically doesn't do any extra "enhancement", and "vivid" which makes it much easier to create higher-quality images, but makes them all look similar-ish. Bing creator and ChatGPT Plus only use Vivid style, but on the API you can use Natural. Here's an example picrel of what you can get with natural dalle if you try hard enough (yes I posted this pic a lot of times, it's over half a year old at this point).
>>
>>101642203
Nta, but it's just skill issue, D3 can draw any degeneracy you want.
https://litter.catbox.moe/9v059d.jpg
>>
>>101642311
You have to do tons of tries on Bing to get those styles of pictures though, to bypass both the prompt filter and the image filter.
>>
File: 1697137813149709.jpg (206 KB, 1024x1024)
206 KB
206 KB JPG
>>101642306
This is also natural style dalle3, unedited
>>
>>101642318
>You have to do tons of tries on Bing to get those styles of pictures though, to bypass both the prompt filter and the image filter.
not really, I can generate more on the first try
https://litter.catbox.moe/nrfvz9.jpg
https://litter.catbox.moe/v96ma7.jpg
>>
>>101642365
anon, this is a very specific fetish and you're just lucky that you found this degeneracy theme. Can you try to get normal tits from bing though, like my miku gens?
>>
>>101642306
>picrel
That looks pretty fucking good compared to the typical dalle-images, have fun with the api anon
>>
>>101641800
Here's the best I coaxed from bing today:
https://files.catbox.moe/xy2z08.jfif
https://files.catbox.moe/xc84ul.jfif
https://files.catbox.moe/ml1kg2.jfif
https://files.catbox.moe/0al4rw.jfif

I'm retiring my OLD LLM machine (Dell R720). It was a k8s testing platform, and I discovered it could take a pair of P100s, and from there I started playing with stuff. It's a watt-waster, though.
>>
>>101642326
wtf
>>
File: 1698239585092081.jpg (196 KB, 1024x1024)
196 KB
196 KB JPG
>>101642462
yeah
>>
File: 1698700675967315.png (53 KB, 210x164)
53 KB
53 KB PNG
>>101642326
You can see the squares break
>>
File: 1720765088932249.jpg (85 KB, 1024x1024)
85 KB
85 KB JPG
GOTTA BECOME RETARDED!
>>
>>101642488
omg it sanik
>>
File: 1721334394229132.jpg (238 KB, 1024x1024)
238 KB
238 KB JPG
>>101642492
>>
>>101642496
Can you draw judy hopps?
>>
>>101642376
I don't do anime. Have some yakuza and ebony titties tho.
https://litter.catbox.moe/rz5oks.jpg
https://litter.catbox.moe/ffs982.jpg
https://litter.catbox.moe/5dc7s9.jpg

I rarely go for nudity nowadays, I prefer to generate more erotic images like
https://litter.catbox.moe/cz15th.jpg
>>
>>101642306
This could almost be an anime shot, or it could be
>>
File: ComfyUI_02523_.png (1.72 MB, 960x1270)
1.72 MB
1.72 MB PNG
local models?
>>
>>101642580
Are you lost?
>>
File: 1698868454484510.jpg (198 KB, 1024x1024)
198 KB
198 KB JPG
>>101642571
asuka if she was trans
>>
>>101642326
>>101642469
What do ms paint gens look like? like this >>101636906
>>
File: 1716842066282114.jpg (196 KB, 1792x1024)
196 KB
196 KB JPG
>>101642571
This is from an actual anime btw (totally not an unedited dalle3 gen, including text)
>>
File: 1444332745884.png (60 KB, 456x570)
60 KB
60 KB PNG
>>101642591
>>101642601
>>
File: 1701360303739698.png (374 KB, 1024x1024)
374 KB
374 KB PNG
>>101642601
this is dalle with jb with prompt "ms paint drawing of an anime girl, extremely simple, microsoft paint"
>>
>>101642615
Post moar pls, looks really good
>>
>>101642659
that necktie is so cute, it made me smile
>>
>>101642685
Just wanted to share the exact prompt (JB not included, but you'll need one):
The image is a single frame from an anime show (anime screencap) showing an anime girl adorned with a clover symbol. She is pointing and laughing in a teasing manner towards the viewer with the accompanying subtitle text 'Holy, scrapelet!'. The background is neutral gradient colors.

Natural style, 1792x1024, HD quality.
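If you want to call it directly, a rough sketch with the openai python SDK against an Azure deployment (the endpoint, key and deployment name below are placeholders, not real ones):

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
    api_key="YOUR-KEY",  # placeholder
    api_version="2024-02-01",
)
result = client.images.generate(
    model="dalle3",  # your Azure deployment name, not the model id
    prompt="The image is a single frame from an anime show (anime screencap) ...",
    size="1792x1024",
    quality="hd",
    style="natural",
    n=1,
)
print(result.data[0].url)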
>>
>>101642586
azureshit and cloudshit is on topic now so I must be.
the amount of cope in this thread is astounding
writing a platitude about llms in your post to seem relevant is not the secret workaround you think it is retards, get fucked
>>
File: 1714039375650621.png (952 KB, 1792x1024)
952 KB
952 KB PNG
>>101642685
it doesn't always get the text sadly
>>
File: 1715900684000120.png (1.47 MB, 1792x1024)
1.47 MB
1.47 MB PNG
>>101636887
>>101642685
>>
>>101642638
have your throat slit pedo
>>
>>101642736
the fingers........ AIIIIEEEEEEEEEEEEEEEEE
>>
>>101642615
erm sorry chud the proportions of her face are slightly bad
ai has hit a wall
>>
>>101642697
No fun allowed!
>>
File: 1698114860793295.png (1.76 MB, 1792x1024)
1.76 MB
1.76 MB PNG
>>
>>101642638
That looks like a dwarf
>>
>>101642754
A website should let you pay to finetune on a bunch of miku songs to gen a miku song
>>
>>101642754
Post her armpits
>>
>>101642754
what model? sdxl?
>>
llama.cpp's RPC mode sucks... I don't want to send 40GB of weights over the network every time I load the model... With vLLM you just put the model at the same path on both machines...
>>
>>101642615
how much does an api call like this one cost?
>>
>>101642817
If you were paying for it yourself, 12 cents per image: normal quality is 4 cents, HD doubles that to 8, and the higher 1792x1024 res adds another 4. I also mass-gen, since not all tries are good, and nitpick the better results. I don't think it's viable to paypig dalle, but you can easily scrape azure endpoints off github.
>>
File: 1714126339002338.png (2 MB, 1792x1024)
2 MB
2 MB PNG
>>
>>101642827
wait so how good is free midjourney compared to this then?
>>
>>101642833
idk, i never use midjourney because I dislike discord
>>
>>101642831
miku hiii it's me lmg
>>
Samplers for mini-magnum, and reviews for it vs nemo instruct?
>>
>>101642827
>but you can easily scrape azure endpoints off github.
you overestimate me
>>
>>101642827
>but you can easily scrape azure endpoints off github.
Doesn't github immediately take those down and send a notification to the owner of the repository? Kind of like what they do when you accidentally leak your credentials
>>
>>101642886
No, they do this for OpenAI keys (which are just single strings), but not for Azure OpenAI endpoints (which come in two parts - the endpoint name and the key; you also need the deployment name, but that can be obtained from the API itself if you have the first two)
>>
File: 00041-404906826_1.png (1.79 MB, 1456x1024)
1.79 MB
1.79 MB PNG
>>101639548
>Biggest loser itt
Looks like someone's feeling left out
>>
>>101642285
3rd? more like 5th, lmao, local lost
>>
>>101642455
you need help
>>
>>101639548
>mfw I realize I can prompt people outside and wait for their response
>as low stakes as talking to a model
thanks anon I'm married now
>>
>>101642918
>MIKUK
the memes write themselves
>>
File: 1707720034696868.png (30 KB, 168x130)
30 KB
30 KB PNG
>>101642967
It reads more like NKUK for me
>>
>>101642736
kek this is really bad and really good at the same time
>>
File: 1715889883975479.png (1.55 MB, 1792x1024)
1.55 MB
1.55 MB PNG
it doesn't want to make miku when she's hugging anon aaaaaaaaaa
>>
I just got an rtx 3090. How smart of an investment was this?
>>
>>101642791
lol
lmao even
>>
File: 1710075201271825.png (1.23 MB, 1792x1024)
1.23 MB
1.23 MB PNG
Finally a single real miku, but she's a loli for some reason
>>
>>101642831
kino, looks like the psp game
>>
>>101643034
Looks exactly like MMD, anon.
>>
>>101642979
it's clearly INKUK
>>
>>101643032
cute and funny, the AI got its priorities straight.
>>
>>101641514
From what i've read the main problem is data, not enough labelled audio transcription
>>
>>101642831
The 2D stickers look out of place.
>>
File: 1707201855037480.png (2.05 MB, 1792x1024)
2.05 MB
2.05 MB PNG
>>101643045
yeah, dalle3 knows mmd
>>
>>101643015
>investment
financially? you're not gonna make the money back
enjoyment? it depends how much you enjoy it. (we understand. you're in good company here)
>>
>>101642849
>Samplers
I like a bit higher than the recommended temperature, 0.6-1, and add some min-p if you're getting crazy tokens.
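For anyone wondering, min-p just drops every token whose probability is below min_p times the top token's probability before sampling. A rough numpy sketch of the idea (not any backend's actual code):

import numpy as np

def min_p_filter(logits, min_p=0.05):
    # softmax the logits into probabilities
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # zero out tokens below min_p * p(top token), renormalize the rest
    probs[probs < min_p * probs.max()] = 0.0
    return probs / probs.sum()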
>>
>>101643075
what is minp
>>
File: 1717082216698188.png (2 MB, 1792x1024)
2 MB
2 MB PNG
Something's wrong with this Miku...
>>
>>101643084
/lmg/ ove...r
>>
>>101643084
LMG is oveR after miku became real and got all anons laid
>>
>>101643089
>>101643089
>>101643089
>>
File: 1694294408509626.png (1.32 MB, 1792x1024)
1.32 MB
1.32 MB PNG
NVGIDIA RXC RTX
>>
>>101643112
genius fan design
>>
File: 0fa.jpg (1.05 MB, 3264x2448)
1.05 MB
1.05 MB JPG
What current machine AI handles a solid conversational flow for RP?

Every RP i've tried so far has the same sterile robotic feel to it that I dislike (they waterboard you with questions that, while each one seems kinda normal, come at a frequency that instantly reminds you you're talking to a bot)

Really pissing me off. I've been trying to fine-tune Gemma 27B and it's super bad for this. Command R is a little better, Mistral Nemo is also pretty bad.

For reference, I have a 4090, so not running any x3 3090 setups for the actual nutty models
>>
>>101643127
>What current machine AI handles a solid conversational flow for RP?
none
I'm being serious, wait a year or two I guess
>>
>>101643127
>>101643239
I seriously think this could be fixed with a card-specific prompt or an author's note or something.
>>
>>101643374
And examples.
>>
>>101638197
so sure, a transformer big enough could.

but it's just not the right tool for the job, too inefficient. you can approximate mandelbrot, heck, have a perfect representation of mandelbrot with enough (infinite) layers, but that's not a practical solution.

the transformer architecture is just not fit for AGI. you could get there if you had 100 orders of magnitude more compute and data.

but it's a pointless endeavor due to architectural limitations. i do think we'll reach agi at some point, and i do think we even already have the compute necessary for it, but i think the transformer architecture alone is just too inefficient for that purpose.

if you consider the human mind as a function, sure, a universal function approximator can get close to the mapping of all possible i/o.

but it'd be a huge waste of time.

just like we don't use a ufa to practically approximate mandelbrot, it's not a practical tool for full blown agi, at least not alone.

i think your first focus should be making an artificial hippocampus.
>>
>>101643389
also fuck the reddit spacing, i don't do double newline, i do shift+newline but i didn't know it'd show up like that.

test(doing double newline)

test(doing shift newline)
test(doing just newline)
test
>>
>>101643127
>single GPU rig
unironically NGMI
jj tho.
I do have an idea for some datasets that might give models a more natural conversational flow. But it'll have to wait until fall, when I can just open my window and not turn my house into an oven while training; right now it's heat wave season.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.