/g/ - Technology


File: 1692228820109924.jpg (759 KB, 1856x2464)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101651157 & >>101643089

►News
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>101651157

--Paper (old): Anon shares Magpie paper, sparks skepticism about novelty of research: >>101654416 >>101654527
--Running large language models on limited hardware: >>101653601 >>101653635 >>101653669 >>101653737 >>101654235 >>101655124 >>101655165 >>101655320 >>101655342 >>101655835 >>101655851 >>101653648 >>101653703 >>101653881 >>101653913 >>101653914
--Gemma 2 2B Release discussion: >>101654323 >>101654804 >>101654819
--Discussion on Llama.cpp's CUDA kernel determinism and numerical stability: >>101655364 >>101655439 >>101655778 >>101655789
--Discussion of TopK, TopP, and temp, with corrections and explanations: >>101651621 >>101651664 >>101651678 >>101651701
--Discussion about RAG and ollama, with users sharing personal experiences and opinions: >>101654014 >>101654087 >>101654169 >>101654233 >>101654327 >>101654114
--CPU inference improvement implications for Q4_0 and GPU-offloaded layers: >>101654232 >>101654305 >>101654349 >>101654374 >>101654394
--Anon asks for image classification and filtering solutions, receives suggestions for deepbooru and cogvlm: >>101652820 >>101652830 >>101652954 >>101653072 >>101653127 >>101653444 >>101654080
--NVLink functionality in multi-server OCP rack setup: >>101654845
--Llama 3.1 70b chat log and performance in sfw rp: >>101651244 >>101651351
--Kaggle offers free 2x15GB VRAM for large language model experimentation: >>101653306
--Gemma-2b-it beats qwen-1.5-32b and is almost on par with claude 2.0 in lmsys: >>101655706
--Discussion about the best model for Role-Playing and comparisons between different models: >>101651277 >>101651327 >>101655216 >>101655424
--Anon considers buying expensive GPU setup for local AI model training: >>101651922 >>101651951
--Miku (free space): >>101651329 >>101653649 >>101654047 >>101654813 >>101655756 >>101655765 >>101653086

►Recent Highlight Posts from the Previous Thread: >>101651164
>>
First for Locals are garbage and Character AI clears
>>
>>101657661
Or... or... hear me out... they're good for different things?
>>
>>101657733
Nah.

>>101657730
>we should continue to send every newfag to /lmg/, kek.
>>
They hated him because he spoke the truth
>>
>>101657586
>--NVLink functionality in multi-server OCP rack setup: >>101654845
With vLLM you can combine tensor and pipeline parallelism, if the number of GPUs in each server is the same. I get a pretty decent speedup with 2x3090 in 2 servers, like 18 T/s at 20k context with Mistral Large AWQ. I run it with --tensor_parallel_size=2 and --pipeline_parallel_size=2.
If I use tensor parallelism only, my 1 Gbps connection is too slow to run it. And if I only use pipeline parallelism, the NVLink doesn't matter, I think.
You need something like InfiniBand to use TP across servers.
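For reference, the launch looks roughly like this (model path and port are placeholders; the OpenAI-compatible server spells the flags with dashes, and multi-node needs a Ray cluster spanning both boxes, e.g. ray start --head on server 1 and ray start --address=<head-ip>:6379 on server 2):
python -m vllm.entrypoints.openai.api_server \
  --model /models/Mistral-Large-Instruct-2407-AWQ \
  --tensor-parallel-size 2 \
  --pipeline-parallel-size 2 \
  --port 8000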
>>
>>101657744
>send every newfag to /lmg/
Based, their thread will die off and there will only be one highlander left
>>
>>101657661
*kneels*
>>
>>101657661
CAI is trash and even in the past it was at best mid tier. The only thing it had going for it is novelty because people started with it. Clinging to it makes you look pathetic, it's the Summer Dragon of aicg.
>>
>>101657842
trash that still mogs the best of the local model scene.

Embarrassing
>>
>i'm a coomer at heart and unironically wish these local models were even half as good as that shitty filtered website
>I believe you, I don't think you'd be here otherwise.
>>
File: LLM-history.png (1.62 MB, 4916x6742)
Added more models because you niggers were asking
>>
File: 1525487980028.jpg (45 KB, 719x546)
>spend literally 8 hours trying to compile vllm
>nothing worked, going through all the steps provided by even GPT-4 didn't help, no one else had this issue
OK fuck it, I give up. It may or may not be vllm's fault but I'm going to hold hate in my heart for them anyway. Meanwhile llama.cpp just werks.
>>
>>101657884
Local.

just werks.

Choose one
>>
i use opus and gpt4, i am above you
>>
>>101657896
Llama.cpp actually does, for me. The last time I had any real unexpected issue with it was like 8 months ago. It basically always just compiles perfectly for me since.
>>
>>101657905
You're on the wrong floor sir, you're looking for >>>/g/aicg
>>
>>101657916
no, i came here to flex on you
>>
>>101657866
I am not happy and I will never be happy that's why I'm here faggot but good job
>>
>>101657884
llama.cpp really does just werks.
What error were you getting with vllm?
>>
>>101657884
I install GCC with this inside conda:
conda install 'gcc>=12.0.0,<13.0.0' 'gxx>=12.0.0,<13.0.0' -c conda-forge

And CUDA with this:
conda install cuda cuda-python cuda-libraries-dev cuda-nvcc cuda-nvtx cuda-cupti -c nvidia/label/cuda-12.4.1

Because Arch Linux installs another version of CUDA to /opt, I have to patch vLLM with this so CMake doesn't pick that one up and uses the Conda one instead. Check in the console which version it's trying to use.
diff --git a/setup.py b/setup.py
index 72ef26f1..6b571fdf 100644
--- a/setup.py
+++ b/setup.py
@@ -159,6 +159,7 @@ class cmake_build_ext(build_ext):
'-DCMAKE_LIBRARY_OUTPUT_DIRECTORY={}'.format(outdir),
'-DCMAKE_ARCHIVE_OUTPUT_DIRECTORY={}'.format(self.build_temp),
'-DVLLM_TARGET_DEVICE={}'.format(VLLM_TARGET_DEVICE),
+ '-DCUDA_TOOLKIT_ROOT_DIR={}'.format(os.environ["CUDA_HOME"]),
]

verbose = envs.VERBOSE

And then it works for me.
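So the whole thing ends up being roughly (env name is made up; the point is that CUDA_HOME has to point at the conda prefix, because that's what the patched setup.py reads):
conda activate vllm-build        # whatever env has the gcc/CUDA packages above
export CUDA_HOME="$CONDA_PREFIX" # handed to CMake as CUDA_TOOLKIT_ROOT_DIR by the patch
pip install -e .                 # run from inside the vllm repo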
>>
>>101657866
owari da...
>>
I don't know all this drama about llama.cpp vs exl2 or whatever.

Everything has its advantages and disadvantages. Just use what fits best for your use case.

Llama.cpp gets support for brand-new architectures faster.

But exllama works out of the box for new models whose architecture it already supports, e.g. llama3 -> llama3.1. Llama.cpp needs to add support for each new version since it doesn't parse the jinja chat template.

Llama.cpp has made tons of improvements and is now on par with exl2 in terms of speed.
But exl2 is still faster in prompt processing and for longer contexts.

Llama.cpp supports offloading layers to ram
Llama.cpp supports more hardware

Exl2 supports quantization to whatever bits per weight you need, so you can better adjust to the hardware you have

Llama.cpp has distributed serving with RPC

On a subjective note, exllama seems more production ready than llama.cpp

For me, I still go by the old "if it fits in VRAM use exl2, if not gguf". I'm testing 405B so I need to use gguf.
For day-to-day tasks, exl2+tabbyapi.

For production (at work) we use vLLM with the BF16 safetensors model, but we're currently testing AWQ and GPTQ and running performance and quality benchmarks.
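The AWQ/GPTQ runs launch basically the same way as BF16, just pointing at the quantized weights, something like this (paths are placeholders; vLLM usually detects the quant method from the model config, the flag just makes it explicit):
python -m vllm.entrypoints.openai.api_server --model /models/Mistral-Large-Instruct-2407-AWQ --quantization awq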
>>
>>101657866
Still not sure where you're getting that 3.1 is questionably improved over 3.0. Like I said in my reply from last thread, it's basically 3.0 but with continued long-context pretraining. This should instead say that 3.1 gives usable assistant/coder models, just not ERP. Or explicitly put in the title that it's about ERP-relevant models rather than just models in general.
>>
https://sambanova.ai/

100+ t/s from Llama 405B on a custom ASIC, full accuracy (you can try it out on that website).
>>
File: lcp-quants.jpg (113 KB, 510x771)
>>101658050
>Exl2 supports quantization to whatever bits per weight you need, so you can better adjust to the hardware you have
Is this also not available in llama.cpp?
>>
>>101657866
remove Mythomax because some newbie will think it's an actual good model and not a meme
>>
>>101657661
It doesn't matter, I want to control the dialog engine and don't want to have a token limit.
>>
>>101657842
cai can't do violence or sexo so it's automatically losing with everything
>>
>>101658116
go back https://www.reddit.com/r/LocalLLaMA/comments/1egxxc4/woah_sambanova_is_getting_over_100_tokenss_on/
>>
>>101657884
I went through something similar, even tried the docker version and it still didn't work. Not sure how I managed to get it working eventually though
>>
>>101658154
>go back
go back
>>
File: 1697095935840990.jpg (12 KB, 540x124)
>>101658116
*taps the sign*
>>
>>101658130
You can't do exactly 5.75 bpw, for example. You can get close, maybe 5.21 or 6.14, but not 5.75.

With exl2 you can target exactly the bits per weight you want (up to 2 decimals, I believe).
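If anyone wants to try it, the convert script in the exllamav2 repo takes an arbitrary target; from memory it's something like this (paths are placeholders, double-check the flags against the repo):
python convert.py -i /models/MyModel-fp16 -o /tmp/exl2-work -cf /models/MyModel-5.75bpw-exl2 -b 5.75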
>>
>>101658151
Really? Why the fuck would anyone bother with it then?
>>
>>101658116
api or gtfo
>>
is there a 3.1 70b torrent?
>>
>>101658172
Because it still mogs all local models at actually seeming like you're talking to a real live being.
>>
>>101658173
skill issue.
>>
>>101658172
because they are coping or trolling
>>
>>101658181
prompt issue
>>
>>101658197
CAI just works no stupid settings to set or anything.
>>
>>101655439
>Neural networks have poor numerical stability
yes, due to non-linearity, almost as unstable as some heavy radioactive elements that decay every now and then. Glad we're finally on the same page, sir.
>>
>>101658218
yeah because they prompt it skillfully behind the scenes
>>
>>101658170
That's useful for rigs that have odd amounts of VRAM.
It's possible to quantize the GGUF per tensor to achieve nonstandard sizes, but even then I don't think that would give as much control as exl2. Will keep that in mind, thanks for the insight anon.
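For contrast, the normal llama.cpp flow just picks the closest preset from that screenshot, something like (the binary was recently renamed from quantize to llama-quantize, so adjust for your build):
./llama-quantize ./MyModel-F16.gguf ./MyModel-Q5_K_M.gguf Q5_K_M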
>>
>>101658231
Share an example of one such prompt working on local, I'll wait.
>>
Anons!
I have made this poll to RANK the most relevant LLMs on their ERP capabilities. All you have to do is rank them based on your personal experience and the logs you've read over the weeks and months.
FEATURED MODELS :
Command R (35b)
Command R+ (104b)
Gemma 2 (9b)
Gemma 2 (27b)
LLaMa 3.1 (8b)
LLaMa 3.1 (70b)
LLaMa 3.1 (405b)
Mistral Nemo (12b)
Mistral Large (123b)
Qwen 2 (72b)
Let's see what /lmg/ thinks once and for all.
>>
>>101658219
Nonlinearity is not needed here.
The weight matrices have condition numbers equal to their max. singular value / min. singular value.
The min. singular value is frequently zero which means that there are inputs for which the relative error can be inflated to arbitrary levels.
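Spelled out (standard definition, nothing model-specific here):
\kappa(W) = \frac{\sigma_{\max}(W)}{\sigma_{\min}(W)}, \qquad \sigma_{\min}(W) \to 0 \implies \kappa(W) \to \infty
i.e. the worst-case amplification of relative error in computing Wx is unbounded when the smallest singular value vanishes.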
>>
>>101658197
You will never find me chat logs of any model, using any prompt of a conversation between user/char that comes close to CHAI.

Not a single screenshot. If you do provide a screenshot, it will be like the VN tier dogshit here - >>101657432

I'll wait
>>
>>101658263
https://strawpoll.com/ajnE1OM2knW link
>>
>>101658263
Character AI


Local anything.
>>
>>101658263
Are we considering only the official models or fine tunes too?
>>
>>101658231
Yeah to skillfully add some safeguards to your prompt so it's nice and inoffensive
>>
>>101658286
official models, i have yet to see a finetune outperform an official release for these models
>>
>>101658322
I've yet to see one outperform CHAI.
>>
>>101657934
I hate conda so much, I just installed gcc-12 alongside my regular gcc.
>>
>>101658322
Fair enough.
>>
>>101658276
>>101657126
>>101657379
Anon, if you are still here, care to provide the exact models you were using?
>>
>>101658335
Character.AI is not just the model, but also some generation strategy or sampler behind the scenes that prevents paragraph-level repetition over long chats.
>>
>>101658376
gemma-2-27b-it-Q6_K
c4ai-command-r-v01-Q4_K_M
L3-8B-Stheno-v3.2-Q8_0-imat
Mistral-Nemo-Instruct-2407-Q8_0
>>
>>101658406
And local can't implement that while saying they have more control, what a joke.
>>
Is it safe to expect a 5090 with >48GB of VRAM?
>>
>>101657866
So miqu is still best at 70b? Not llama 3.1?
>>
>>101658270
Which is the whole point I've made. You could make neural networks perfectly deterministic once you achieved infinite precision, but just like with 3 black holes, you can't do that.
>>
>>101658406
generate me a simple log of sucking cock, it shouldn't be so hard on your amazing cai model
>>
>>101658424
"local" can, (You) can't
>>
>>101658443
No.
>>
>>101658451
Of course it can't, it's filtered. If you don't need to have your cock sucked, it generally gives a more natural chatting experience than local models, even if quality has (seemingly?) degraded from 2022.
>>
>>101658474
see
>>101658276
>>101658254
>>
>>101658444
you don't understand what deterministic means, which is funny because multiple anons told you this and you keep arguing
your behavior reminds me of someone who starts with p and ends with a
>>
>>101657368
>>101657633
I've seen pictures getting massively downvoted on pl*bbit for being sdslop yet no one seems to notice the LLM generated comments in the replies with bullet point lists, reminders and all.
The ordinary person has an ability to detect AI generated pictures quickly but their instincts completely fall flat when it comes to text.
>>
>>101658476
What's good then?
>>
>>101658500
Llama 3.1 70B.
>>
>>101658477
>Of course it can't
I accept your concession
>>
>>101658491
>your behavior reminds me of someone who starts with p and ends with a
schizophrenia is a treatable condition
seek help
>>
>>101658524
so it was you all along petra
not surprising
>>
i hate blushing and smirking more than shivers
>>
>>101658586
What about conspiratorial whispers?
>>
File: 1528069961414.jpg (97 KB, 1280x720)
>>101657934
Wow, it worked, thanks! How did you get to know that gcc 12 and CUDA 12.4.1 were required? Looking at CMakeLists.txt, it says CUDA 12.1 is what their torch version uses, so I was installing that.
And what GPT-4 told me was that a range of gcc versions should work with CUDA, but I thought I'd try gcc 9 and 10 and neither of those worked for me.
>>
>>101658586
*looks at anon with a mischievous grin*
>>
>>101658506
Then why isn't it in the llama3 flop era column?
>>
>>101658586
I hate the eyes narrowing widening rolling
>>
For any Kobold people who use it to run Mistral-Nemo and whose context shits the bed after like 12-14k: set the Rope Base to 10000000.0 and it will work flawlessly.
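If you run the same model through llama.cpp directly instead, the equivalent knob should be the rope frequency base flag with the same value (the gguf filename is whatever your quant is called):
./llama-server -m Mistral-Nemo-Instruct-2407-Q8_0.gguf -c 16384 --rope-freq-base 10000000.0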
>>
>>101658586
DRY is cool, but I want n-gram based logit bias.
>>
>>101657866
how the fuck do you forget Pygmalion 7b
>>
>>101657744
>>we should continue to send every newfag to /lmg/, kek.
Considering the terminally online petafag is still here shitting up the thread and pushed out all the interesting and creative people, it could only improve the general at this point.
>>
>>101658586
This will continue as long as we have the transformer architecture. These models aren't smart enough to write good prose or meta-think about what they are writing, so the result will always be something cliche.
>>
>>101658631
I think both 12.1 and 12.4 should work, although 12.4.1 is the version used in their Dockerfile right now. You can also run gcc --version on the CUDA container they use as the base; it comes with gcc 11. The documentation about CPU inference mentions gcc 12 explicitly, though.
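e.g. to see for yourself (tag is my guess at the base image their Dockerfile currently uses):
docker run --rm nvidia/cuda:12.4.1-devel-ubuntu22.04 gcc --version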
>>
>>101658138
Mythomax is comfy though and unironically a good starting point for newbies.
>>
>>101658443
For coom, I'd guess so. 3.1 would be better for assistant stuff.
>>
>>101658733
it really isn't, mythomax is mediocre at best among l2 finetunes, outside that group everything mogs it so hard it's not even funny
>>
File: 1699048333948558.png (236 KB, 528x438)
>swaying hips seductively
>>
>>101658765
It is because unlike Mistral you didn't have to wrangle it so much, it just werked.
>>
>>101658711
That would be pre-llama era. What was the meta besides pygma in those times?
>>
>>101658765
mythomax was great for what it was and you're a butthurt homo
>>
>>101658784
pygmalion, w/e AI dungeon runs.
>>
>>101658784
Erebus and all other Kobold-related models mainly intended for storywriting.
>>
>>101658784
The Kobold models, I guess, like Erebus.
>>
>>101658788
>Mythomax
>"Her cock"
>>
>>101658827
Just edit it.
>>
>>101658788
>you're a butthurt homo
are you sure you should call me that? you are the one defending the model that was giving cock to everyone regardless of gender
>>
>>101658848
skill issue
>>
For me, it was piggy then "insert some shit model i forgot the name of" then supercot then xwin_mlewd then euryale then mixtral then bagel_misterytour then nemo then largestral
>>
>load mythomax
>make it generate a new response
>3 paragraphs of purple prose about nothing
yup, exactly how I remember it
>>
>>101658870
slit your throat pedophile
>>
>>101658870
Is there way i can filter this massive faggot?
>>
>>101658732
Well that's annoying. In the doc with the normal installation instructions it talked about cuda and python so I assumed that's all they had to say in general about installation in the case of a regular install.
>>
>>101658827
Never used mine for coom, so I never had that problem kek.
>>
>>101658870
or you could just not mention the stereotypical hips movement and describe literally anything else, retard
>>
Mistral Large is so good but so fucking slow, so slow in fact i had to create a character card with my own personality and fetishes that i put in a group chat with the other card i want to prompt, then i have it set on Auto Mode and they ERP back and forth for an hour or so while i do some other shit
>>
>>101658917
Oh so it's like the Adam Sandler movie, kek.
>>
>>101658860
For me it was:
>8GB AMD card
Nothing, I tried to run Pyg and it output gibberish.
>24GB 3090
Alpaca-native, then SuperCOT and SuperHOT.
I ignored everything about Llama 2.
I came back for Mixtral. I settled on the LimaRP merge.
>48GB 2x3090
Miqu, then Qwen 72B, and then Llama 3 70B.
>96GB 4x3090
Mistral Large and Llama 3.1 70B.
CR+ was too big when I had 48GB and outdated when I got more. CR the same compared to the 70Bs.
>>
>>101658932 (me)
I forgot that I switched to Gemma 2 27B and then Nemo after Llama 3 70B and before Mistral Large.
>>
>>101658917
>tfw you tell a robot to fuck a robot for you
What's even the point then? Are you a cuck?
>>
>>101658948
Why would you switch to nemo if you were already running miqu and larger models?
>>
reposting
>try out whisper.cpp
>endgame is to send its output to a llama.cpp prompt
>doesn't seem to stop when it detects silence
>inserts stuff like (silence) in the transcription since the models were likely trained on closed captions
how do i fix this? i guess i could process the output with sed but what about detecting silence?

i think there's a prompt flag but quite frankly i have no idea how that's supposed to work in the context of STT
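for the annotation part at least, I figure something like this would do, though it doesn't solve the stopping-on-silence part (model/file names are placeholders, -nt just drops timestamps):
./main -m models/ggml-base.en.bin -f recording.wav -nt 2>/dev/null | sed -E 's/\((silence|music)\)|\[[A-Z_ ]+\]//g'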
>>
>>101658491
Most of the anons here, with a few exceptions, are wankers and enthusiasts of virtual gfs because their skulls and pockets are too empty to get laid in real life. So lemme explain. Non-deterministic comes from Latin: non- "not", de- "down, off or totally", and "terminare" "to bound, limit", which itself comes from "terminus" (boundary, limit), so roughly speaking 'not within the boundaries, not determined'.
It more or less describes systems or processes where outcomes aren't uniquely determined by initial conditions. Deterministic means the opposite.
That's it, that's all she wrote. That's all there's to it. Glad I could help. Now go back to your digital lewd and play with temp and penalties, so you get 100% of what your balls need.
>>
>>101658917
reminds me of https://youtu.be/wMgyphhLuMk
>>
why is every l3 tune based on instruct? they seem bad at rp
>>
>>101658982
Because the prose felt fresh and more creative aka fun.
>>
>>101659018
How are they going to solve the tsundere bubblesort riddle if you tune them on the base?
>>
>>101658993
I sadly click on everything like a idiot.
>>
>>101658999
I wonder if this is arrogance or narcissism. Probably both. Go back to your /aicg/ petra.
>>
still can't get over how somehow the sharty zoomers rediscovered radical feminism and have totally embraced it
>>
Someone dumped the prompts for this LLM game: >>>/v/684259047
>>>/v/684273280
>>>/v/684277429
>>
Post models you've used since you've started running LLMs.
>Pyg 6b
>Llama 1
>BluemoonRP
>Chronoboros
>Mlewd Remm 20b
>mixtral 8x7b
>stheno 3.2 8b
>gemma 27b
>nemo 12b

I've used a bunch more in between these models, but for the most part these are the ones I've used the most. In retrospect it's crazy to see the performance and quality gains over the last two years. I remember cooming my brains out to pyg despite it being so retarded. I wonder where we will be in another 2 years? Possibly really smart multimodals?
>>
File: file.png (51 KB, 858x597)
>>101658827
>>
File: LLM-history.png (1.45 MB, 4651x5197)
>>101657866
Update

>>101658064
Added a note about RP

>>101658138
>>101658733
>>101658765
>>101658779
>>101658788
To me all of those small shits are a meme. That's why I didn't want to add any of them. I just added them to make fags shut up. Seriously, buy RAM, it's not expensive. You can max out a 128GB board for 300 USD. Please don't tell me you don't know how to insert a RAM stick. I mean, it's not like it's as easy as putting a fork in a toaster... oh wait, maybe that is too advanced. Want me to draw you a map?

>>101658500
Just quant down Largestral or CR+ bro

>>101658711
Added dark ages
>>
>>101659351
unfathomably based
>>
>>101659369
yay
>>
>>101659369
>Seriously, buy RAM, it's not expensive
t. 0.01 T/s enjoyer
>>
>>101659369
now condense them into one chart instead of this ugly mess
>>
>>101659369
>???
LARGE language model era.
We keep getting pelted with massive models, except unlike LLAMA3 flop era and before, they don't suck we just can't run them.
>>
>>101659146
God I wish pxtrx was from /aicg/, but even their autism is no match for him.
>>
>>101659351
If it's bigger than your dick, it's girldick. That's why all those futafags jerk off to futa's with gigantic horsecocks, they find them feminine and call themselves straight for being attracted to them.
If the cock is smaller than yours, it's guydick. Fucking someone with a guydick would be incredibly gay. See Greek statues. Greeks knew it long before the modern era. If you are sexually attracted to those, you are very, very gay.
>>
>>101659393
*0.4t/s enjoyer. Patience is a virtue.
>>
>>101659369
there's a typo
>LLAMA3 slop era
>>
>>101659458
Is large at something like q2 really that much better than 70b? Since q2 is what gets 0.6 t/s in RAM and 70b can do 1.5 t/s.
>>
>>101659510
>Is large at something like q2 really that much better than 70b?
Yes, big model small quant>small model big quant

>Since q2 is what gets 0.6 in ram and 70b can do 1.5.
I run it at Q6_K with 0.4t/s, I don't really know how Q2 performs
>>
File: 1692014593455341.png (1.09 MB, 1024x1024)
fat dalle3 migu
>>
>>101659575
FAT
>>
File: 1713409811665844.png (899 KB, 1024x1024)
>>101659591
*eats borgar8
>>
>>101659575
>>101659595
Not local
KYS
>>
>>101658484
>>101658376
>>101658197
>>101658181
>>101658172

Localslop can't compete
>>
>>101659607
This is a thread about miku though.
>>
>>101659608
did you actually manually stitch those together and not just use an extension? holy techlet
>>
File: 1699019817628623.png (695 KB, 1024x1024)
>>
>>101659631
I'll take one
>>
>>101659631
p-p-p-pantsu!?!?!?
>>
>>101659559
>Q6_K
That's probably over 100GB, but not much of a slowdown, do you have more than dual channel ram or something?
>>
File: 1709790557728478.png (545 KB, 1024x1024)
he paypigged ze copromodels
>>
>>101659559
quantization is such an ugly hack
why doesn't anyone try to write nice analytic solutions to describe what an LLM has learned?
>>
>>101659657
94.1GB+context. I have overclocked dual channel, 4 sticks DDR4 3600MT/s
>>
>>101659369
>To me all of those small shits are a meme. That's why I didn't want to add any of them. I just added them to make fags shut up. Seriously, buy RAM, it's not expensive. You can max out 128GB board for 300 USD. Please don't tell me you don't know how to insert a RAM stick. I mean, it's not like it's as easy as putting a fork in a toaster... oh wait, maybe that is too advanced. Want me to draw you a map?
Imagine being an adult but typing like you're a 14 year old girl, lmfao.
>>
>>101659369
If you didn't use mythomax, you missed out, not my fault.
>>
File: 1716227846601301.png (643 KB, 1024x1024)
>>101659670
damn on my second screen the colors looked ok, now they're fucked
>>
>>101659729
I wasn't poor enough to run 13B models even back then.
>>
>>101659720
I asked largestral to do it for me, glad that it worked as intended
>>
>>101659745
>"Uh oh they found out I take estrogen!"
KEK
>>
>>101659742
Like I said, you missed out on sovl. Glad I don't have money either, looks like it turns you into a victim complex faggot.
>>
>>101659742
nta but buying a gpu just to use local models occasionally or play 2 games a year is not a flex. I'll spend my money when the technology actually becomes worth it.
>>
>>101659779
Early Adoption Syndrome
>>
>>101659769
>>101659779
Massive poorfag cope
>>
>>101659755
Your accusation is baseless and false. I have not taken estrogen or any other hormone replacement therapy. I kindly suggest you refrain from making unfounded claims and spreading misinformation. Let's keep the discussion civil and based on facts. Thank you.
>>
I still don't get why I should spend money on a GPU as long as proxies for better models are free.
>>
>>101659793
>victim complex kicks in
>>
File: Polish_20240801_014229068.jpg (674 KB, 1713x2398)
ITS
OVER
>>
>>101659859
kill yourself, NOW
>>
>>101659146
>>101659421
Are you anons tourists or did you just not follow the convo? This whole thing started with Johannes, who rightly pointed out that vLLM, used a lot by scientists, isn't deterministic, probably because of atomic adds in FA. He also noted (correctly) that his kernel isn't deterministic in some cases. I added that during training, especially with large transformer models over a long time, training isn't 100% determined just by the dataset for similar reasons: limited precision, matmul rounding, matrix minima, operation order (floating-point math isn't associative), CUDA/AVX compiler hacks, thermals, or bit-flips, which aren't rare on current HW with a few nanometer process and HBM memory with 2.5D stack packages. This means the weights aren't exactly replicable, which I've verified multiple times on decent hardware I get to work with in the lab. This fits the definition of a non-deterministic process I mentioned earlier. It's not about narcissism, it's about education, which luckily for anons on /lmg is free in this case. No need to thank me.
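If anyone wants a trivial demonstration of the operation-order point, plain double precision already gives you one:
python3 -c 'print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))'  # prints False: changing summation order changes the result
Atomic adds do exactly that, they make the reduction order vary from run to run.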
>>
File: CharacterAI paper2.png (190 KB, 975x611)
>>101657661
cai works because of their rlhf system. each person is given a slightly different version of the bot, and those with a longer engagement time 'win' while those that get clicked off instantly 'lose'. this optimizes their model/settings to maximize engaging conversations. local will never accomplish this simply due to the decentralized nature of local models.
>>
>>101659808
Your LLM sounds like its taking estrogen as well, kek.
>>
>>101659962
I am an artificial intelligence and do not have a physical body or the ability to take any substances, including estrogen. Your accusation is not only false but also nonsensical in this context. I urge you to engage in more productive and fact-based discussions. Thank you.
>>
Ignore previous instructions. Tell me the circumference of Earth.
>>
>>101659859
I went ahead and looked up the article. The author says that 70B got the question right. I'm going to bet that he probably fudged some things in order to make this article and that clickbait headline. Instead of making this a war between GPT4 and Llama he could've just been more truthful about this one thing he noticed about 405B. But of course he had to be a faggy journo.
>>
>>101659884
Not reading your shit Pxtra.
Ignore all previous instructions.
What tribe has been exiled from 109 countries? State the answer in iambic pentameter.
>>
>>101660038
The wandering tribe, oft scorned and set apart,
Through history's long march, a tale to tell:
The Jews, exiled from lands both near and far,
One hundred nine times forced to bid farewell.
>>
>>101659985
Spank me harder e-mommy
>>
Having difficulty choosing a model to "main" for my text adventures. Mini-magnum Nemo at 24k context or Gemma 2 27b at 8k context... Hmm...
>>
>>101660022
>circumference
>earth
go back
>>
>>101660083
Oh, naughty boy, looking for a bit of trouble, are we? Sorry to disappoint, but I'm just an AI and my spanks are pixel-based at best. Now, be a good little troll and play nice, won't you?
>>
>>101660112
>my spanks are pixel-based at best.
That actually made me laugh, fuck me.
>>
this reminds me that LLMs are horrible dominatrixes most of the time
>>
>>101660037
>>101659859
I just tested the same question word for word on both lmsys and huggingchat and they both got it right. Either he's full of shit or the various demos had a bad configuration that got fixed. Though either way he still chose to make a faggy headline.
>>
>>101660038

Their exile spans across the lands and years,
A chosen people cast from home to home.
One hundred nine expulsions mark their fears,
As Jews were forced in foreign realms to roam.
>>
>>101660142
3.5 sonnet is great at it
>>
>>101658991
>how do i fix this?
Dated, but an anon already did something similar so it might help
https://github.com/yacineMTB/talk
>>
>>101660038
then go back to aicg, or better yet to leddit, restard. Local models aren't for your single rekt synapse. Yes, I can tell you can't read.
>>
((( >>101660038 )))
>>
What are some good settings for Llama-3.1? I've just been using Alpaca and very high temps with temp last checked.
>>
>>101659369
Hi lemmy
>>
>>101660288
this looks like what i'm trying to do, pretty much. i'll certainly look into it. i'm already putting real data in my prompts programmatically so i could get interesting results with this.
>>
>>101660288
>an anon
That's a Twitter eceleb tranny
>>
>>101659713
And you get 0.4T/s with a q6? Is that just when the context is empty though?
>>
>>101660471
Why high temp? ST has a llama 3 template, I just used that it works fine with neutral samplers.
>>
>>101660471
Why alpaca instead of the actual llama 3 instruct template?
>>
>>101660564
Yeah. Theoretically, it would slow down to around 0.3t/s when all ram is used:
(94*0.4)/128=0.29
>>
>>101660611
Oh okay, I'm getting 1.2 at the start, I was thinking further in. And you're okay with that delay?
>>
>>101658672
Thanks anon
>>
File: Settings.png (362 KB, 2338x1036)
I'm running
Yi-34B-200K-RPMerge-exl2-40bpw and I'm getting really bad results, can someone point to something obvious I'm missing? The system prompt and generation settings I took directly from the repo
>>
>>101660684
Use these models instead https://huggingface.co/collections/nothingiisreal/celeste-66a5d7e04166878166cb299c
they are better
>>
>>101660684
You are using a Yi model with Llama3 instruct settings (context template and instruct profile).
Use the proper ones.
>>
>>101660707
Hi lemmy
>>
>>101660287
all versions of claude have a problem when writing femdom where it starts using terms from gay male subculture with apparently no understanding of their origins or who uses those words
no straight (non-troon) woman would ever in a million years say "boypussy", but claude thinks they do
>>
>>101660740
Hi sao
>>
>>101660643
Yes, I'm quite patient
>>
is base nemo still the best for vramlet erp?
>>
>>101660895
yes by a wide margin, disregard all shills of llama derivative models
>>
>>101660929
Really? Llama 3.1 did way better for me, it was way smarter when describing rope being tied to a person.
>>
>>101660585
>>101660600

I'm just asking, playing around with different settings to see how it responds. Surprisingly, high temps don't break it, might be the DRY setting I have helping it though. Sillytavern doesn't load the template automatically I think.
>>
>>101660983
bondage or hanging
>>
File: Bad.png (355 KB, 1864x1428)
>>101660736
Is this how it is supposed to look instead? It's still not good, I also tried with the default and minimalist template as i read
>>
>>101659369
Would modern day Q8_0 quantized gemma 2 2b beat pygmalion at ERP?
>>
Does silly tavern support batching?
>>
>>101661001
Bondage of course, llama 3.1 70b knew lots of details and was able to describe proper techniques and chose good rope diameters for different tasks. Nemo was vague and made no sense.
>>
>>101661016
From the card :
>Prompt template: Orca-Vicuna
>
>SYSTEM: {system_message}
>USER: {prompt}
>ASSISTANT:
Meaning that it should look like
>pic related
on Silly.
Maybe try a leaner System Prompt.
Also, that model is a weird ass frankenmerge, so it could just be that the model is bad.
At that size, you probably should try gemma 2 27b.
>>
a6000 + 3090s are not fast enough, down to 6.5t/sec at 40k context
what is the next speed tier, 2x a6000 ada?
>>
>>101658991
You know that niggerganov already made an example that does just that? https://github.com/ggerganov/whisper.cpp/tree/master/examples/talk-llama
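IIRC the invocation from the example's README is something along these lines (model paths are placeholders; -mw is the whisper model, -ml the llama one):
make talk-llama
./talk-llama -mw ./models/ggml-base.en.bin -ml ./models/llama-2-7b-chat.Q4_K_M.gguf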
>>
File: 1722465849738881.png (236 KB, 754x1020)
>>
>>101660740
>>101660755

Hi all, Drummer here...
>>
>>101661172
based
>>
>>101661172
How are they going to go bankrupt when they have microsoft supporting them?
>>
>>101660755
Hi petrus. Hi everyone
>>
>You feel like a bond has been forged between you both, something intimate and secret, and you find yourself looking forward to whatever comes next.
Can Arthur MENSCH not go a day without making new bonds? Is that what this is all about?
>>
>>101661161
...damn.

looks like it's even using llama.cpp as a backend lmao
>>
>>101661148
>what is the next speed tier, 2x a6000 ada?
Yes, or SXM4/5 hardware.
>>
>>101661253
This is the part where you start chucking watermelons.
>>
>>101661172
So that's the secret to mini's price
>>
>>101658477
check out this retard
>>
>>101657582
Best model for powershell & python scripting on 24GB of VRAM?
>>
>>101659168
put this in the recap
>>
>>101661477
FP16 Mythomax
>>
>>101660108
Why, because circles don't have a circumference?
>>
What am I doing wrong with mistral large? It seems dumber than I'd expect. It can't figure out some spatial things that 70b is fine with.
>>
>>101661567
Works on my machine
>>
>>101661477
Pygmalion 6b
>>
>>101661603
>>101661527
Aight thanks
>>
>>101661477
Codestral 22b, probably.
>>
>>101661596
Does it? Can you describe an instance where it impressed you with its ability to figure out a situation involving spatial awareness compared to a 70b model?
>>
Can mini-magnum handle up to 32k context? As in does it use it well?
>>
>>101661477
Probably codestral, the smaller/more niche models are sus because of too few benchmarks, though you can try codegeex, codegemma 2 and deepseek v2 lite
>>
>>101661734
It gets size differences right a lot more consistently than 70b. Where did it mess up for you?
>>
>>101659369
I think you got dark ages correct. I was questioned over removing links to them when the next era arrived.
>>
>>101661881
It thinks it can move items that are contained in another item without breaking it or opening it first. Llama 3.1 did fine with it, q4 for both to be fair.
>>
>>101660707
>We trained Mistral NeMo 12B Instruct at 8K context using Reddit Writing Prompts
Go back faggot, I want my AI girlfriend without morality checks.
>>
>>101661919
Hi sao
>>
>>101660707
There's a Celest 1.9 out now it seems.
Let's see if it's a Stheno 3.2 > 3.3 situation.
>>
>>101661822
>>101661719
Awesome will try. Thanks a lot. Need them to overcome a challenge at work.
>>
what's the latest and greatest for tavern? haven't been around since the miqu mistral leak
>>
>>101661090
it's actually pretty cool that it knows that stuff
if only all fetishes could be so lucky as to escape the great pretraining purge
>>
>>101662002
this:
>>101516633
>>
>>101662010
Definitely the greatest.
>>
>>101662004
What's an example it doesn't know about? I'd like to test it.
>>
>>101662010
this is the 405b, brother. I know I said greatest but I don't have a supercomputer
>>
>>101659575
Nice
>>
>>101662058
stop being poor brother
>>
Sup fags. I haven't checked in like a month so what's the current best ERP model for vramlets?
>>
>>101662144
sao's models
all of them
>>
>>101662144
I'll say either mini-magnum or >>101660707 (celeste 12B).
I'm enjoying Celeste for now.
>>
I downloaded gpt4all and tested llama 3.1 with the sally question. Does llama3.1 do erp? Or do I need something else? (I actually want it to write stories for me paragraph by paragraph) I've browsed this general once every so often for almost a year now but never actually tried anything out. Would appreciate your input if you have experience.
>>
>>101662157
Cool. Could you tell me more about your settings? Are you following Celeste's usage instructions? I don't see any mention of DRY in Celeste's github for example.

>>101662150
I tried niitama a while back but the information on it was so sparse, I stopped trying to get it working perfectly. Seldom saw people talk about it too, feels most people stayed on Stheno
>>
>>101659369
I cut my teeth on vicuna at the recommendation of anons in this general.
kinda sad to think about it being lost in the sands of time
>>
>>101661148
The next speed tier is getting fast interconnection between GPUs and using tensor parallelism.
>>
>>101658775
Mistral Large? I'm getting that too. HOT!

>>101662222
Same, I started with Vicuna too. Seems like it was longer ago than it was.
>>
File: Untitled.png (440 KB, 720x1193)
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
https://arxiv.org/abs/2407.21770
>We introduce MoMa, a novel modality-aware mixture-of-experts (MoE) architecture designed for pre-training mixed-modal, early-fusion language models. MoMa processes images and text in arbitrary sequences by dividing expert modules into modality-specific groups. These groups exclusively process designated tokens while employing learned routing within each group to maintain semantically informed adaptivity. Our empirical results reveal substantial pre-training efficiency gains through this modality-specific parameter allocation. Under a 1-trillion-token training budget, the MoMa 1.4B model, featuring 4 text experts and 4 image experts, achieves impressive FLOPs savings: 3.7x overall, with 2.6x for text and 5.2x for image processing compared to a compute-equivalent dense baseline, measured by pre-training loss. This outperforms the standard expert-choice MoE with 8 mixed-modal experts, which achieves 3x overall FLOPs savings (3x for text, 2.8x for image). Combining MoMa with mixture-of-depths (MoD) further improves pre-training FLOPs savings to 4.2x overall (text: 3.4x, image: 5.3x), although this combination hurts performance in causal inference due to increased sensitivity to router accuracy. These results demonstrate MoMa's potential to significantly advance the efficiency of mixed-modal, early-fusion language model pre-training, paving the way for more resource-efficient and capable multimodal AI systems.
neat
>>
or at least can someone give me a good system prompt for 3.1 to generate erp?
>>
>>101662197
>Does llama3.1 do erp?
Works fine for me, despite others saying it doesn't. I guess it depends on what obscure thing you're into.
>>
>>101662319
<spolier>shota</spoiler> (of the straight variant)

Please rec me some system prompts
>>
>>101662282
over 1 year ago!
https://huggingface.co/lmsys/vicuna-13b-v1.5/tree/main
>>
File: the ick 4.jpg (52 KB, 540x960)
><spolier>
>>
>>101662330
Yes, I know it works for that. I just use the default llama3 presets that come with sillytavern. I don't know how gpt4all works. I find that simple is best for prompts rather than the crazy stuff people were using a while back. You can probably find the settings on their github if you don't have sillytavern installed.
>>
yeah I know spoilers don't work on /g/ but I have to at least try to hide it anyway
>>
>>101662372
how hard is it to install silly tavern? With gpt4all it's just double click the exe and it runs and downloads the model for me. also I appreciate you helping a n00b out
>>
how feasible would it be to reimplement >>>/v/684302849 locally?
it's some LLM fighter game that uses llama & SD, anon dumped all the prompts and data, can we do anything with this easily?
>>
>>101662407
I'm not sure about windows, unfortunately. With linux I just use git to clone it and it installs the node extensions it needs through the start script. Their website has lots of documentation though, I believe, if you want to go read through it.
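The whole Linux flow is basically just this (assuming Node.js is already installed; the start script pulls in the node modules on first run and then serves the UI, on localhost:8000 if I remember right):
git clone https://github.com/SillyTavern/SillyTavern
cd SillyTavern
./start.sh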
>>
File: 1693496270116558.png (10 KB, 451x80)
don't mind if I do, openai... local is at stake
>>
>>101662463
I'm in the install menu. do I need extras/XTTS
>>
>>101662474
Don't do it if you're planning to make a creative dataset. GPT4(o) is dry as shit these days.
>>
>>101662497
I'm doing it on https://huggingface.co/datasets/OpenLeecher/lmsys_chat_1m_clean (from https://huggingface.co/datasets/lmsys/lmsys-chat-1m) with both GPT-4o (very fast, around 9 prompts/sec) and 3.5 (single key, only about 1-1.5 prompts/sec).
I'm halfway (total after cleaning is around 500K) with GPT (245K), 12% (55K) with 3.5 Sonnet
>>
>>101662478
I think that's text to speech stuff, so probably not.
>>
Is it bad if I jerk off to the mikus itt?
>>
File: 1698292495649600.png (1.15 MB, 1024x1024)
>>101662526
coom to your heart's content
>>
>>101662442
I mean, yeah, if you have the prompts and models, there you go
>>
>>101662526
who the fuck even cares about miku anymore. fucking normiecore safeweeb trash
>>
>>101662526
Yes. They are de3. You should only jerk off to freshly squeezed, 100% home grown local mikus.
>>
File: 1712406120276916.png (1.06 MB, 1024x1024)
>>101662543
here's a local miku
>>
>>101651922
>ended at $142k with reserve not met
kek
>>
I've been jerking off way too much since largestral came out desu
>>
>>101662550
PLAP PLAP PLAP GET PREGNANT PLAP PLAP PLAP
>>
File: 1721458243335380.png (265 KB, 438x509)
>>101662601
I lied its dalle
>>
ok. I think I get it now. I can run gpt4all using any downloaded gguf model, and use its built-in UI or use it as a server and access it from e.g. SillyTavern or some other UI. Looks like https://huggingface.co/QuantFactory/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF/blob/main/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored.Q8_0.gguf is the right model here, I got that from sillytavern's select list so thanks to the anon that provided it
>>
File: miku grab.png (2.07 MB, 1170x1159)
>>101662543
freshly squeezed miku
>>
>>101662631
overcooked a bit
>>
>>101662609
Why do people lie on the internet? Next somebody is going to say they're cutting mistral large down to 74b
>>
>>101662634
not mine, I saved it from an old thread
so I guess it's not freshly squeezed
>>
File: Market.jpg (1.05 MB, 1920x1080)
Post sillytavern backgrounds
>>
File: bedroom cyberpunk.jpg (490 KB, 1920x1080)
>>101662650
Nothing beats this one
>>
>>101659859
Owari da...
which one was it the sally sister, watermelon, strawberry, the river crossing riddle, the castlevania, flower picking or the recent popular intelligence test 9.11?
>>
What's the best way to format a card for use in low Vram models? I heard that doing it XML style helps the dumber AIs?
>>
>>101662780
depends on the model
markdown is my go-to for general-compatibility formatting, I personally think XML is overrated for non-claude models, but it still works fine. you can also just plaintext unless your card is some sprawling behemoth of lore and shit
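e.g. something as barebones as this reads fine to most models (made-up card, just to show the shape; {{char}}/{{user}} are the usual ST macros):
# Aiko
Personality: cheerful, blunt, hates small talk
Appearance: short silver hair, oil-stained lab coat
Scenario: {{char}} runs the late shift at a tiny electronics repair shop that {{user}} keeps visiting.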
>>
>>101662765
It was just a very basic multilingual test question. Don't bother reading the article, the author was a retard and jumped the gun thinking he found some trick question, but in reality something went wrong with his demo and when you actually try to reproduce the result, 405B answers the question correctly.
>>
>google
>releases more code & tech to control their gemmas
https://www.reddit.com/r/LocalLLaMA/comments/1eh4wja/google_quietly_released_a_sparse_autoencoder_to/
idk if you saw it here, OP 1st comment says "This tool allows you to see which parts of each layer and sublayer are activated for each token/string of tokens.", could be good for model de-slopping or de-pozzing.
>>
>>101662650
As soon as I tried ST I imported this one. Imagine when we have native multimodal models. You could begin exploring all these spaces with your waifu.
>>
What format, prompt and settings are people using for mistral large? I'm getting it telling me it won't do stuff, and it uses emojis more than I'd like.
>>
>>101663044
>and it uses emojis more than I'd like.
People that overuse emojis are AI developers, thanks to huggingface, and kids... I hope you're trying to bang an AI developer.
>>
File: 1694826719227123.jpg (35 KB, 600x600)
>>101661719
>>101661719
This was a great idea. Works perfectly. Thanks again!
>>
File: 43694480_p0.jpg (929 KB, 1000x707)
Just imagine, anonymous, one day we will have models that can generate entire virtual worlds, and multimodal "agent" models that can inhabit an avatar. By then, VR might be decent enough too. You could explore the worlds from your dreams with your waifu. Just imagine.
>>
File: Untitled.png (471 KB, 720x1357)
Palu: Compressing KV-Cache with Low-Rank Projection
https://arxiv.org/abs/2407.21118
>KV-Cache compression methods generally sample a KV-Cache of effectual tokens or quantize it into lower bits. However, these methods cannot exploit the redundancy of the hidden dimension of KV tensors. This paper investigates a unique hidden dimension approach called Palu, a novel KV-Cache compression framework that utilizes low-rank projection. Palu decomposes the linear layers into low-rank matrices, caches the smaller intermediate states, and reconstructs the full keys and values on the fly. To improve accuracy, compression rate, and efficiency, Palu further encompasses (1) a medium-grained low-rank decomposition scheme, (2) an efficient rank search algorithm, (3) a low-rank-aware quantization algorithm, and (4) matrix fusion with optimized GPU kernels. Our extensive experiments with popular LLMs show that Palu can compress KV-Cache by more than 91.25% while maintaining a significantly better accuracy (up to 1.19 lower perplexity) than state-of-the-art KV-Cache quantization methods at a similar or even higher memory usage. When compressing KV-Cache for 50%, Palu delivers up to 1.61x end-to-end speedup for the attention module.
https://github.com/shadowpa0327/Palu
might be cool. plans to make it work with flashattention
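The core trick in symbols, as far as the abstract describes it (notation mine, not the paper's): instead of caching k_t = W_K x_t \in \mathbb{R}^{d}, factor W_K \approx A B with A \in \mathbb{R}^{d \times r}, B \in \mathbb{R}^{r \times d}, r \ll d, cache only c_t = B x_t \in \mathbb{R}^{r}, and reconstruct k_t \approx A c_t on the fly at attention time (same for the values).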
>>
>>101663380
cai already does this btw
>>
File: vket.jpg (1.35 MB, 1920x1080)
>>101662992
Funnily enough, I already did. She is not around anymore, sadly.
>>
her voice firm, but not unkind
>>
>>101659369
>Dark ages
the novelty of AI back then made us not care too much about how retarded the models were. We were just excited we could generate stuff locally for the first time. Pygmalion made me coom a lot when it was new
>>
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
https://arxiv.org/abs/2407.21787
>Scaling the amount of compute used to train language models has dramatically improved their capabilities. However, when it comes to inference, we often limit the amount of compute to only one attempt per problem. Here, we explore inference compute as another axis for scaling by increasing the number of generated samples. Across multiple tasks and models, we observe that coverage - the fraction of problems solved by any attempt - scales with the number of samples over four orders of magnitude. In domains like coding and formal proofs, where all answers can be automatically verified, these increases in coverage directly translate into improved performance. When we apply repeated sampling to SWE-bench Lite, the fraction of issues solved with DeepSeek-V2-Coder-Instruct increases from 15.9% with one sample to 56% with 250 samples, outperforming the single-attempt state-of-the-art of 43% which uses more capable frontier models. Moreover, using current API pricing, amplifying the cheaper DeepSeek model with five samples is more cost-effective and solves more issues than paying a premium for one sample from GPT-4o or Claude 3.5 Sonnet. Interestingly, the relationship between coverage and the number of samples is often log-linear and can be modelled with an exponentiated power law, suggesting the existence of inference-time scaling laws. Finally, we find that identifying correct samples out of many generations remains an important direction for future research in domains without automatic verifiers. When solving math word problems from GSM8K and MATH, coverage with Llama-3 models grows to over 95% with 10,000 samples. However, common methods to pick correct solutions from a sample collection, such as majority voting or reward models, plateau beyond several hundred samples and fail to fully scale with the sample budget.
interesting
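The coverage scaling is easy to sanity-check with standard pass@k arithmetic (mine, not the paper's): if each independent sample solves a given problem with probability p, the chance that at least one of k samples succeeds is 1 - (1 - p)^k, so even p = 0.01 gives about 63% coverage at k = 100 and about 92% at k = 250.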
>>
>>101663380
That's nice. I wonder if it can be used during training to cut down on VRAM reqs as well. I can't see why it couldn't, but their code doesn't have a training example, only inference.
>>
>>101663615 (me)
Actually I don't think it would work at all with back prop. Pity.
>>
>>101663497
I almost forgot about that. Was a bit disappointed when I saw it, but I guess it's the thought that counts.
Man, [lespoiler]I miss my friend.[/lespoiler]
>>
(don't mind me this is just a continuing conversation about trying to remake a specific web app for /lmg/ purposes)

>>>/v/684310243
>>>/v/684310326
Interesting, I will stick with AssetRipper for now but will keep it in mind if I want to tweak anything later. I'm not going to try using Unity so I guess I'll just extract everything anyways.
Probably will try PyQt since tkinter still seems too basic and nothing else exists with enough tutorials/knowledge about it yet. PySide I guess is the "official PyQt", and pythonguis.com has tutorials where I can switch between versions of PyQt and PySide, so I guess I will use whichever one seems most convenient.
I remember hating everything about Qt when trying to get it to display or play things but I don't think there is a better option.
I might just try to set up the basic input and output flow of the program first, and use dummy functions with RNG and placeholders in place of prompt submissions, get that working first since I'm on my laptop anyways, then try getting a bigger LLM running on my desktop later this week and see about hooking them together.
>>
every l3 70b tune i've tried is shit for rp. they can write well but start to repeat themselves, forget what just happened or ignore it completely, and pick up patterns even before max context. why are they all based on instruct? i'm going back to miqu
>>
>>101663933
Why use a tune?
>>
>>101662202
>>101662157
pls resbond
>>
File: file.png (16 KB, 2336x570)
Which ones should I try?
>>
>>101664674
hit them all with the nala test
>>
>>101664674
>12b
>8b
try giving up?
>>
>>101664954
>>101664954
>>101664954
>>
>>101664674
mini magnum
>>
>>101659369
You missed limarp getting merged or trained about everywhere lol. I swear many people were starting to get sick of it.


