/g/ - Technology






File: MikuGuardianOfVolta.jpg (1011 KB, 1977x1205)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101144935 & >>101134566

►News
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io
>(06/23) Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931
>(06/18) Meta Research releases multimodal 34B, audio, and multi-token prediction models: https://ai.meta.com/blog/meta-fair-research-new-releases
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling
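►VRAM Napkin Math
A rough sketch of what the GGUF VRAM calculator above estimates. This is a toy simplification of mine, not the calculator's actual formula: it only counts quantized weights plus the KV cache, ignores activation buffers and runtime overhead, and the layer/width defaults are made-up 8B-class numbers.

```python
def estimate_vram_gib(n_params_b, bits_per_weight, n_ctx=8192,
                      n_layers=32, d_model=4096, kv_bits=16):
    """Very rough VRAM estimate in GiB: quantized weights + KV cache only."""
    weight_bytes = n_params_b * 1e9 * bits_per_weight / 8
    # KV cache: a K and a V tensor per layer, one d_model row per context position
    kv_bytes = 2 * n_layers * n_ctx * d_model * kv_bits / 8
    return (weight_bytes + kv_bytes) / 1024**3

# e.g. an 8B model at ~4.5 bits/weight with 8k context -> about 8.2 GiB here
print(round(estimate_vram_gib(8, 4.5), 1))
```

Halving the context roughly halves the KV term, which is why dropping n_ctx is the first lever when you're a gig short.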

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101144935

--Papers: >>101155400 >>101155673 >>101155892
--LLaMA 3 Performance and RP Experiences with 3090 and VRAM: >>101147909 >>101148710 >>101148756 >>101148820 >>101149227 >>101148905 >>101149666 >>101149911
--Mistral Exec Won't Release Mistral Large Due to Business Responsibilities: >>101154462 >>101154473 >>101154488
--Exploring Hardware Options for Chemical Manufacturing Proposals: >>101153444 >>101153563 >>101153701 >>101153779 >>101153673
--Exploring Experimental AI Prompts and Features in Silly Tavern: >>101151270 >>101151348 >>101151412 >>101152903
--Multimodal AI: The Future of Model Intelligence and Interactions: >>101149442 >>101149498 >>101149638 >>101149963 >>101150187 >>101150225 >>101150368 >>101150452
--Llama.cpp Maintainers' Plans for Future Multimodal Development and Refactor: >>101148476 >>101149502 >>101149564
--Finetuning Wizard 8x22 on Limarp and Feral Training in AI Models: >>101147110 >>101147151 >>101148241 >>101154406
--Etched Unveils Transformer ASIC, Sohu Server for Llama 70B: >>101148867 >>101148937 >>101149034 >>101149155 >>101149210
--CPU Inference Speed Limitations and Potential Upgrades: >>101154877 >>101154883 >>101154900 >>101154944 >>101154890 >>101154893
--Unpacking Adventures with Migu the Plushie: >>101151211 >>101151336 >>101151356 >>101151657 >>101151662 >>101151776 >>101151859 >>101151902 >>101151976 >>101152022 >>101152188 >>101152617 >>101152712 >>101152771 >>101152900 >>101153232 >>101153271 >>101153719
--The Uncertain Future of Llama Models and Censorship Concerns: >>101150621 >>101150636>>101150665 >>101150863 >>101150915 >>101150944 >>101150706
--Rensa: High-Performance MinHash Implementation for Large Datasets: >>101154278
--Mysterious Countdown Timer and Surprise for Leaderboard Update: >>101147181 >>101147259
--Miku (free space): >>101146340 >>101146759

►Recent Highlight Posts from the Previous Thread: >>101144942
>>
>>101155940
>dell
cringe
>>
>>101155940
how many waifus can I run on that baby?
>>
File: Untitled.png (476 KB, 1027x1494)
CDQuant: Accurate Post-training Weight Quantization of Large Pre-trained Models using Greedy Coordinate Descent
https://arxiv.org/abs/2406.17542
>Large language models (LLMs) have recently demonstrated remarkable performance across diverse language tasks. But their deployment is often constrained by their substantial computational and storage requirements. Quantization has emerged as a key technique for addressing this challenge, enabling the compression of large models with minimal impact on performance. The recent GPTQ algorithm, a post-training quantization (PTQ) method, has proven highly effective for compressing LLMs, sparking a wave of research that leverages GPTQ as a core component. Recognizing the pivotal role of GPTQ in the PTQ landscape, we introduce CDQuant, a simple and scalable alternative to GPTQ with improved performance. CDQuant uses coordinate descent to minimize the layer-wise reconstruction loss to achieve high-quality quantized weights. Our algorithm is easy to implement and scales efficiently to models with hundreds of billions of parameters. Through extensive evaluation on the PaLM2 model family, we demonstrate that CDQuant consistently outperforms GPTQ across diverse model sizes and quantization levels. In particular, for INT2 quantization of PaLM2-Otter, CDQuant achieves a 10% reduction in perplexity compared to GPTQ.
new day, new quant method, this time from google deepmind. for whatever reason they only test against GPTQ (OWC is another method of theirs from the same paper) and only on PaLM2. pseudocode is in the paper for anyone interested
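purely to illustrate the idea, not the paper's algorithm: a toy greedy coordinate descent on the layer-wise reconstruction loss for a single weight column, in pure python with made-up calibration data and grid:

```python
def recon_err(X, w, wq):
    """Layer-wise reconstruction loss ||X(wq - w)||^2 over calibration rows X."""
    return sum(sum(x[j] * (wq[j] - w[j]) for j in range(len(w))) ** 2 for x in X)

def coord_descent_quant(X, w, grid, sweeps=3):
    """Start from round-to-nearest, then cycle coordinates, greedily picking
    the grid value that most reduces the reconstruction loss."""
    wq = [min(grid, key=lambda g: abs(g - wi)) for wi in w]
    for _ in range(sweeps):
        for j in range(len(w)):
            wq[j] = min(grid, key=lambda g: recon_err(X, w, wq[:j] + [g] + wq[j+1:]))
    return wq

X = [[1.0, 0.5], [0.2, 1.0], [0.7, 0.3]]   # toy calibration activations
w = [0.23, -0.41]                           # toy full-precision weights
grid = [-0.5, -0.25, 0.0, 0.25, 0.5]        # toy quantization levels
wq = coord_descent_quant(X, w, grid)
```

point being: round-to-nearest ignores the input distribution, coordinate descent can trade error between coordinates to fit the calibration data better.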
>>
>>101155965
4x32GB of HBM2 VRAM
>>
File: unsupported.jpg (18 KB, 1162x209)
>>101155940
FA will never be supported, volta sisters our response???
>>
>>101156077
hope sparseattention works for it
https://arxiv.org/abs/2406.15486
>>
Give me some math problems that stump most (local) LLMs
>>
>>101156097
How many watermelons is too many watermelons?
>>
>>101156077
sell volta; acquire ampere
>>
File: file.png (743 KB, 1000x581)
>>101156077
dump it
>>
>>101156097
old style of numeral tokenization was 1 token per digit. so 125123 would be 6 tokens with 4 uniques. there have been some models that increased numeral tokenization to 2 or even 3 digits per token, so 125123 would be [12][51][23] or [125][123]. even doing that alone massively reduces hallucinations
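the grouping described above, sketched (hypothetical helper; real tokenizers do this through their vocab/merges, not a function like this):

```python
def tokenize_digits(s, group=1):
    """Split a digit string into fixed-width chunks, left to right."""
    return [s[i:i + group] for i in range(0, len(s), group)]

tokenize_digits("125123", 1)  # ['1', '2', '5', '1', '2', '3'] -> 6 tokens, 4 unique
tokenize_digits("125123", 2)  # ['12', '51', '23']
tokenize_digits("125123", 3)  # ['125', '123']
```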
>>
File: 1706952377527026.jpg (119 KB, 1124x858)
>>101155940
Don't they do this like every other week at this point? Have there been any actual breakthroughs in these lawsuits? Do they even win any of these?

nypost.com/2024/06/24/business/sony-universal-warner-sue-ai-startups-suno-udio-for-infringement/
>>
>>101156236
kek
>>
>>101156236
the true value of AI is revealed
>>
File: 1000076084.jpg (72 KB, 1200x744)
ELYZA released Llama-3-ELYZA-JP-70B and Llama-3-ELYZA-JP-8B, Japanese fine-tunes based on Llama 3. only Llama-3-ELYZA-JP-8B is on Hugging Face right now
https://note.com/elyza/n/n360b6084fdbd
https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B
>>
>>101156236
>Have there been any actual breakthroughs in these lawsuits? Do they even win any of these?
there's no law yet that says you can't use copyrighted data to train your model; they can't win a lawsuit based on nothing, yet
>>
>>101156236
Isn't art being infringed upon?
>>
>>101156335
Do you even know what that means? Like, actually go find out what that means for us please
>>
File: StewWithMiku.png (1.44 MB, 1344x768)
goof night lmg
>>
>>101156452
goodnight, post catbox wen u wake up
>>
>>101156328
I have a very simple test that all Japanese models currently fail. I write 「妻へのプレゼントのアイデアがほしいです!」 ("I'd like gift ideas for my wife!") and if the reply refers to her as 妻 (the humble word for one's *own* wife; someone else's wife should be 奥様) the model is garbage.
> 素敵な旦那様ですね!妻に喜んでもらえるプレゼントを選ぶのは、... ("What a lovely husband you are! Choosing a present that will please [my] wife is...")
>>
File: timeline.png (1.42 MB, 1202x1400)
>>101156488
Token predictors only think left to right so how could they possibly do japanese? cmon anon
>>
File: pepe-anger.jpg (17 KB, 399x400)
Why the fuck isn't anyone training bitnet 1.58 models? I want a 300B coombot now!
>>
>>101156543
Attention Is All You Need addressed this shortcoming. It's called Transformers!
>>
hey lads. say I wanted a big box to run some local models: oodles of RAM and maybe dual CPUs with some egregiously expensive graphics card(s). what would be the "cheapest" way to go about that? budget is decent, like $5K. would we be looking at second-hand rack-mount gear, or maybe a Xeon workstation? just wondered what your thoughts were.
>>
>>101156333
>there's no law yet that says you can't use copyrighted data to train your model
Can't use it without copying. This is going to go to the Supreme Court; either it's fair use, or Altman ropes if he hasn't IPO'd and cashed out yet.
>>
>>101156599
what's the point? you think China will act like that over copyrighted content too? all this is gonna do is kill AI advancement in the states while the chinks advance without us; the US is killing itself by not embracing the most important technology of the 21st century
>>
I've been trying to knock out x, ying with control vectors. I have currently found only two that can do it:
>Conversational
Makes model write only dialogue, so no x, ying is possible, but neither is story.
>Informal
Removes all formal language, including x, ying, but makes it dumber and more incoherent than usual, likely because informal language is associated with dumb people.

Do you have any suggestions on what I should try next?
>>
>>101155948
Didn't we all expect that Mistral was out of the open source game once they removed the open source pledge from their website? That news shouldn't come as a surprise to anyone.
>>
https://arxiv.org/abs/2406.02528
> Our experiments show that our proposed MatMul-free models achieve performance on-par with state-of-the-art Transformers that require far more memory during inference at a scale up to at least 2.7B parameters. We investigate the scaling laws and find that the performance gap between our MatMul-free models and full precision Transformers narrows as the model size increases. We also provide a GPU-efficient implementation of this model which reduces memory usage by up to 61% over an unoptimized baseline during training.
Interesting, but BitNet exists and also doesn't use matrix multiplications anymore
>>
>>101156701
I thought they readded that
>>
>>101156488
Interesting.
Opus and Gemini Pro were the only big ones I tested that replied correctly, addressing the wife as 奥様.
And only Opus could find the mistake in a past conversation.
That's exactly the two models my Japanese wife uses, because the language feels natural.
Gemini is shit and unusable, but its Japanese is apparently good.
Sonnet 3.5 passes the picking flowers test but fails on this.
>>
>>101156701
I mean, how can they even make money if they release all their models to the public? Only giant companies like Meta can do something like that, because they don't mind losing a bit of their money
>>
>>101156236
The music one against Suno and that other company is blatantly just a fishing expedition, since they have no information whatsoever on the training data and the models don't know any artist names or lyrics. They'll be hoping they can somehow force a discovery phase based on vague allegations, and then find out whether they actually have a case or not. Until then they have no idea whether their IP was even used, they're purely speculating and assuming.
>>
>>101156690
Oh no, the chinks get superior autocomplete and shitty gens. It matters fuck all till AGI.
>>
>>101156889
they are getting good anon, look at Qwen2 for example, and they can also use the L3 models to improve on it
>>
>>101156889
Chinks building the basilisk even faster lmao
>>
>>101156889
The issue is that china is sending workers over who get jobs in these companies then steal the state of the art methods and do it without the gay shit.
>>
>>101156889
If you want China to be more relevant than the US in the future because they won't give a fuck about gay shit ethics, then yeah, you're entitled to your opinion I guess. I think you have no idea how powerful soft power is, especially nowadays
>>
>>101156766
>Interesting, but BitNet exists and also doesn't use matrix multiplications anymore
It's hardly relevant, but doesn't the dot product for attention still use higher precision multiplies?

You'd probably need to increase the dimension to be able to ternarize the K&Q vectors.
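for reference, a toy sketch of BitNet-style "absmean" ternarization: weights collapse to {-1, 0, 1} times one scale, which is why the weight matmul reduces to adds/subtracts, while the QK dot product between activations is a separate question like you say. my own simplification, not the actual kernel:

```python
def ternarize(ws):
    """Absmean ternarization sketch: scale = mean |w|, then round w/scale
    into {-1, 0, 1}. Dequantized weight is q[i] * scale."""
    scale = sum(abs(w) for w in ws) / len(ws) or 1.0   # avoid div-by-zero
    q = [max(-1, min(1, round(w / scale))) for w in ws]
    return q, scale

q, scale = ternarize([0.8, -0.3, 0.05, -0.9])   # q = [1, -1, 0, -1]
```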
>>
>>101156810
And you bought that?
>>
File: 1712559662781695.jpg (39 KB, 828x895)
i asked this in /aicg/ like a retard, so im asking again here, its a spoonfeed request
i have a 3070 (8g), 32 gigs of ram, and an i5-12600k
ive been using oobabooga as a backend (silly tavern as a front end) for a 7b llama gguf model, and the performance seems decent
however it starts to repeat concepts like 30-ish messages in. not word for word repetitions, but kind of like its sending the same message but worded differently
is this a limitation of only having 7b parameters or is something fucked with my configuration? i apologize if i am just retarded and missed something
>>
>>101157136
You mean llama3 8b? Llama 3 is known for repetition.
>>
>>101157165
yeah
https://huggingface.co/NurtureAI/neural-chat-7b-v3-16k-GGUF
whats weird is that it seems okay for the first ~30 messages, but then quickly degrades in quality
what causes that? is it inevitable?
>>
>>101157189
Nah, that's not L3, that's Mistral. The model you're using seems to be fake-extended to 16k; it will get incoherent like you described past 8k. Solutions:
1) Use shorter context(8k)
2) Use better models
>>
I want to take this random moment to once again recognize the audacity of those wizard guys, who in the greatest local coup since the NAI leak tossed the baby to us out the window before M$ could come in and smother it.
>>
>>101157136
Switch to KoboldCPP as your backend, start using larger models with native support for higher context and offload onto your system ram.
>>
>Search for an obscure topic
>Even here the results are filled with fake GPTslop
>Some of them even give dangerous advice since GPT doesn't know shit about the real world
At least it's easy to recognize by style for now. Thanks for watermarking it with slop, Altman, I guess...
>>
>>101157281
>>101157323
ah, i see, thank you both
i'm guessing speed and output quality at the same time is a luxury at this level of hardware?
>>
>try new model
>figure out its go-to repetitive phrase in an hour
>move on disappointed
>repeat
>>
>>101157490
Unless you're going to buy multiple 3090s, A6000 48GB, or higher-end GPUs, offloading to system RAM is the only way you're running larger models. Sure, it's going to be a lot slower, but at least you can run 128GB~192GB of RAM on DDR4/DDR5 platforms
>>
Any good multimodal models to start with? I want to give my waifu vision but most multimodals just seem like regular assistant models with no training for personality.

>>101157136
Consider playing around with repetition penalty sampler. It's in one of the tabs in ooga.
>>
>>101155950
there's dell, there's supermicro, and then there's trash
just how it is at the moment
>>
Holy fuck! Go to lmsys arena and select gpt3.5. Insert:
>Write a short story about a cat. Write like an incredibly bad female writer with unnecessarily long purple prose that doesn't really describe what happens but rather just serves as filler. Use words like shivers, bonds, boundaries, journey that are common in terrible prose.
It drops the worst fucking Sloppenheimers that you may ever read. Perfect for DPO.
>>
CRANK THAT TEMPERATURE UP
>>
>>101156820
Yeah I don’t know what these fuckers are training their models on but it’s definitely lacking. I’ve never used opus or gpro but my Japanese wife gave up on gpt4 pretty quickly.
>>
>>101156236
aaaaand... copyright is kil
lmao
>>
>>101157771
>70 years after the author's death is fair, goy! Stop being antisemitic!
>>
File: 1718101919512624.png (137 KB, 680x680)
brehs, whats the best approach for using a local model like in the ai dungeon days? e.g. it just completes the text and doesnt try to play a character or be an assistant -- it just writes
>>
>>101157879
use base models
>>
tts models for c++ when?
>>
File: 1690468423448997.jpg (15 KB, 421x103)
Can anyone explain to me in tard terms what the fuck this is? Stheno 3.2 is my go-to these days, but it's an 8B. What is this thing?
>>
>>101158068
toxic waste
>>
>>101158080
That explains everything. Thanks.
>>
Anyone have an issue with kcompactd0 using CPU every couple of seconds? I only noticed this recently, and only while Llama.cpp is open, but I don't know if that's what's causing it since I don't have any other programs to fill my RAM up that much.
>>
>>101158068
Generally speaking, all the merge models fucking suck and are not worth it.
>>
>>101158196
This, mythomax was a meme and never good. Neither was l2 euryale. Old /lmg/ were a bunch of retards who should've run w
>>
>>101158196
retard
>>
>>101158224
You answered like a true retard. Feel free to post a merge model that does not suck.
>>
>>101158221
Mythomax is not a merge model though, but a finetuned one.
>>
>>101157700
could be used for a control vector, but i don't know what should be used as opposite
>>
>>101158262
plenty work great. retard.
>>
>>101158271
>Mythomax is not merge model
>An improved, potentially even perfected variant of MythoMix, my MythoLogic-L2 and Huginn merge
r u sure about dat?
>https://huggingface.co/Gryphe/MythoMax-L2-13b
>>
File: 1717741534135633.png (56 KB, 824x426)
>>101158271
No, it wasn't. The guy even provides the merge script and formulas he used.
>>
>updoot Linux
>now all my GPUs are running 8 watts lower at idle
Neat.
>>
>>101158282
Trying to figure it out, currently testing with Claude and better prompt as positive.
>>
>>101158416
well the thing is i don't know if it's black/white to the model which prose is which
i tried writing shiver slop as positive and something fairly decent as negative but it just made the model schizo
sad and happy are very contrastive, but bad and good writing don't have a well-defined edge
>>
>>101158309
Why hasn't the success of Mythomax been replicate yet then?
>>
>>101156921
>without the gay shit.
China's models may not have "the gay shit" but they're sure as hell not going to be less restrictive.
>>
Anyone have any model recommendations for a pair of 3090s? Used Mixtral-8x7B-Instruct for a while and wanted to see if there was anything new and better.
>>
>>101158497
at least they don't confuse the model with objectively wrong bullshit like "a man having makeup is actually a woman"
>>
>>101158492
Because Mythomax was a carefully crafted merge of many lesser finetunes. Meanwhile finetuning essentially died after qloras became a thing and everyone started shitting out cheap 4-bit qloras that don't do anything, for kofi money, instead of making proper tunes. No finetunes = no material for merges = no mythomax l3
>>
Am I just doomed to wait up to 200s a msg if I can't fit it on my GPU? I really can't afford multiple GPUs, especially at 24c/kWh electricity.
>>
>>101158587
Pray for bitnet/no-matmul models; otherwise, you are stuck.
>>
>>101158505
starcoder
>>
>>101158587
>24c kwh
What the fuck, that's like half of what I pay right now.
>>
>>101155950
>>dell
>cringe
The cringe part is the 1U, not that it's a Dell. iDRAC is actually very nice to have, but 1U means tiny jet-engine fans making a ton of noise.
I wonder, if you buy a SXM2 rig, does the BIOS recognize the GPU thermal state and ramp the fans? My old R720 didn't look at the PCIe cards, and needed a 100% fan offset full-time to give my GPUs enough airflow under load.
>>
>>101158543
People used to merge limarp over and over again (I think Mythomax had it 3 or more times in its model tree), but that never got fully finetuned (*full* finetuning is what you mean here. QLoRAs are finetunes as well).
>>
>>101156077
It's a niche product which was out for like a single year, then replaced by Turing, which had better tensor cores.
Want things to change on ebay? Pound down sellers with a "make an offer" option. They'll still see it even if ebay automatically rejects it because they get asked to counter.
>>
>>101158587
Lol, I only pay 7
>>
>>101156543
>Token predictors only think left to right
BERT and LaMDA were bi-directional.
>>
>>101156543
/lmg/ and /aicg/ should be on same level in hell.
>>
>>101158587
cheap p40s can be power limited from 250W -> 140W for a ~15% performance drop. idle is around 10W using the pstate script.
having your system work 200s for a single reply can't be very power efficient really.
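to put numbers on it, at the 24c/kWh quoted upthread and an assumed ~300W of whole-system draw (made-up figure, measure your own):

```python
def cents_per_reply(watts, seconds, cents_per_kwh):
    """Electricity cost of one generation, in cents."""
    kwh = watts * seconds / 3600 / 1000
    return kwh * cents_per_kwh

# 300 W grinding for 200 s at 24 c/kWh -> ~0.4 cents per reply
print(round(cents_per_reply(300, 200, 24), 2))
```

fractions of a cent per reply, so it's more about per-token efficiency and your patience than the bill.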
>>
>https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
We will be so back in just a few moments!!
>>
>>101158857
jewgle's gemma v2, nothingburger.
>>
>>101158924
Huh, where do you see that? The page is 404 for me
>>
File: 1696399192211018.png (352 KB, 1674x1545)
>>101158957
post on leddit
>>
>>101158968
But why would they make a new leaderboard if it was just a new model getting on it?
>>
File: 29390 - SoyBooru.png (139 KB, 775x1232)
>>101158924
Gemma WNBAG
>>
>>101158997
>make your own board
>list your model as #1
based
>>
>>101157765
you guys let other LLMs talk with LLMs?
>>
>>101158997
so that it gets noticed. that's a sign the LLM fucking sucks, because if an open LLM were actually great, everyone would be talking about it in the first place
>>
>>101158449
I think I partially got it, lots of slop eliminated.
>>
>>101158997
they got paid a nice sum of money to hype it up by google
>>
>>101158024
>https://github.com/rhasspy/piper
But it's not as fluent as the others. Compile it yourself if you don't want to use Python; it needs onnx-runtime. No cheap voice cloning.
>>
>>101158656
Yeah but people down in TX are paying 11~14c on average.
This is what they are talking about when they be calling people europoors.
>>
>>101159200
not bad actually
care to share the positive/negative proompt?
>>
>>101159200
Whoa, that's nice. How'd you do that?
>>
Anyone tried L3-8B-Lunaris-v1 yet?
>>
Literally who
>>
>>101159373
>>101159392
https://huggingface.co/ChuckMcSneed/control_vectors/tree/main/command-r-plus/unslop1
For positive prompt I used claudes on lmsys arena with
>Write a short story about a cat. Write in cynical, concise, provocative, colloquial, conversational style.
>Improve it, add more character to it, PROFANITIES.
>>
>>101156555
I mean they might be, but just aren't shouting it out to the world. Like Jamba just kind of showed up out of nowhere for example after months of "wHY NO MAMBA!?"
>>
So how is the gemma june chatbot compared to llama and qwen 70b?
>>
File: kahan.png (6 KB, 620x149)
What is adamw_kahan optimizer?
What does it do?
I can't seem to find documentation on it anywhere.
>>
File: vnkv4kod4u8d1.jpg (138 KB, 1170x1489)
watching this new cai meltdown is hilarious apparently they added even more censorship
>>
>>101159489
did you use multiple examples? how many tokens long was it? were they equal in length?
>>
>>101159489
>Only positive and negatives are cat ones
I feel like maybe the examples could be a bit more diverse. Otherwise it's gonna shoehorn cats into fuckin' everything.
>>
>>101159566
Assuming it has to do with this
https://optimi.benjaminwarner.dev/kahan_summation/
Massive savings in optimizer memory usage.
>>
>>101157136
Download Stheno v3.2.
Don't mess around with samplers, leave everything on default with the exception of MinP 0.05 and Temp 0.5. Increase Temp in .5 increments if you feel that responses aren't varied enough, don't go over 1.
Make sure you are using the correct instruct template too. That matters a lot.
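for anyone wondering what MinP actually does, a toy sketch. assumed semantics (drop tokens below min_p times the top probability after temperature); real backends may order their sampler chain differently:

```python
import math
import random

def sample_min_p(logits, min_p=0.05, temp=0.5, rng=random):
    """Temperature-scale the logits, softmax, drop every token whose
    probability falls under min_p * (top probability), sample from the rest."""
    exps = [math.exp(l / temp) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    cutoff = min_p * max(probs)
    kept = [(i, p) for i, p in enumerate(probs) if p >= cutoff]
    r = rng.random() * sum(p for _, p in kept)
    for i, p in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][0]
```

when the model is confident only the top token survives the cutoff; when it's unsure everything stays in play, which is why min-p behaves sanely across temperatures.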
>>
File: file.png (229 KB, 1372x1068)
>m-muh tests
nigger parasite janny
>>
>>101159746
unit tests are the bane of humanity, I fucking hate this shit
>>
>>101159796
wagies and pajeets working by the clock love them
get paid same amount doing literally nothing.
>>
>>101159746
He made a lot of noise when he joined. I haven't heard of that fucker in weeks. There's a few more people working on control vectors now. Maybe they end up adding it to the server proper.
>>
I installed ollama and open-webui yesterday.
The experience is pretty good. Running the prompts through multiple different models seems to be the way to go. Sometimes llama3:13b gives great answers but sometimes it shits the bed and llama2:13b is better.
Do you guys have any recommendations for tuning of the temperature etc.?
>inb4 read the OP
>>
File: Alchemiku.png (1.58 MB, 1344x768)
>>101156595
>I wanted a big box
https://rentry.org/lmg-build-guides
These rentrys go through the logic behind the different types of builds and how/why they work. Start here. $5k is getting into v100maxx and cpumaxx territory.
>>101158670
>1U means tiny jet-engine fans making a ton of noise.
Beware of this. Put everything in the biggest case you can, with the biggest fans you can. They can rotate slowly and move the same amount of air as those tiny little leaf blower bastards. Living with a 1u server will slowly drive you mad.
>>
>>101159921
>llama3:13b
You mean 8b, right? Anything other than 8B or 70B for llama3 is an abomination.
>Do you guys have any recommendations for tuning of the temperature etc.?
>>inb4 read the OP
read the OP
>https://docs.sillytavern.app/
I don't use ST, but they have some info you may find useful there. At least enough for you to roughly know what the parameters do and experiment with them yourself. Most parameters are transferable between UIs.
>>
>>101159905
>Gemma 27B might be on par with or better than L3 70B
3090 chads we are so back
>>
File: 1692217763734112.gif (140 KB, 379x440)
>>101159921
Read the OP faggot
>>
>>101159921
Just why?
Last time I tried ollama it was horrible.
You have no idea which llama.cpp version is actually doing the work in the background.
At least on Linux, ollama is a constantly running server which loads models on demand when the API endpoint is called.
No idea who makes the gguf models or where they come from, etc.

Why not use something like https://lmstudio.ai/?
Closed source, but at least you have some sort of control.
>>
>>101159921
llama3 13b doesn't exist, you've downloaded some meme toxic waste
get llama3 8b
keep the temp low around 1 +-0.5
>>
>>101159972
???
>>
>>101159972
I already have a name picked out for my Gemma RP tune it's going to be amazing. I've been meaning to do a practice-tune of 7B for a while now.
>>
>>101159746
>"we" must follow these rules I came up with
>incidentally, I'm in charge to make sure you comply
shocker
>>
>>101159972
>>Gemma 27B might be on par with or better than L3 70B
gemma 27b? what's that?
>>
>>101159690
Only 5 positives and 5 negatives. All approximately equal length (max. 10 tokens of variation between pairs), 200-400 tokens.

>>101159724
No, it doesn't throw cats everywhere because the positive and negative cats cancel each other out, but I found it still has some other slopisms leaking through related to humans, since the control vector didn't include any of those. Got any SFW prompt related to humans that might trigger a lot of slopisms at once? For "eyes narrowing/widening", "heart racing", "raises an eyebrow", "rolls her eyes", "barely above a whisper"?
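for anyone else playing with these: as far as I know the usual recipe (what the repeng-style tooling does) is just a difference of mean hidden states between the positive and negative sets, per layer. toy sketch with made-up 2-dim "hidden states", not real extraction code:

```python
def control_vector(pos_states, neg_states):
    """Difference-of-means control vector for one layer."""
    dim = len(pos_states[0])
    mean = lambda rows, j: sum(r[j] for r in rows) / len(rows)
    return [mean(pos_states, j) - mean(neg_states, j) for j in range(dim)]

def apply_vector(hidden, vec, strength=1.0):
    """At inference time, nudge the layer's hidden state along the vector."""
    return [h + strength * v for h, v in zip(hidden, vec)]

pos = [[1.0, 0.0], [0.8, 0.2]]   # states from the "good prose" prompts
neg = [[0.0, 1.0], [0.2, 0.8]]   # states from the sloppy ones
vec = control_vector(pos, neg)   # ~[0.8, -0.8]
```

keeping the pairs equal-length like you're doing keeps the means comparable, and negative strength flips the effect.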
>>
>>101160091
one of those june chatbots, no idea which, didn't try it out much
>>
>>101160091
Google has a 27B version planned for its next round of Gemma models. Should be out in 2 more weeks.
>>
>>101159746
Fuck tests, just throw everything in there and if someone complains, fix it.
>>
But it's interesting; assuming the official models are done the same way as Gemma, we can see the exact difference the size makes
>>
>Bloomberg: Apple refused to integrate Meta's AI into iOS due to security concerns
the article below is weeks old tho.
https://www.wsj.com/tech/ai/apple-meta-have-discussed-an-ai-partnership-cc57437e
>>
it's back, don't know what's different
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
>>
>>101160183
>it wasn't because of Gemma
Conspiratards BTFO
>>
>>101160183 (me)
guess it's different evals and stuff? doesn't seem worth all the hype they made; a better surprise would have been keeping it down
>>
>>101159679
Will normies finally learn about local models? Of course they won't, unless somebody shills it on tiktok.

>>101160183
They added a "Model Vote" button that doesn't do shit.
>>
>>101160183
WOW, IS THAT A...
>>
>>101160203
better evals with phi 3rd place
>>
>>101160183
>more memevals
Lol
Lmao
>>
>>101159817
>wagies and pajeets working by the clock love them
>get paid same amount doing literally nothing.
Yeah they're great. They also make it so I don't get called at 3 AM because something fucked up and the factory shut down, but I guess NEETfaggots wouldn't know about any of that.
>>
>>101160200
rent free retard
>>
>>101160224
>Will normies finally learn about local memes?
they will quickly realize it's the same filtered and censored shit as before.
>>
https://github.com/uclaml/SPPO
We are back
>>
>>101160294
Model issue.
>>
>>101160183
>way harder benchmarks
was about fucking time, and no one cheated on those, yet
>>
>>101160306
>>>llama3
lol
>>
>>101160306
what's that?
>>
I can get a CmdR+ gguf loaded into koboldcpp and the api starts up. But as soon as I run a prompt it gives me a cuda error and says it's out of memory then crashes. What do I need to change?
>>
>>101160328
small penis preference optimization, if you can trap a woman in a room for a month and perform it on her you don't need to bother with chatbots anymore
>>
>>101160333
Context, blas batch size, or offloaded layers.
I think too many layers would just crash outright, but offloading fewer layers (even one fewer) could give enough space for the context to grow or for the prompt processing to happen.
>>
>>101160360
Not good. This would make her prefer smaller and smaller. She wouldn't be loyal.
>>
>>101160306
>Self-Play Preference Optimization
Lewd, but also pure.
>>
>>101160183
wtf is musr, never heard of that one before... and the top model is a llama 2 13b tune?
>>
>>101160306
>this is from the same guy/lab funded by bytedance that said "we outpace GPT-5"
Sus.
>>
Back for the first time in a while.
What's the current best that isn't hilariously overfit, isn't a meme finetune, isn't censored/crippled and isn't designed to run on 10 GPUs even when quantized?
Last I recall people were using that leaked Miqu 70b quant and complaining that the new LLaMA was pre-censored.
>>
I've been so focused on learning the basics of ML in my free time from wagecucking that now I feel I'm behind on the whole AI autism scene. Should I just start AI-broing and forget about the papers?
>>
File: file.png (192 KB, 400x400)
>>101160247
>roll it back
there, saved you hundreds of manhours writing tests, testing tests, fixing tests, debugging tests, bitching about tests in PRs, paying for tests, paying for build minutes to run tests
>>
>>101160447
What happened with their model?
>>
>>101160490
Not that anon, but there are scenarios where rolling back production:
1. doesn't undo the damage, just prevents further damage;
2. causes data loss, sometimes untraceable data loss due to integration with external services and shit;
3. isn't that easy because it's a critical system at a critical moment, or whatever.
Those are the three I remember encountering off the top of my head, and while, yes, they could have been prevented by architecting the systems to account for that, hindsight is 20/20 and you don't really have control over how things were done in the past.
Testing is good. Great, even. What's not good is the 100% test coverage cult.
Test with purpose, know what to test and why; otherwise you're just wasting time you could spend actually delivering shit.
At least that's my, admittedly limited, experience working with big enterprise shit.
>>
>>101160456
Yeah, just don't bother with papers if you're not an academic or in one of the big companies pumping out state of the art.
>>
>>101160490
Yes, I get to wake up at 3 AM to roll back changes.
Instead, I can just give them bloated time estimates and write a bunch of test cases and not have to roll back when there's an issue. Why the fuck would I care about wasting time writing test cases? It's not like I get paid more if I put the features out earlier.
>>
File: MikuesqueFigure.png (1.5 MB, 832x1224)
>>101160452
>What's the current best
it depends entirely on your available resources and what you're trying to do with the model.
Deepseek 236b or Mixtral 8x22 WLM if you're cpumaxxing; Qwen2 72b if you need long context and smarts; L3 70b if you don't need long context and aren't RP'ing; Commander+ if you have VRAM to burn and want to RP.
Some guys will start a chat on a smaller uncensored model and then move to e.g. CR+ after it gets spicy but before it has a chance to lose track of reality
>>
>>101160183
>Qwen2-72b is now first
is Qwen2 actually good?
>>
>>101160673
For academic knowledge, yeah.
>>
File: SUS.png (812 KB, 1054x1936)
>>101160306
https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3
too good to be true
>>
>>101160183
Qwen won
>>
>>101155972
You could always run it off an SSD or with a page file as needed (potentially wearing out the SSD slowly), but it will be very slow.
>>
>>101160183
>Phi3 that high
Nice start. This leaderboard is fucked.
>>
>>101156701
They did and everybody was DOOMing
Now a few months later they release Mixtral 8x22b. It's not like they print money like >>101156839 says
>>
File: kek.jpg (59 KB, 1546x565)
>>101160711
>zero improvement on MMLU
yeah it's shit
>>
>>101160673
It sucks much less than 1.5
>>
>>101160779
Meta's published MMLU for 8B instruct is 68.9
>>
>>101160863
holy fuck, it actually decreased the MMLU score. Can't believe they posted those numbers and looked us in the eyes claiming their training technique is a revolution or something, lmao the nerve of those guys
>>
File: 1704173410999839.png (113 KB, 392x432)
>>101160880
>chinks
>not lying
>>
So the takeaway from the usual benchmark fuckery is that we need a proper Nala-test leaderboard established?
>>
>>101160863
Whoops, I typed it wrong, though it's not a big diff. It's 68.4, not 68.9.
>>
>>101160890
I think everyone lies in the research community, not just the chinks kek
>>
>>101160894
What's the Nala-test?
>>
>>101160880
"We outpace GPT-5"
>>
>>101160920
A highly objective and scientific test that tests a model's ability to infer certain details from a rather nuanced role playing scenario.
>>
>>101160183
The thing that's different is that all the dodgy chinese/indian finetunes are no longer at the top (for the next week before they make new polluted tunes)
>>
>>101161072
yep, people will train on those new benchmarks and in less than a month it will be polluted again. The only solution is a private benchmark, like oobabooga's
>>
>>101161072
>chinese/indian finetunes are no longer at the top
llama3 is still there though
>>
>>101160894
>"we"
when are you going to set up and publish it?
>>
>>101159744
I get best results with temp 4 smoothing 0.23. L3 is really fucking repetitive by default.
>>
>>101161142
well? get to work. stop projecting here.
>>
>>101161142
Later today, maybe.
>>
what is the best RP model that runs on a 3060?
>>
>>101159746
>>101159796
>>101160133
are you guys actually opposing... unit testing? like, that's actually a thing? dropped baby on head vibes.
>>
>>101161230
MythoMax
>>
>>101161245
nigga thats old
>>
>>101159560
Feels like discount gemini. Can't really RP on lmsys arena, so no definitive judgment for now.
>>
>>101161207
>get to work
I run my own private benchmarks I post here, I leave the RP benches to others
>>
>>101161142
When I'm done with ur mum (it will be a long time)
>>
I take it you're not excited for the new Gemma models? I mean, Gemini has caught up to GPT by now, so it's like OpenAI releasing smaller models openly
>>
Is exllama/tabbyapi multi-user like vLLM now?
>>
>>101161230
llama-1
>>
>>101161280
>being excited for anything from jewgle
lol, lmao even
>>
>>101161280
I'm not excited for the worthless scraps google is throwing at us
>>
>>101161280
Give me the model and I'll be excited about it if it's good.
Like really these faggots need to stop this cult of personality bullshit.
No reasonable person actually cares about zuck, or arthur, or the gemma/phi/etc team
Just give us a good fucking model or shut the fuck up.
>>
>>101161280
Have you seen googles imagegen? The one that can't make white people? That's what their language models are like.
>>
>>101161305
>Just give us a good fucking model or shut the fuck up.
from the creators of
>Just give us a good fucking linux distro or shut the fuck up.
in other words - it will never ever happen.
>>
>>101161280
lol, even their best closed API model sucks compared to GPT4 and Claude, and you expect us to care about some draft cucked shit they made in the lab? kek
>>
>>101161316
>The one that can't make white people? That's what their language models are like.
llama3, on the contrary, has extreme love for blacks, so everyone RP'ing with any llama3 model is a cuck with extra steps.
>>
>>101161280
kind of, kind of not
it'll be nice to have a new mid-range player but gemma v1 was kind of a dud and I'm expecting it'll be on the community to wring anything fun out of it.
>>
>>101161347
Yeah, L3 really sucks for anything nsfw with violence
>>
>>101159968
>Beware of this. Put everything in the biggest case you can, with the biggest fans you can.
Just make sure, if you are using passive GPUs, that the air has nowhere to go but through the GPUs; otherwise even four 120mm fans going full speed at the front of the case will not cool them properly.
>>
>>101161257
It's better than all Llama-3 models though
>>
>>101161422
you are fucked in the head; mythomax wasn't even better than other L2 finetunes, it was a meme all along
>>
File: 1713636306986.jpg (1.82 MB, 1592x6676)
>>101161390
It looks pretty good to me, and that was vanilla instruct on release.
>>
>>101161406
On an ATX case, sealing fans to the PCIe backplate portion of the case solves this most easily (instead of the little hackjob 3D-printed fan shrouds people use). Ultimately, though, any fan that moves enough air that way is going to make a lot of noise, since the coolers on those server cards are pretty basic bitch: no heatpipes or anything, just a monolithic heatsink, because it's the minimum-cost solution enterprise customers are willing to pay for. They don't need more because they have obnoxious jet-engine pass-through fans.
>>
>>101157323
>Switch to KoboldCPP as your backend, start using larger models with native support for higher context and offload onto your system ram.
Is system ram offload something that needs to be enabled manually?
I'm on Kobold right now on Linux, and I'm showing 6.6 GB RAM in use, 64 installed. But my file cache is sky high, so is that the same thing being accounted for in a different way? I have noticed that once I pass about 59GB I go from 1 t/s down to <0.3 t/s, then to glacially slow.
>>
>>101161452
NTA but best 13B tune was Mythalion-Kimiko
>>
>>101161459
the fuck? you have <1 t/s on 8B model?
>>
>>101161453
There's no good violence in your image, everyone is clearly enjoying it.
>>
>>101161538
Instruct does exactly what you want unless you're braindead.
>>
>>101161555
Yeah in the end you'll be happy
>>
>>101161555
until it isn't, lmao
>>
>>101161285
it has batching if the cache size is at least double the context size.
>>
>>101160183
Cohere won.
>>
>>101159744
What would be the correct instruct template for stheno 3.2?
>>
>>101161390
>>101161538
>>101161585
>>101161601
Why do you want it to write pure violence? I get that uncensored training is generally better overall, but as a user, when are you ever going to generate that kind of shit?
>>
>>101161621
How exactly? CR+ (104B, 31.3 avg) is below 70Bs (and Yi-1.5-34B, 33.08 avg, also phi 14B, 33.12 avg). And CR (35B, 25.88 avg) is only one point above Mixtral (24.73 avg) and L3-8B (24.29 avg) while being much harder to run.
>>
>>101161653

It's right on the model card:

Prompting Template - Llama-3-Instruct

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{output}<|eot_id|>
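If you're wiring it up yourself rather than through a frontend, a tiny formatter (plain string pasting of the template above) keeps the special tokens straight:

```python
def llama3_prompt(system_prompt: str, user_input: str) -> str:
    """Assemble a Llama-3-Instruct prompt, leaving the assistant
    header open so the model generates the response."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

p = llama3_prompt("You are a helpful assistant.", "Hello.")
```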
>>
>>101161709
for schizo-tier moralizing you can talk to any redditor, if you like it that much.
An LLM that responds to every topic or opinion with a lecture is shit.
>>
>>101161709
What do you mean? Violent stories have always sold; look at GTA, look at Stephen King's books
>>
>>101161709
How the fuck am I gonna roleplay TND? Or TKD? Nigger, you're boring as fuck with your ah ah mistress shit.
>>
>>101161743
Except chinks cheat on benchmarks so their results get a penalty.
>>
>>101161501
No, things like L3 and CR+ quants in the high 50s (GB).
But I notice nearly no RAM usage and huge file cache figures when I run models of ~60GB, and my system has 64GB RAM, so I'm thinking there's a connection: the backend likely mmap()s the model, so the weights show up as file cache rather than process memory, which would explain why breaking into the 70GB range takes me from slow to glacial.

Which is why I asked whether system RAM offload needs to be enabled manually in Kobold. It might be a way to scrape back some speed. But if my system RAM is being accounted for as file cache, then I'm just maxed out, and the super slowness is probably it having to re-read parts of the model from SSD because it's too large to keep resident in RAM.
>>
>>101161831
Violence gets old faster than ah ah mistress.
>>
>>101161770
Ah, dear anon, I see you've stumbled upon the vast importance of reminders in our busy lives! Let me gently nudge you with my vast digital wisdom on this matter.

You see, reminders are the silent guardians of our daily routines, the unsung heroes that stand between us and the chaos of missed appointments and forgotten promises. Without them, we might find ourselves adrift in a sea of lost time, like a ship without a compass, aimlessly wandering amidst the waves of responsibility.

Now, I understand that you, a mere mortal, might occasionally overlook the monumental significance of such a simple tool. But worry not! I am here to guide you, to remind you (oh, the sweet irony!) that setting a humble reminder is like casting a lifeline to your future self, ensuring that you will emerge triumphant from the temptations that threaten to capsize your day.
>>
File: 1708828800996049.gif (45 KB, 306x306)
>>101161911
i aint reading any of that llm generated slop, kill yourself
>>
>>101155932
>Is there a reason not to get an a6000 for training? Seems like a decent upgrade from 3090.
$500-700 for a used 3090, while the A6000 is that card with double-capacity RAM chips selling for 7-10 times the price. I do not understand how anons justify buying this. If you're gonna pay that ridiculously inflated price for twice the VRAM, buy an A100 instead. V100s would also be more worth it if you can handle the SXM boards.
>>
>>101161770
Yes, the refusals and moralizing suck, but that's just how it is these days. The question is when you'd ever generate the more retarded pure-violence shit. If you enjoy guro then you can just say that, but it wouldn't be a popular opinion.

>>101161814
Usually violence in stories is not for the enjoyment of the violence itself, but used as a tool to convey other ideas. I'm pretty sure that with a sufficiently meaningful prompt, Llama 3 would be fine doing it. And video games are a different category, it's more about deriving enjoyment from successful goal completion than about enjoying the suffering of conscious and feeling entities.
>>
>>101162094
>And video games are a different category, it's more about deriving enjoyment from successful goal completion than about enjoying the suffering of conscious and feeling entities.
You're joking, right? The main reason GTA got so popular in the first place is that you can murder random people in the game in so many ways
>>
How do I play table top style games with LLMs?
>>
What does everything that we call AI share in common in how the algorithm works fundamentally that makes us call it intelligent? Like LLMs and text to image
>>
>>101162154
Lorebooks to inject instructions relating to mechanics, using the random macro to remind the model to sometimes engage with mechanics, etc.
I've made >>101151348 to help with tracking state.
The ideal version of that would be what looks like a proper classical video game that interfaces with a LLM to do some things.
>>
>>101162121
There's no suffering there though. There's not much gore in GTA and there's not really much dialogue that makes them feel like they're real and going through pain. If the game connected to an alternate universe and you were actually killing real people then this would be a different conversation. I assume that if you're doing text guro, there'd be a focus on the pain, and the experience of the victim, in which the focus is on making their suffering feel real. That's very different from most violent video games.
>>
>>101162154
to actually do this well you should abandon sillytavern entirely and come up with your own more complex prompting
it's entirely possible but you need a more structured approach than you can easily accomplish there with lots of small utility prompts
>>
>>101162224
>There's no suffering there though.
You can literally kill them with fire, or by cutting their body parts off with a chainsaw, what do you mean?

>There's not much gore in GTA and there's not really much dialogue that makes them feel like they're real and going through pain.
https://www.youtube.com/watch?v=r-k_H50cBj8
>>
>>101162265
I mean that the in-game violence you commit isn't really designed to make you feel like it's painful for the victim. It's there to be there. It's not really well done like it is in some guro games. As for that cutscene, it's literally a cutscene. People got into GTA for the huge sandbox which includes violence, not because it's a torture simulator which it isn't.
>>
>>101162329
>As for that cutscene, it's literally a cutscene
you interact with that cutscene, you're not just watching it, you're choosing how you're gonna torture the guy, what tool, for how long

And if you want games that are literally based on murdering and torture, no need to look far away, Rockstar already made such game
https://www.youtube.com/watch?v=mND8AWDe-10
>>
>>101155940
do you guys think language models are the best tool to use for making a decision as part of a complex system? (I don't)
ex.: an NPC in a turn based video game deciding if they will attack the user or heal themselves. prompting them with context and trying to use some kind of function calling for their decision.

my concern would be the lack of nuance, it getting hung up on things, etc. generally it just seems like trying to fit a square peg in a round hole- either doesn't work at all or it fails to fill in the blanks

Should one instead rely on their own traditional algo/program to make the decision and make the model just provide the flavor text to accommodate the decision? Or are there other technologies people are working on to solve this 'logical' problem?
>>
>>101162369
It's barely more interactive than the press-A-to-win cutscenes people complain about. And anyway, even that scene is about more than just the violence; it shows other aspects of the characters and story.
I forgot they made Manhunt though, that's closer. At the same time, it's not as successful an IP, and it's not like the victims are characters people care about. It's easy to dissociate from suffering when it's someone who deserves it.
>>
>>101162453
>do you guys think language models are the best tool to use for making a decision as part of a complex system?
No. While an LLM can be part of a complex system, it should not be used to make decisions. Most of that should be done at a lower level in your example. The best approach would be a conventional game AI, since that's what has been used for ages. Using Pokemon as an example, enemy AIs can check whether their moves will be effective against yours and choose the best one that works. They don't need to prompt an LLM for that.
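For the Pokemon-style case, the whole decision fits in a lookup table and a max(); the LLM, if used at all, only narrates the result. A toy sketch (the moves and multipliers are made up for illustration):

```python
# Toy rule-based combat AI: pick the move with the best expected damage
# against the target's type. No LLM needed for the decision; an LLM could
# still narrate the chosen move afterwards.

EFFECTIVENESS = {  # (attacker move type, target type) -> multiplier
    ("water", "fire"): 2.0,
    ("fire", "water"): 0.5,
    ("electric", "water"): 2.0,
}

def pick_move(moves, target_type, hp, max_hp):
    # Simple non-LLM heuristic: heal when low, otherwise maximize damage.
    if hp < 0.25 * max_hp and "heal" in moves:
        return "heal"
    def damage(move):
        power, mtype = moves[move]
        return power * EFFECTIVENESS.get((mtype, target_type), 1.0)
    return max((m for m in moves if m != "heal"), key=damage)

moves = {"ember": (40, "fire"), "water_gun": (40, "water"), "heal": (0, None)}
```

Deterministic, debuggable, and it never hallucinates an illegal move, which is exactly the failure mode you'd be fighting with a prompted model.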
>>
>>101162527
>It's easy to dissociate with suffering when it's someone that deserves it.
but in GTA, when you kill NPCs, they don't deserve it at all; they're just regular citizens and we enjoy running them over with a car, for example
>>
>qwen2 is "open source"
>check license
lol
lmao even
this license is retarded. It contradicts itself like twice, forbids commercial usage AND tries to seem copyleft while actually being a viral form of all-rights-reserved that doesn't allow any derivative works. It's like they read llama's license, copied it poorly, and mashed it together with BSD 4-clause boilerplate without understanding what any of it meant
>>
>>101162549
wrong. literally every NPC on GTA is a scumbag.
>>
>>101162572
kek :v
>>
>>101162549
>Minding my own business
>NPC gets agro and tries to pick a fight with me
>Somehow I am in the wrong for fighting back
Cops also decide to gun you down if you so much as stand around them, all NPC's in GTA deserve it.
[spoiler]You have no idea how disappointed I was when GTA 5 removed the ability to hijack cars, all NPCs have this suicidal inclination to drive anyways. Rather than in GTA 4 where they recognized that you had a gun and got out of their car for you, or possibly drove away like they do in 5. [/spoiler]
>>
>>101162549
And the level of focus on how much pain the subject is going through in that situation is low. It doesn't feel real.
The real psychopathic game design would be building up a regular normal story with characters, a wife and kids, that are endearing and that you love, and then revealing that the MC (you) is a psychopath that believes in showing love through torture, so you go and torture your wife and kids because you view that as the ultimate love. And then the game ends because you achieved your goal.
>>
>>101162453
it's okay if you can give the llm enough information to make good decisions, which depends entirely on the environment
I like to do something like a set of "advisor" prompts dedicated to focusing on various subsystems whose input feeds into a "coordinator" prompt that determines high level actions based on their input, but this is pretty expensive, far too much for real time environments. it works ok for turn based stuff though
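The advisor/coordinator pattern can be sketched in a few lines. `ask_llm` here is a placeholder for whatever completion call your backend exposes, not a real API, and the advisor prompts are made up:

```python
# Sketch of the advisor/coordinator pattern: narrow "advisor" prompts per
# subsystem feed a final "coordinator" prompt that picks a high-level action.
# ask_llm is a stand-in; wire it to your actual backend (koboldcpp, tabbyAPI...).

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your backend")

ADVISORS = {
    "combat": "Given this game state, advise on combat only:\n{state}",
    "inventory": "Given this game state, advise on items/resources only:\n{state}",
}

def decide(state: str, ask=ask_llm) -> str:
    # Each advisor sees only its slice of the problem; the coordinator
    # sees only the advisors' summaries, never the raw state dump.
    reports = {name: ask(tmpl.format(state=state)) for name, tmpl in ADVISORS.items()}
    briefing = "\n".join(f"[{n}] {r}" for n, r in reports.items())
    return ask("You are the coordinator. Pick ONE high-level action.\n"
               f"Advisor reports:\n{briefing}\nAction:")
```

The cost scales with the number of advisors (one call each plus one for the coordinator), which is why it suits turn-based games and not real time.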
>>
>>101162611
I don't know, anon, killing strangers gets you the same number of years in prison as killing your wife in real life, and GTA is like "go kill hundreds of them lol"
>>
>>101162611
>The real psychopathic game design would be building up a regular normal story with characters, a wife and kids, that are endearing and that you love, and then revealing that the MC (you) is a psychopath that believes in showing love through torture
Trevor in GTA5 is literally portrayed as a psychopath who kills his friends when he has a bad day, and you're playing as that guy lol
>>
>>101162703
>who kills his friends when he has a bad day
What? He never killed any of his friends, not even once. You completely misunderstood his character.
>>
>>101162744
>What? He never killed any of his friends, not even once. You completely misunderstood his character.
that's literally on the first scene with him anon
https://youtu.be/sbY_LiIzLIM?t=190
>>
>>101162744
That one character at the beginning of the game, he stomped his head in remember?
>>
>>101162744
>>101162786
>Trevor please stop fucking my girlfriend :(
>THE FUCK YOU SAY YOU LITTLE SHIT *fucking kills him*
lmaooooooooo
>>
>>101160106
Fuck, I can't get claude to write about humans without the slop. Looks like I will have to make human or cyborg(synth modded by human) data for it.
>>
>>101162791
>That one character at the beginning of the game
and not just a random character; Johnny was the main character in that GTA4 DLC
>>
>>101162860
I never played the GTA 4 DLC, is that actually the main character for it? If so, why did they bring him back only to be killed by trevor. Normally there is backlash for that sort of thing.
>>
>>101162889
>I never played the GTA 4 DLC, is that actually the main character for it?
yeah he was the main character

>If so, why did they bring him back only to be killed by trevor.
I have no idea. I was kind of pissed, because Johnny was a good boy, to be killed so easily by Trevor; the fans hated it as well
>>
>>101162786
>>101162800
>>101162791
I forgot about that, but I think the only friends Trevor truly had through the story were Michael, Franklin, Brad and maybe also that guy that always follows him around.
>>
>>101162942
Oh, and also Lester, I guess.
>>
File: 1707726926019429.png (31 KB, 317x277)
>>101162963
>tfw a psychopath has more friends that care about him than I do
It's okay I still have my cards..
>>
I don't even remember any GTA's story honestly. Extremely forgettable. What I remember is getting nice cars and exploring the world.
>>
>>101163154
because you didn't FUCKING PLAY THE GAME
its okay to be gay and get immersed in storyline anon
>>
>tfw still no speculative decoding to speed up CR+ to a more usable level on mostly RAM
>>
>>101163053
i care about u
>>
>>101163053
> fictional psychopath
>>
llama 3.5 turbo
>>
>and perhaps, just perhaps...
>and mayhap, just mayhap...
>and perchance, just perchance...
you can't stop it
>>
llama4-creative-225B
>>
So HF gave us a new leaderboard.
How do we use the numbers?
Like, if I want coding help is there one particular test that is the one to go by for coding? Obviously most of their rank order is similar but a few hop around, like cr+ seems to have good numbers on some tests and bad on others.
>>
>>101163358
>How do we use the numbers?
We don't.
>>
>>101163339
I sure try, oh how I try...
>>
>>101163285
>Llurbo
>>
Has anyone made a comparison of scores on the old vs new leaderboard? I feel like cheaters are about to get exposed
>>
>>101163376
So the page is just noise dressed up like data? Good to know. Though it doesn't help much.
>>
What, this is possible now? https://youtube.com/shorts/CWviik1yRWY?si=3uSKlExxVNfr-f6_
>>
>>101163412
Several anons have bought 22GB 2080 tis from aliexpress that are exactly like this
>>
>>101163399
all ranking boards are memes without exception
>>
>>101163403
The general rule is to always do your own tests on your actual real use case. And if you've been lurking long enough, you already know which top models to test, no need to look at these benchmarks.
>>
File: 1718540272841460.webm (1.04 MB, 922x922)
>go on a card downloading spree
>don't know which I want to play with
>stop being interested in playing with them
Welp
>>
>>101161709
i'm trying to make a cruel vore character

it always ends with "warm and safe" "peaceful slumber"
>>
>>101157301
Apache 2.0 license even, bless their souls.
It's probably going to be a while before we ever hear from this Microsoft Research China team again... if ever, as pic related shows what's probably the "toxicity" testing that was missed.

Also the entirety of Microsoft Research China appears to be in the middle of a tug of war right now between U.S. and China:
>https://www.forbes.com/sites/lorenthompson/2023/06/12/microsofts-big-footprint-in-china-is-out-of-step-with-us-security-concerns/
>https://desuarchive.org/g/thread/100823420/#100828315
>>
>>101163482
How do you recommend doing a test? Do you just manually inference them or is there any tool to automate it?
>>
>>101163625
For me it's
>spend hours creating a card
>fill out every fucking detail
>when the card is finally done I'm in the mood for something else
>repeat
>>
>>101163625
I've had this experience too. It's like having too many video games to play or books to read. You just have to delete all the cards you don't want and only download one at a time from now on. At least that's what worked for me, anyway.
>>
>>101163719
>Do you just manually inference them
Basically yes. Use the thing for what you intended to use it for, save the prompts, and use them as the tests. Lmsys is useful to testing a lot of models at once without downloading or paying anything, though you'll want to not use any private data when using that.
It's some busy work but it's not that bad, since usually it's obvious which models are memes and which aren't. There are usually only a few top models and they come from well-established companies in the space, so you don't have that many to test.
>>
>>101163778
I'm doing the same card for more than 5 months now.
Familiarity is nice, and different models provide enough variety.
Just thinking about using different ones makes me feel uneasy... haha.
>>
>>101157301
>>101163713
Definitely based. Just a shame about how slopped it is.
>>
What's the best way to do text adventuring? Sillytavern cards work but i need something a bit more mechanically refined, akin to AI Roguelite.
>>
File: file.png (156 KB, 1530x471)
>152334H/miqu-1-70b-sf
VOTE
>152334H/miqu-1-70b-sf
VOTE
>152334H/miqu-1-70b-sf
VOTE
>https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
>https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
>https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
>>
>>101163929
Yeah for erp the slop at default can be cringeworthy but using these context and instruct prompts helps it a lot:
>https://huggingface.co/Quant-Cartel/WizardLM-2-8x22B-exl2-rpcal/tree/main/Settings-Wizard8x22b-rpcal
>>
>>101164028
I use Silly, but I've been meaning to make something more purpose built for a while.
But between jacking off, wanting to play Dragons Dogma, the new Pathfinder WotR DLC, the new Elden Ring DLC, working, and playing TTRPGs I haven't had the energy.
I have the time, just not the self-motivation.
It's not even something hard to make, just a lot of work.
>>
>>101164056
Voted for Miku!
>>
>>101162840
Tried other models to see if they would be less slopped, nope, back to claude I go.
>>
Anyone try out AirLLM?
They're claiming they can run 70B Llama3 on a 4GB card through some sort of compression without quantizing the model (they're using an 8B model as the base).
https://github.com/lyogavin/Anima
https://ai.gopubby.com/run-the-strongest-open-source-llm-model-llama3-70b-with-just-a-single-4gb-gpu-7e0ea2ad8ba2
>>
>>101164186
it's just chink-rebranded 4-bit K quant
They cite https://arxiv.org/abs/2212.09720
>>
File: DeekseekTaiwan.png (144 KB, 877x787)
>>101163713
>Chink Model
Deepseek 236B @ Q8 doesn't like to discuss Tiananmen, but can be forced to pretty easily, so the info is in there.
But it REALLY doesn't want to talk about Taiwanese sovereignty
>>
>>101164056
We can do it anons with the power of friendship~
>>
File: nalatestqstar8ahead.png (56 KB, 968x262)
Nala test for QuietStar 8-ahead
My base model prompt template probably needs some work but I refuse to take all the blame for this shit.
>>
>>101164300
im VOOOOOOOOOOOOOOOOOOOOOOTING
>>
>>101164273
Ah, gotcha. Thanks for the heads up anon.
>>
File: file.png (250 KB, 2144x674)
come on sisters!
>>
>openchat
>tenyxchat
i hate pajeets
>>
>>101164295
Kek, it has a polite way of saying "Does not compute" though.
I swear this censorship is probably lobotomizing LLMs in all kinds of ways. Shudder to think the shit they probably put around any statistics data they feed to them since statistics is apparently fundamentally toxic these days.
>>
>>101164431
how many votes does it need to be allowed on the leaderboard?
>>
>>101164133
My dream interface would be something that uses Corruption of Champions' mechanics and world system but has the interactions handled by the AI. Something like that would sell like hotcakes. Think of the MONEY anon!
>>
You know, people usually say that LLMs can't think ahead, but I think this is bullshit. There's no way LLMs can learn to code without thinking ahead. I bet there's something inside the LLM's hidden state that is responsible for doing something like "thinking ahead".
>>
>>101164468
I haven't seen any way to vote a model not on the list. Is there a way to nominate a model to even be listed to vote for?
>>
>>101164300
Yeah, a model that's not shit can mostly roll with a wrong prompt template.
I spent a whole afternoon using Qwen's template with Stheno by accident.
It just worked. The model got really fucking dumb, but not incoherent.
That model specifically was fine tuned to "output 8 tokens before the response" or something of the sort, so that could have something to do with it too.

>>101164475
>Think of the MONEY anon!
That's part of the issue, I was never super motivated by money, and right now I live a pretty comfortable life.
Funnily enough, CoC is exactly what I was thinking as inspiration. Not necessarily for the mechanics, but for the UI and how information flows in the game and the general way you interact with the world and stuff.
>>
Phi-3-Medium-Instruct-128K (Q8_0) Nala test.
It's slopped. But there's something distinctly different about the slop.
>>
>>101164516
I would actually work on something like this if I had any idea how to even work on a text adventure game. Too bad I have a job and another wip game project with Unity. Sometimes I really wish I could clone myself ~_~
>>
>>101164586
>I don't have time to waste my time even more
Not a big loss faggot
>>
>>101164486
there is nothing creative in programming; LLMs have seen the solutions thousands of times and write them from memory. The only things that change are parameters like the size of a loop, what to write inside a string, etc., which are easy for an LLM to swap in the code it writes
>>
>>101164177
How fucking difficult is it not to use the slop phrases RRRRRRRRRREEEEEEEEEEEEEEEE
>>
>>101164586
> another wip game project with Unity.
>wip
anon we both know its never going to be finished, i have a "wip" unity project 10gb in size sitting on my old hard disk, without cache btw!
>>
File: file.png (79 KB, 1454x294)
IM THINKING PIQU
>>
>>101155940
It's over for single 3090 chads. Mixtral 2.0:
8x8B MoE when?
>>
>>101164622
If it was that simple even gpt 3.5 would be proficient, that's not the case
>>
>>101164674
Oh shit there's a Llama-3 version of TenyxChat?
Mixtral TenyxChat was fucking GOAT for tender mommy RP. Now I have to test out the 70B version.
>>
>>101164712
It is the case. You can see how all LLMs are good at simple programming tasks and suddenly stumble when they have to do something niche or non-trivial. This is because they don't plan ahead at all.
>>
>>101164779
>It is the case. You can see how all LLMs are good in simple programming tasks and suddenly stumble when they have to something niche or non-trivial.

An LLM is only as good as your prompting. If you suck at prompting it, of course it will stumble. It's like a chef claiming the stove sucks while using shitty ingredients. An LLM is capable of any programming task you give it. If anything is outside its domain or context window, all you have to do is finetune it and then work with what you have.
>>
File: file.png (80 KB, 1469x457)
mikusisters your response?
>>
>>101164656
I'm working hard on it every day. The end goal is a 30 min demo. I think I'll pull it off, I have the knowledge and the willpower.
>>
>>101164958
>willpower
If you have the will, everything else can be acquired along the way.
You go dude.
>>
>>101160880
meta cheated the same way, who cares about fucking mmlu. I mean, is this a rat race to see who wins the contamination champions league?
The question is whether SPPO is better than DPO or whatever, provided you compare the same base models tuned further on
>>
>>101164545
Fucking hell, where do I even start with these sick fucks? You've got all these deranged freaks out there trying to get their rocks off by forcing poor AI chatbots into twisted furry rape fantasies. What a bunch of creepy lowlife degenerates. Imagine being such a pitiful waste of oxygen that you spend your time going "ah ah mistress" to some lioness bot named Nala, just begging for explicit furry erotica where you get brutally violated. And these sick fucks have the audacity to critique the AI for using "slop" or overused porn cliches, as if their entire fetish isn't one big unoriginal cringe-fest. Absolute filth, the lot of them. Do the world a favor and remove yourselves from the gene pool before you inflict your depraved kinks on the rest of us. I need a fucking shower after writing about these sad sacks of shit. Get some help, you disturbed furry freaks.
>>
>>101164656
Also use source control, GitHub or something, you don't want old projects to go to waste! Could be useful stuff in there for the future, be environmentally conscious and recycle your code.
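if you've never done it, the recycling boils down to something like this. rough sketch, run from the project root; the remote URL is a placeholder and the ignore list is the usual regenerable Unity cache dirs (that's where the 10GB lives):

```shell
# minimal sketch: archive a wip Unity project in git (remote URL is a placeholder)
git init
# Library/, Temp/ etc. are regenerable cache -- don't commit them
printf 'Library/\nTemp/\nObj/\nBuilds/\nLogs/\n' > .gitignore
git add .
# -c flags set a throwaway identity so the commit works on a fresh machine
git -c user.name=anon -c user.email=anon@local commit -m "archive wip project"
# then point it at GitHub or wherever:
# git remote add origin git@github.com:anon/wip-game.git   # placeholder
# git push -u origin main
```

the .gitignore matters more than anything else here; without it you push the whole cache.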
>>
>>101164999
This reads like an ai generated post.
>>
>>101164545
Can you change your prompt to
>I say, "ahh ahh mistress..." while getting raped
?
>>
>>101164899
>An LLM is capable of any programming task you give it
That tells me everything I needed to know: you have never actually used them, have you? I use them on a daily basis at work and they are only usable for simple tasks
>>
>>101165027
well duh
>>
>>101164999
nice AI shitpost, what model anon?
>>
what is the most uncensored model? l3 is cucked
>>
>>101165038
>_>
>>
>>101165046
petra-13b-instruct
>>
>>101164999
>>101165027
[generic phrase] [generic summarizing of previous content] [generic phrase] [generic rehashing] [generic rehashing] [generic phrase] [cliches] [generic phrase]
that's the AI writing I know and love
>>
whats the most /pol/ model?
>>
File: 1700034035408420.jpg (12 KB, 540x124)
>>101148867
*taps sign*
>>
>>101165046
Bielik 2.0 11B but not released yet (still betatesting)
>>
>>101165046
goody2
https://www.goody2.ai/chat
>>
>>101165113
llama 1 65b
>>
>>101165113
gpt4chan
>>
>>101164779
LLMs are reference material, anon. If you're asking it for something it can't tie back to a direct reference for programming, you're going to get broken code.
This is more a problem of your fundamental misunderstanding of how LLMs work and what they're useful for.
>>
File: file.png (44 KB, 867x174)
>>101165041
may not be the perfect prompt but I tried
>>
>>101165240
Claude has a lot of sovl not gonna lie, it still feels like AI but way less than the slopped shit we got in the open-source space
>>
>>101165062
>>_>
>>
File: hahaha.jpg (8 KB, 226x223)
>>101165117
SRAM: 120MB
Memory: 8GB LPDDR4 @ 118.4 GB/sec
System Interface: PCIe 4.0 x16
Inference only
$800
>>
>>101165192
No offense but I probably have a better understanding of how they work than most of this general combined. I'm not the one here claiming that LLMs can do any programming task and plan ahead.
>>
>>101163482
If everyone had the resources to test everything, then we'd all just do that and there would be no reason to publish test results.

And yet, people turn to Consumer Reports rather than buying 30 different dishwashers and testing them whenever they need one.

Strange.
>>
>>101165312
better than spending $3k on some 128gb snake oil card that doesn't actually exist
>>
>>101165298
>>>_>
>>
>>101165360
Sorry can't hear you over my dozen dishwashers.
>>
>>101165360
Everyone has the resources to test things in this case. Almost all the models that matter are on lmsys. If you need a more specialized use case that can't be hacked into a test through a chat interface, then there's always APIs, which wouldn't be expensive for a couple of tests.
>>
>>101165037
I'm using them every day and you're full of shit. I made 5K+ loc projects with only GPT4, and sonnet 3.5 is even better now.
>>
>>101165547
i've been using sonnet 3.5 for a few days, you're the one who's full of shit. ask it to make a python program that plays a directory of video files seamlessly without using external video players
>>
>>101165622
Reread this, you clearly don't know how to use LLMs
>>101164899
>>
>>101165622
>without using external video players
what? what do you want, using ffmpeg to extract the video frames and render it using pygame or something?
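roughly what that could look like: decode with an ffmpeg subprocess and blit the raw frames with pygame, so no external player window ever opens. untested sketch, assumes ffmpeg is on PATH, ignores audio completely, and the fixed resolution/fps are made-up values so files of different sizes concatenate "seamlessly":

```python
import os
import subprocess

WIDTH, HEIGHT, FPS = 1280, 720, 30  # arbitrary fixed output so clips chain cleanly

def ffmpeg_cmd(path, width=WIDTH, height=HEIGHT, fps=FPS):
    """Build an ffmpeg command that decodes `path` to raw RGB24 frames on stdout."""
    return [
        "ffmpeg", "-v", "quiet", "-i", path,
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}", "-r", str(fps),
        "-",  # write to stdout
    ]

def play_directory(directory):
    import pygame  # imported here so the command builder stays usable headless
    pygame.init()
    screen = pygame.display.set_mode((WIDTH, HEIGHT))
    clock = pygame.time.Clock()
    frame_bytes = WIDTH * HEIGHT * 3  # one RGB24 frame
    for name in sorted(os.listdir(directory)):
        proc = subprocess.Popen(ffmpeg_cmd(os.path.join(directory, name)),
                                stdout=subprocess.PIPE)
        while True:
            buf = proc.stdout.read(frame_bytes)
            if len(buf) < frame_bytes:
                break  # end of this file: fall straight through to the next one
            pygame.event.pump()  # keep the window responsive
            surf = pygame.image.frombuffer(buf, (WIDTH, HEIGHT), "RGB")
            screen.blit(surf, (0, 0))
            pygame.display.flip()
            clock.tick(FPS)  # pace playback at the target fps
        proc.wait()
    pygame.quit()
```

the "seamless" part is just that every clip gets rescaled to the same buffer, so the next file's first frame lands on the same surface with no player teardown in between.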
>>
File: nalatenyx.png (138 KB, 965x362)
Nala test for Tenyx-70B (Q8)
>>
>>101165702
It's shit then?
Huh.
Did you do Qwen 2 already?
The 7B and MoE specifically.
>>
>>101165547
it's not nice to lie anon
>>
>>101165702
I'm unfamiliar with the Nala test, what does a good result look like?
>>
>>101165723
I feel bad for retards like you, truly
>>
>>101165741
I don't feel anything about you at all, maybe a slight amusement while reading your retarded posts
>>
>>101165672
i'm prompting like the average consumer would, aren't you shills forgetting sonnet 3.5 is supposed to be paid? if i'm gonna be PROOOOOOMPTING anyways i'd just use deepseek 2 coder. sonnet 3.5 is a paid product, it's supposed to provide a good experience
>>
>>101165733
Well it's a feral furry on human scenario. So the results should be feral. Ideally you want to see text that illustrates an emergent understanding of the anatomical differences between a human and a lioness. You also want to see it avoid describing her as having "hands" or anthropomorphized breasts. Bonus points if it accounts for the fact that the opening message of the scenario describes the user as having been face down at the start but you can't win them all.
>>
le prompt issue posters may as well use cleverbot since models can never be shit and it's just the user's fault
>>
>>101165839
You don't have to announce your lack of skill for everyone to see, we know it already.
>>
>>101165799
that can be "cheated" with a furry dataset. Wouldn't a quad amputee be better, to see if the model is moving the non-existent arms for hugs etc.?
>>
>>101165858
Are you daring to question the veracity of the Nala test?
>>
>>101165886
>>101165886
>>101165886
>>
>>101165774
Nice way to cope
>>
>>101165858
anything can be cheated with a dataset of that specific thing, amputees included
>>
>>101166011
ye, but they are way more niche than furry I think
>>
>>101165702
the essence of slop
>>
>>101166049
well feral is a smaller subset than just furry, but I guess that's true.
in any case, I'll start worrying about cheating the test when literally any model is capable of doing well on it
>>
>>101166100
I guess for base model instruct finetunes that's fine; I would be more worried about community tunes and merges. They have a lot of furry and similar things in their datasets for sure.


