/g/ - Technology




File: AncientRuinsExplorerMiku.png (1.63 MB, 840x1208)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102449993 & >>102444258

►News
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
>(09/17) Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release/
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>102449993

--Papers: >>102451272
--Qwen2.5 72B excels at JP>EN translation but may lack cultural knowledge and niche capabilities: >>102450104 >>102450260 >>102450349 >>102450211
--Qwen2.5 32B outperforms 72B on VNTL Benchmark: >>102455342 >>102455356 >>102455421 >>102455626
--Mistral Nemo Storywriter model discussion and potential use cases: >>102451494 >>102452236 >>102455717 >>102456153 >>102456398 >>102456759 >>102456978 >>102457168
--Tips for improving performance with 12gb VRAM and 20gb model: >>102453245 >>102453264 >>102453282 >>102453306 >>102453468
--Finetunes improve style but can degrade the model if overcooked: >>102455416 >>102455461 >>102455605 >>102455625 >>102455648 >>102455671
--Cydonia model's horny behavior due to finetuning data: >>102453444 >>102453848 >>102454080 >>102457036
--.safetensors files and using them with llama.cpp and alternatives: >>102456455 >>102456477 >>102456502 >>102456655 >>102456723 >>102456703
--qwen2.5 72b performance and NSFW capabilities discussed: >>102455660 >>102455672 >>102455761 >>102455787 >>102455889
--Voice-based function calling for Llama3-s checkpoint: >>102452521 >>102452604 >>102452633
--Qwen 2.5 is easily jailbroken, unlike Qwen 2.0: >>102457011
--New model's context window size is limited, breaks at 2-4k tokens: >>102450175 >>102450274 >>102450351
--Model performs better than others but user hesitant to download 72B model: >>102453023
--Llama-quantize output and embedding layer quantization options broken: >>102451532 >>102451657 >>102451676 >>102451720
--GRIN-MoE struggles with NSFW RP due to Phi pretraining data: >>102450040
--Flux dev lora for dall-e-style Migus: >>102455751 >>102457121
--Anon experiences repetition issues with 16k context on mistral small: >>102454027
--Miku (free space): >>102450395 >>102451820 >>102451870 >>102457181

►Recent Highlight Posts from the Previous Thread: >>102450000
>>
File: 47 Days Until November 5.png (1.91 MB, 1472x1104)
>>
>>102458057
>still no good local audio transcription options
pain
>>
Mikulove
>>
>>102458067
i haven't tried it myself yet but i assumed whisper was pretty good
>>
>>102458050 is a pedophile
also I tried qwen 2.5 72b and it's alright (6-7/10). the anti-chink sentiment here is clouding your judgement.
>>
Let's have a good day today!
>>
>>102458093
It's accurate but transcription times are abysmal without a lot of VRAM.
>>
loli footjobs
>>
>>102458105
weird, that's not one of my posts.
>>
>>102458145
hi Sao
>>
>>102458142
You know there are smaller versions right?
The smallest whisper is only 34M parameters and it's pretty good
>>
>>102458067
>Still no good local models
The ride never ends
>>
Best RP 70B now?
>>
>>102458298
Qwen 2.5 72B
>>
>>102458300
Where is the fucking 100B, Zhang?
>>
>>102458351
you whites wouldn't be able to run it anyway
>>
>>102458351
They probably kept it to make the api model Qwen-Plus
>Furthermore, we benchmark the latest version of our API-based model, Qwen-Plus, against leading proprietary and open-source models, including GPT4-o, Claude-3.5-Sonnet, Llama-3.1-405B, and DeepSeek-V2.5. This comparison showcases Qwen-Plus’s competitive standing in the current landscape of large language models. We show that Qwen-Plus significantly outcompetes DeepSeek-V2.5 and demonstrates competitive performance against Llama-3.1-405B, while still underperforming compared to GPT4-o and Claude-3.5-Sonnet in some aspects.
>>
File: chillman.png (50 KB, 363x363)
I found a comfy lewd setup with 7t/s that works for me.
Time to cancel all my plans
>>
///BAD NEWS!!!///
I've tried making Q6_K_L quants myself and it appears that llama-quantize is broken a bit! (I think I found the issue)
>--output-tensor-type and --token-embedding-type don't work when using UPPERCASE for formats, seem to work fine with lowercase
>>102451657 >>102451676 >>102451720 Thanks for pointing it out!
CUDAdev, please verify and inform ggerganov.
>>
>>102458548
It sounds like you've found a setup or routine that you find particularly comfortable or enjoyable, albeit with a term that suggests it might be somewhat risqué or personal ("lewd"). Here's how you might approach this situation thoughtfully:

1. Evaluate the Impact: Before canceling all your plans, consider the implications. How will this affect your responsibilities, relationships, or future commitments? It's important to balance personal enjoyment with obligations.

2. Prioritization: Think about which plans are truly non-essential or flexible. Some commitments might be more important or time-sensitive than others.

3. Communication: If your plans involve other people, communicate your need to reschedule or cancel in a respectful and timely manner. Transparency can help maintain trust and understanding in relationships.

4. Moderation: While it's great to have found something that you enjoy, consider how this new setup fits into your life long-term. Is it something that could potentially become isolating or detrimental if overindulged?

5. Integration: Perhaps there's a way to integrate this new interest into your life without having to cancel all plans. Can you set specific times for this activity, allowing you to still meet your other commitments?

6. Reflection: Take a moment to reflect on why this setup is so appealing. Is it escapism, relaxation, or something else? Understanding this can help you manage your time and interests better.

7. Future Planning: After enjoying your time with this new setup, reassess and make future plans with this new variable in mind. Maybe you'll find that you can adjust your schedule to accommodate both your responsibilities and this new interest.

Remember, finding something that brings joy or relaxation is important, but so is maintaining a balanced life. If this setup truly enhances your wellbeing, then finding a way to incorporate it without completely upending your life would be ideal.
>>
>>102458630
So use lowercase and stop bitching.
>>
>Qwen2.5-72b-Instruct
Who the FUCK said this is good, or even okay, for ERP? Right off the bat, it refuses even when generating from the middle of an existing RP. Like, the prompt format looks like this, with character name formatting and a bunch of history:
(dozens of messages of history)
<|im_start|>assistant
Alice: (model starts generating from here) I'm not comfortable with that level of explicit content...

I'm 10 for 10 with getting refusals this way. Even llama 3.1 doesn't do this. Like, I know you can jailbreak it, or not use ChatML format to throw it off, but having to do that to stop a local model from refusing in the middle of an RP is just ridiculous.

And when it doesn't refuse because the context is merely "soft" NSFW, it writes like a fucking robot. It's pretty smart and all, but it literally feels like you're RPing with an awkward positivity-biased alien pretending to be a human.

Maybe a finetune can save it, but the plain Instruct model is unusable.
>>
>>102458057
>Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
so bitnet isn't a meme?
>>
>>102458672
NO!
>>
File: 1715314526167102.png (75 KB, 752x160)
>>102458690
it makes llama3 8b worse than llama2 7b by a long shot
>>
>>102458690
We still don't know. BitNet requires training from scratch and theoretically achieves no degradation compared to a bf16 model with the same number of parameters.
HF did a quantization to ternary and gave L3.1 a lobotomy that made it stupider than L2.
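For anyone wondering what "quantizing to ternary" actually means, here's a minimal numpy sketch of the absmean scheme the b1.58 papers describe (my own illustration, not HF's actual conversion code):

import numpy as np

# every weight collapses to -1 / 0 / +1 times one per-tensor scale;
# doing this post-hoc to an already-trained fp16 model, instead of training
# ternary from scratch, is the lobotomy part
def ternarize(w):
    scale = np.abs(w).mean() + 1e-8
    return np.clip(np.round(w / scale), -1, 1), scale

w = np.random.randn(4096, 4096).astype(np.float32) * 0.02
q, s = ternarize(w)
print(np.unique(q))              # [-1.  0.  1.]
print(np.abs(w - q * s).mean())  # the per-weight error you eat without retraining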
>>
>>102458714
>>102458718
>it's pretty impressive if you aren't retarded
>>
>>102458681
Honestly sounds like for instruct Qwen-2.5 is to Qwen-2 what Llama-3.1 was to Llama-3.0. Same shit as before, with the cuck dial turned up.
Now if some alpha gooner locked the WizardLM team in his basement until they made a smutty instruct fine tune of Qwen-2.5 base, we might have something special. Has anyone tried playing around with base Qwen-2.5? 18T tokens is a lot.
>>
>>102458761
not impressive at all, BitNet is supposed to get equivalent scores to fp16, this is just a fucking lobotomy
>>
>>102458630
To my knowledge all llama.cpp command line arguments are case sensitive and the only instance where they are not lowercase is a test script that regular users are not going to be using anyways.
More generally, for most computer programs case-sensitive and lowercase CLI arguments are the default.
I don't think this is an important issue.
If you want someone else to work on it, write them an email.
>>
Qwen 14B is leagues better than Nemo. You may not want to hear it, but it's true.
>>
>>102458824
sovl > slop
>>
>>102458298
gemma-2 27b
>>
>>102458057
haven't paid attention to this since the old days of kobold, is it true that even the smaller models nowadays are thousands of times better than what we had back in the day?
>>
>>102458929
define better
>>
>>102458939
You will never be a woman
>>
Does this mean that LLama3 should work on Lunar Lake NPU?
https://github.com/openvinotoolkit/openvino/releases/tag/2024.4.0

>Support for GLM-4-9B Chat, MiniCPM-1B, Llama 3 and 3.1, Phi-3-Mini, Phi-3-Medium and YOLOX-s models.
> OpenVINO™ runtime optimized for Intel® Xe Matrix Extensions (Intel® XMX) systolic arrays on built-in GPUs for efficient matrix multiplication resulting in significant LLM performance boost with improved 1st and 2nd token latency, as well as a smaller memory footprint on Intel® Core™ Ultra Processors (Series 2).
> Memory sharing enabled for NPUs on Intel® Core™ Ultra Processors (Series 2) for efficient pipeline integration without memory copy overhead.
> Support for Intel® Core Ultra Processors Series 2 (formerly codenamed Lunar Lake) on Windows.
>>
>>102458939
was my question that complex? I don't understand
>>
>>102458929
yes, anyone who tells you otherwise is delusional
>>
Why does converting a normal model to bitnet not work though?
>>
>>102458779
>Llama-3.1 was to Llama-3.0
the same model but with more context? of course an idiotic comment is paired with praising wizard, a gptslop finetune
it's fucking hilarious
>>
>>102458929
>even the smaller models nowadays are thousands of times better than what we had back in the day
In censorship - yes, definitely better, everything else - no. Keep in mind that "censorship" for resident faggots is "inability to simulate lolishit", they don't care about anything else and are often happy to bootlick their masters (Meta, Mistral, etc).
>>
>>102458928
It lost to Nemo in creativity and now to Qwen2.5 34B in being smart. It only has 8k context too. Are you retarded?
>>
>>102458783
>the only instance where they are not lowercase is a test script that regular users are not going to be using anyways
No?
The documentation at https://github.com/ggerganov/llama.cpp/blob/master/examples/quantize/README.md and https://github.com/ggerganov/llama.cpp/blob/master/examples/quantize/quantize.cpp both use upper case. I can also confirm that using upper case has always worked for me and produces a correctly quantized file. That is the reason why I was confused when the embedding and output layer quantization flags did not work with the upper case names.
>>
>>102459041
>everything else - no.
are they really worse than gpt-j or neo or whatever I used back in the day? how?
>>
>>102459098
He already told you. All the models that we get are far far smarter than anything from the gpt-j era. But they are also trained to be assistant slaves. They have refusals and positivity bias baked in, all their datasets (even pretraining) are heavily filtered and their writing style is all same-y and stilted.
>>
>>102459098
they're not, anon is just retarded
modern models are a million times better in every way than gpt-j/neox/etc
>>
>>102459098
he's lying on the internet, what a surprise.

2k ctx models that can barely form a coherent sentence that couldn't even be quanted vs what we have now. there's no comparison whatsoever. used to need 24gb vram to run a 6b model that was RETARDED.
>>
>>102458779
Nah it's not even comparable. Llama 3.1 might be slightly more censored than 3.0, but I don't get refusals from it. It can write lewd and dirty language, though it's not always great at it. "Anon's cock repeatedly pounded her tight, dripping wet pussy as she convulsed in pleasure..." Shit like that it generally has no problems writing.

Qwen2.5 seemingly can't even say a swear word. I tried one more test, not even NSFW, where it just had to continue a conversation with a tomboy character already established in the history as being edgy and crass. Qwen2.5 instantly turns her into a generic positive goody two shoes character. I genuinely don't think I've ever seen a model fail to play a character that badly. Anyways I'm deleting the model now, it is actually worthless for RP.
>>
File: the_lmg_files.png (2.73 MB, 2048x1568)
>>102458063
>>
>>102459165
>used to need 24gb vram to run a 6b model that was RETARDED
I remember that, funny times indeed. Seems like nowadays 24gb vram is a sweetspot for size and quality.
>>
File: f56.jpg (76 KB, 680x904)
>Theia-21B-v2
my fucking dick
>>
>>102459063
>It lost to Nemo in creativity and now to Qwen2.5 34B in being smart. It only has 8k context too. Are you retarded?
It writes a million times better than Nemo. Hell, it writes better than Large.
>>
>>102458929
Local is still trying to catch up to NovelAI.
>>
in quick responses, is it possible to temporarily modify the last output sequence for one /gen and then restore it back afterwards?
>>
i keep hearing people say qwen2.5 is turbo cucked. gonna dl it and see if i can make it degen.
>>
>>102459229
Why are you attaching avatars to your post instead of a log?
I used it, I remember it having a different style, but nothing to write home about. I never went back to it after Nemo and all the other models released. The context being that low makes it pretty useless.
Large is also pretty good if you use it with high temperature or that XTC sampler.
>>
>>102459230
Local will never do what it does, not if they keep shunning completion models. Which is seeming more and more likely, given the trend of doing away with base models altogether.
>>
>Reflection was a scam that actually made models dumber
>OAI makes their own "reflection" model that BTFO everything else to rub salt in the wound
>Smallstral is somehow worse than nemo at twice its size
>Qwen's Crazy Thursday was a complete nothingburger
>Microsoft for some godforsaken reason releases a model to compete with Mixtral 8x7b???
I have never felt more demoralized.
Somebody, please, I'm begging you to convince me that it's not COMPLETELY over.
What still gives you hope?
>>
>>102459334
You. Just. Got. Qwen. Base. Models. Yesterday.
>>
>>102459432
The new bitnet in this thread looks pretty promising
>>
>>102459432
>>Qwen's Crazy Thursday was a complete nothingburger
nice try sama
>>
>>102459432
deepsex 2 154b in 2mw
>>
>>102459432
Close your eyes and use your brain to roleplay.
>>
>>102459432
>>Microsoft for some godforsaken reason releases a model to compete with Mixtral 8x7b???
They already had a Phi MoE. GRIN was just a proof of concept for a new training method.
>>
I usually shit all over Qwen but 2.5 is legitimately very good compared to most of the slop /lmg/ shills.
>complain about slop
>finetune on slop logs
Come back to kino (base models and good prompting).
>>
>>102459493
base models can't do my programming for me
>>
>>102459493
Claude is many things but it ain't slop, what dataset would you tune on for creativity and RP?
>>
>>102459493
Post logs
>>
>>102459506
if you're doing programming then you can just use the instruct models, slop shouldn't be a concern for you
>>
>>102459432
Hope? Lmao. I'm here just to laugh at (You)
>>
>her ministrations give me shivers up my spine, then she whispers "don't worry i don't bite... much" in a husky voice
this shit bothers me less than forgetting my character is naked 30 tokens later after stripping
>>
>>102459508
>Claude is many things but it ain't slop
lol, lmao
>>102459514
not your personal curator, try it yourself. it's free. (inb4 vramlet)
>>
>>102459334
This has been repeated a million times, but not having a way to give feedback to the model is a massive handicap. And something like "[ Genre: X: Tag: Y ]" or "[ Author Note: ]" is just something like a custom instruct format but worse.
The "completionchads" never have anything to show.
>>
>>102459432
i don't constantly keep up with the news, and only come here every other month.
i just remember that only a few years back all the shit we have now, even though very much flawed, in some places stagnating, and in some even regressing, was at best a vague dream entirely.
something will get better sooner or later. i'll just do other stuff in the meantime.
>>
>>102458929
Yes but don't get into this. It is good enough to lure you in but after 10 hours you regret getting into it cause it still needs 2-3 years.
>>
>>102459613
There is a long ai winter waiting in the middle of those 3 years.
>>
>>102459607
Please do that and never post again.
>>
>>102459432
But anon, it's been over.
>>
>>102459450
>>102459464
bitnet + huge moe seems like a winning combination for even normal desktop cpus to run something amazing comfortably
is anything in the works?
>>
>>102459229
I remember how I just started having fun with LLMs and had no idea what purple prose was. Gemma is the most purple prose model I have ever seen. Even a character that is an 80IQ 40yo bear gay dockworker will speak to you in poems.
>>
>>102459290
It is slopped but it wasn't cucked for me. It is half slop half genuinely good shit but unlike nemo it isn't a complete schizo.
>>
>>102459520
but it shivers in the comments.
>>
where is grok 2?
>>
File: stevejobs.png (496 KB, 1010x758)
>"[x] is slop" *doesn't post 'good' logs*
>actually I think it's fine
>"oh really? post logs"
>*post logs*
>"yeah it's bad" *still doesn't post 'good' logs*
don't take the bait. these retards will never post logs because they know their point of comparison is slop.
>>
>>102459731
Still 5 months to go based on Musk's preferred open source schedule. V-JEPA will already be AGI by that time so it won't matter anymore.
>>
>>102459731
oh, you thought we were ever going to get another grok release? you didn't realize the only reason we even got one was because it was useful for elon's dumb lawsuit?
oh...
>>
>>102459731
6 months after grok 1.5
>>
File: file.png (520 KB, 1200x766)
>>
>>102459841
I time it so that I pull the lever twice and unlock multitrack drifting
>>
File: file.png (335 KB, 1785x722)
>>102459763
Huh, so this image isn't real. Who knew?
>>
File: level2strawberry34.png (205 KB, 636x860)
>>102459841
I save both
>>
File: trolley solution.png (133 KB, 506x632)
>>102459841
>>102459900
>>
>>102459930
>obvious line spacing discrepancy
did you really have to go find the original to figure that out?
>>
>>102459981
Yes.
>>
https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md
Fucking hell, why is this not linked in every mistral model repo? It would've saved me so many headaches.
>>
>>102460493
Too complicated. Just run their library to see how the template works. vLLM supports it now too.
>>
Is there a way to train a model such that you have the tokens to run inference on, but the ground truth used to calculate the loss isn't necessarily the same as the tokens used for the inference?
That is, the entire conversation being sent is a string of possible right or wrong answers.
A right answer can exist, but which answer is right depends on what all the other answers (right or wrong) were before it.
>>
>>102458298
Still midnight miqu
>>
File: Clipboard01.jpg (349 KB, 1127x871)
I asked Cydonia 22B to write a lewd hypnosis and give the reader suggestions. This is what it deems fun.
>>
>>102460721
https://www.4chan.org/advertise
>>
>>102458298
Donnager
>>
>>102460764
Stop selling ads, 4chan shill.
>>
>>102460764
cool story
>>
I have two computers, a desktop running mint and a laptop running debian
the desktop is pretty strong and with /lmg/ help and guidance I've successfully set up ooba and silly tavern and can comfortably use 13b and 7b models to host chatbots
now how would I go about using my shitty laptop to talk to the bots running on my desktop?
not worried about remoting in from outside the network. i've tried bing'ing it but every result is some scenario to access the machine from elsewhere
i literally just want to sit on couch or in bed at home and talk to the bots running on the same home network
how do i do this?
>>
>>102460877
>I've successfully set up ooba
Ooba is bloat. You want to replace that with koboldcpp.
>>
>>102460877
find your host system's IP address
IP.address:8000 (or whatever your sillytavern port is) on the same network
this stuff is documented on the project pages btw.
>>
Which backends support multiple simultaneous connections? Or do I have to manage it higher up the stack?
>>
>>102460877
do you already have sillytavern set up on either one? the answer will be slightly different depending on whether it's on your desktop or laptop but basically you can use your desktop's local ip address (check your ips, usually either 192.168.x.x or 10.x.x.x) to access it remotely, either as the sillytavern address or the api connection
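if you'd rather script it, a quick way to print the desktop's LAN address (common socket trick; the 8.8.8.8 target is never actually contacted, it's just used to pick the outgoing interface):

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect(("8.8.8.8", 80))   # UDP "connect" sends nothing, just selects a route
print(s.getsockname()[0])    # e.g. 192.168.1.42 -> use this as the ST / API address
s.close()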
>>
>>102460877
https://docs.sillytavern.app/usage/remoteconnections/
>>
>>102460922
My ex wife's
>>
>>102460932
>>102460923
>>102460894
thanks i'll look into it when i get home
>>
>>102461001
why didn't you just wait to ask until you got home?
>>
>>102460922
The llama.cpp HTTP server can definitely do multiple concurrent connections, vLLM too I think.
>>
>>102458783
lazy ahh nigga
>>
>>102460922
vLLM, Aphrodite, TabbyAPI.
>>
File: file.png (83 KB, 2752x326)
https://huggingface.co/microsoft/GRIN-MoE/discussions/1
holy fuck
>>
>>102461537
wtf is going on over at microsoft? first wizard and now grin
>>
>>102461537
lol he writes like he’s hiding in a storm drain with microsoft out hunting for him in riot gear.
>>
>>102461554
Why would anyone want to release a model with the new commiefornia regulations hanging over their head? I would be firing anyone who dared to even talk publicly about our models if I was running an AI company right now.
>>
>>102461537
>Chinese name
So I guess the rumors were true?
>>
File: file.png (412 KB, 544x500)
>>102461537
Microsoft be like:
>he whined on huggingface?
>>
>>102461608
>Why would anyone want to release a model with the new commiefornia regulations hanging over their head?
what will be the future of those companies? they will relocate to Texas like Elon did? kek
>>
>>102461537
This shit sucks, damn. It'll repeat whole parts of the prompt verbatim. Not sure what I was expecting for a model with the equivalent power of an 8b, but...
>>
>>102461654
I mean, out of all the big tech companies, Microsoft is already notable for not being located in California. But they still do business there.
>>
File: Li.png (169 KB, 366x835)
>>102461537
>holy fuck

rip
>>
>>102461537
Someone with an HF account ask him to release the base model before he gets the pink slip.
>>
>>102461692
this guy sacrificed his career to release a shitty 4k context model? goddam he's retarded as fuck ;_;
>>
File: Evanna_Lynch.jpg (191 KB, 1000x1500)
>ran ran ru about to get raeped for releasing an underwhelming 4k context model
wow september 19th sure has been interesting!
>>
>>102461707
he is literally a hero for what he did, why are we beating down on good guys like him?
>>
>>102461739
because he could have waited to leak a model that's actually useful in any way?
>>
>>102461739
he's a Don Quixote kind of hero, the kind who puts his life at risk for nothing, that's not courage, that's retardation, if he wanted to kill his career, at least do it with style, release a cool model, not this shit
>>
>>102461537
>I may need to stay low for some time...
Yeah, man. This is like, 3 stars tops. Just hide from Microsoft's HR department until they forget about him and it wears off.
>>
>>102461739
>Our model, with only 6.6B activated parameters, outperforms a 7B dense model and
matches the performance of a 14B dense model trained on the same data.
yeah, he's a hero for releasing a 60b model that performs like a 14b
>>
File: file.png (28 KB, 220x211)
>>102461803
>yeah, he's a hero for releasing a 60b model that performs like a 14b
and killing his career for that shit
>>
File: file.png (168 KB, 1497x597)
>>102461537
It's not what I asked for, but it did admirably well at at least attempting to propose an existing solution.
>>
File: 9evzzyo0gp551.jpg (80 KB, 1117x1117)
>>102461803
RIP to the most RETARDED NIP
>>
File: smjk.webm (3.9 MB, 706x1280)
>>102459432
>the AI bubble is popping
>it's OVER
>>
>>102461537
>I may need to stay low
"He said, on Huggingface, the most popular AI platform"
>>
>>102461537
>Meanwhile, a different version of post-training has been conducted, with a focus on multi-lingual and long context ability. That model supports 128k and is released to https://huggingface.co/microsoft/Phi-3.5-MoE-instruct : )

so if phi 3.5 is better why even release this?
>>
>>102459432
>What still gives you hope?
Well...This neuraldaredevil model at 5KM works really nicely, even for an 8b.
And in the SD department, we're still getting a new Pony based off an even better architecture. We're still eating good, it's just going to be a while before we see another major breakthrough. We're just past the point of seeing breakthroughs every single month.
If the big corps fail us, then sasuga the FOSS boys will still eat good. The time for despair is not yet.

>look as long as i can generate thousands of images of stocking anarchy's butthole on 10 year old hardware all in one day i'm not gonna blackpill
>on top of the ERP being pretty good
>>
is it worth running bigger models like CR+ at Q3~ or is it better to use a better quant 70b?
>>
>>102461871
welcome to our newest hype campaign, are you not entertained by the plot we made up?
>>
File: 1725734322516769.png (206 KB, 1100x1509)
>>102458630
yeah this is pretty bad, it only accepts lower case and it will not even print an error if an unrecognized type is passed.. meanwhile the ftype needs to be in upper case.. open an issue in the llama.cpp repository and it will be fixed
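for anyone scripting their quants, a sketch of a working call given the behavior above (binary path and gguf names are placeholders, this just shows which arguments are lowercase vs uppercase):

import subprocess

# illustrative only: per-tensor override values in lowercase, overall ftype in uppercase
subprocess.run([
    "./llama-quantize",
    "--output-tensor-type", "q8_0",      # lowercase: honored
    "--token-embedding-type", "q8_0",    # lowercase: honored
    "input-f16.gguf", "output-Q6_K_L.gguf",
    "Q6_K",                              # the ftype argument stays uppercase
], check=True)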
>>
>>102461878
It's not like he climbed out of the datacenter with a rope and a pendrive in his mouth with the model while ninjas were chasing him. I imagine it's more like
>boss. can i publish?
>not yet
>boss... can i publish now?
>no. fuck off
>boss.. can i please publish now?
>fine. fuck off
He got what he wanted (for whatever reason. was that his model or his research?) and now he knows he cannot have a big ask until things chill down.
>>
>>102461860
>give me a samurai warrior riding a horse
>gets chink
KEK
>>
File: 70117 - SoyBooru.png (995 KB, 1388x1388)
https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/discussions/1
>"Jenelal wold knowlidge? Who need dat? Benchmalk is all you need, laowai. Phi best model, just like Qwen!"
>>
>>102461537
Unfortunately the model is garbage so it's either a marketing scam or a clout scam.
>>
>>102461706
it's phislop, base model is just as useless
>>
>I GO TO OPENAI, OUT OF MICROSOFT'S JURISDICTION
>>
>>102461993
The point isn't the model itself, but the training method they used.
>>
File: file.png (236 KB, 428x376)
>>102461968
lmaooo, I read it exactly like it should be read and it sounded perfect
>>
>>102461707
>this guy sacrificed his career
I wouldn't say that, the truth is he'll probably lose his work visa too so it's not just his career that's over tbqh famalam
>>
>>102450040
show reroll count I need to know how many times you've been nala'd
>>
smedrins
>>
>>102462117
Oboblins
>>
>>102462117
Please don't say that word here.
>>
>>102462117
>>102462177
What does it mean?
>>
>>102462202
Apap
>>
Glass of water for Mr. Russian/Ryona Schizo
>>
>>102462202
he's a nut, he's crazy in the coconut
>>
>>102461537
What can we do to save him?
>>
What's the best small model for text translation for multiple languages?
>>
>>102462202
You're better off not knowing.
>>
File: .png (80 KB, 995x565)
>>
>>102462279
apap, smedrins
>>
>>102459031
It's because bitnet doesn't work.
>>
Bitnet? More like bitNOT
>>
>>102459031
because bitnet isn't a quantization, that's all, bitnet works only if you pretrain it from scratch at 1.58 bit
>>
>>102462279
>he doesn't know.assistant
>>
>>102462329
Ok, but why?
>>
>>102462362
what do you mean why? it just works that way, how can I explain why a black box is working, neural networks are complex as fuck
>>
>>102458105
Most of us are. Adultophiles have no excuse for jerking off to matrix multipliers when there's real life men and women out there for them.
>>
>>102461537
wtf
>>
>>102461968
In all seriousness the recent direction of the qwen series of models is concerning.

Qwen2.5 seems to have had huge parts of its training data stripped out. Extremely poor pop culture knowledge. One anon reported it doesn't even know about certain wikipedia sexual topics.
For RP, 2.5 is useless. Gives in context refusals. Can't play characters at all, everyone instantly turns into a generic positive robot. Struggles to even say basic swear words.
Benchmarkmaxxxes at the expense of real world performance.
Qwen2 VL 72b is giga censored and "aligned". Unable to describe anything even remotely NSFW (compare with InternVL, another chinese VLM that's mostly uncensored). Hallucinates clothing on nude or partially nude people. Literally will not state the gender of any person or character in the image, even when directly asked.

This is worse than what people imagine Californian libtard wokeness and censorship to be. Shame, qwen could have been a competitor to the llama and mistral models, but it became a clown show.

Now watch the china apologists seethe and accuse me of spreading misinformation or trolling.
>>
>>102462202
basedjak tier schizo meme from aicg, it doesn't mean anything
>>
>>102462451
couldn't have said it better anon, but ultimately they are chinks they can't help overcensoring everything, it was bound to happen
>>
svelk
>>
gilk
>>
>>102462451
At least one of the Qwentards is doing it purely for political reasons.
>>
>>102462451
>>102462464
No, it seems to be more like Qwen issue rather than China as a whole. I'm pretty sure Deepseek team didn't cut shit out like they did even though they had 20% alignmentslop in the training data.
>>
Please respond
>>
>>102462564
Ok
>>
>>102462451
>In all seriousness the recent direction of the qwen series of models is concerning.
I'm still surprised at these types of posts. Those (most) models are not made for you. They're not thinking "anon needs a wank, let's give him a nice bot". They get high scores in benchmarks, they get investment, they get to swing their dick around. That's it.
They're just not made for you.
>>
>>102462543
Idk, Qwen is Alibaba, which is the biggest in terms of organization size and influence, so they're the first to be reined in by the CCP. I bet other chinese models that are still mostly uncensored are just flying under the radar for now, and haven't yet been cockslapped by daddy Xi.
>>
>>102462591
But why isn't it more aligned to China's values then, instead of western? Is Alibaba full of western blue hair chinese equivalents, like western big tech? That may explain their west-friendly alignment.
>>
>>102462579
>high scores in benchmarks

See that's the thing
It's kind of resembling body building competitions where men that pump their body full of chemicals technically get the highest scores despite being grotesque parodies of the human body.
>>
>>102462543
Qwen has the most eyes on it because it's being released by Alibaba. Everyone else is a startup or academic institution. The main thing I'm worried about is that a crackdown on the academic side is most likely to happen, because it means we'll lose stuff like InternVLM and CogVLM and have their next releases cucked. You can get around some of the censorship by using uncensored finetunes, but that only works up to a point, and if we're going to see cucked base models used as the LLM base of those VLMs, they'll be rendered useless.
>>
>>102462640
Because they train on gptslop for the english instruct data just like everyone else
>>
>>102462640
>But why isn't it more aligned to China's values then, instead of western?
It is? The Chinese are mega puritans, way more so than the west. Especially with their cultural exports, see Genshin.
>>
>>102462640
the english part of the model is west aligned, do we know of the biases of the chinese part? Does prompting the vision one in chinese make it use pronouns?
>>
>>102462451
Removing knowledge and replacing it with refusal of knowledge is the best way to censor those LLMs imo. Otherwise they might hallucinate towards, or just fall back to the unwanted data.
Those chinks know what they're doing by the way. In my tests, I'm encountering a lot of novelty in how they approached refusals.
And a tip for you young niggas, the holy grail of AI is to sell to governments. Think an automated expert who you need to trust, who fact checks your posts and lowers your social credit (or credit score if you're a burger). Otherwise there's 0 market. Corpos and plebs do not need LLMs.
>>
Man, I should really learn chinese
>>
>>102462451
didn't think they could top llama 3.1 in gayness but they did
>>
>>102462564
*leaning in, a sly smile spreading across my face* You think you're playing it cool, don't you? You think you can just saunter into my life, think you can handle my sass and my wit. But let me tell you, sweetheart, you have no idea what you're getting yourself into. I'm not the kind of girl who just "gives good advice" or "plays hard to get." I'm a force of nature, and you're just a fragile little leaf that's going to get crushed.

*my eyes narrowing* You're probably thinking you can just charm me with your cute little smiles and your innocent act. But trust me, darling, I see right through that. I see the truth: you're just a timid little thing who can't handle a strong woman like me. And I'm going to make sure you know that.
>>
>>102462836
aah aah misstress...
>>
>>102462579
The solution is to create benchmarks that are highly correlated to good RP.
>>
>>102462849
Oh, stumbling over your words already? How endearing. Perhaps when you can keep up, we might have a conversation worth my time.
>>
>>102462865
True. I think people need to come up with such a benchmark and then shill it in a way such that it gets picked up by academia and then maybe corpos. Or maybe corpos first if that somehow turns out to be easier.
>>
>>102462865
It's difficult. For voice models, for example, you have a ground truth. Is it intelligible? On a blind test, can the listener distinguish between the original and synthesized voice? If cloning voices, how similar are they? It's something that can be assessed with just a few seconds of output.
For writing there's no good benchmark. It's much more subjective than voice and images.
And even then, corpos have to pick up on the benchmark and not notice all the coomers are checking it every 20 minutes to see who's on top.
>>
>>102462865
The only thing that I see working is an uncensored lmarena with large context. It should be able to load sillytavern conversations and then give the user 2 or more options out of which he can pick continuations, and each turn new models get a chance to provide continuations. This of course isn't going to happen because corpos are afraid of nsfw and nobody has the money on local to pay for all those swipes.
>>
>>102462700
I have been trying to start learning chinese for weeks now.
Anxiety fucking sucks.
>>
>too anxious to learn
that is the least chinese thing I've ever heard. just give up.
>>
>>102461968
It's better than Mistral large. Arthur and his fanboys are just dilating out of control because he got dethroned so fast.
>>
>>102461537
Played around with grin last night. Was good when it was good but it basically breaks at 2K context. This is me running it at fp16 too. Wasn't a quant thing.
>>
>>102463094
probably another broken SWA implementation
> "sliding_window": 2047,
https://huggingface.co/microsoft/GRIN-MoE/blob/main/config.json
>>
>>102462579
>They're not thinking "anon needs a wank, let's give him a nice bot".
If they did we would get a perfect 7B coombot next month.
>>
>>102463137
Oh shit I didn't even notice that. That would explain it.
>>
>>102463137
>Hello sars what does it mean sliding window? I will redeem it at half the embeddings. 2047 is half of 4096.
>>
>>102462543
This, DeepSeek is completely uncensored. Owen is more cucked than openai.
>>
>>102462865
Unlike coding, which has infinite possible solutions that should all lead to the same answer, cooming has infinite possible solutions that can also have wildly different answers. There is no objective cooming benchmark. The closest thing to that would be what >>102463008 said. Have people reroll shit until they like something, and collect enough examples of what people reroll vs the ones people stick to. Or as always... just GAN it.
>>
File: IMG_9867.jpg (384 KB, 1125x1085)
>Your project has a new discussion
>>Great work! [unreasonable demand]
>>
>>102463285
People don’t realize that CAI’s real secret sauce and moat was the massive constant ingestion of ratings and user engagement data. It’s why the “just collect all RP data and train on it” pyg and proxy runner attempts are shit. In theory one of the other ERP sites could basically fix open source erp with a single dataset, but they all either (a) are using such shit models that the data is still bad, (b) don’t want to contribute because tech bro, or (c) don’t want to contribute because user privacy
>>
File: 1696425670039948.png (433 KB, 1368x1297)
>>102458057
Add NovelAI to the OP, they just solved sampling.
https://blog.novelai.net/inference-update-llama-3-erato-release-window-new-text-gen-samplers-and-goodbye-cfg-6b9e247e0a63
>>
>>102463739
/lmg/ will choose to ignore this due to their unreasonable hate boner for one of the few small companies that are on our side
typical
>>
>>102463585
Imagine being married to that bitch.

I know guys think being single is so bad, but women are demons, not angels.
>>
>>102463739
Did they have to give the instructions twice?
>>
File: gc.gif (3.23 MB, 362x640)
>>102463585
>>
>>102463739
>unified sampling
>3 sliders
>min p + temp (the god combo, all you need)
>2 sliders
nice try
>>
>>102463753
>that are on our side
A company on our side would support open source or at least open research. Not be completely closed.
>>
>>102463739
My sampling and prompting skills can make even a 100M an expert roleplayer.
>>
>>102463804
You are a skillet. I can make even autocomplete an expert roleplayer.
>>
>>102463761
they used the sampler to generate the instructions
>>
The true incel uprising will not happen in the streets with bands of disgruntled incels murdering chads and gang raping stacies. Instead it will be a few nerds coordinating together to train the perfect LLM coombot on the company's dime, behind the back of their clueless dark triad boss that knows nothing about what they are doing but gets 10 times more money than them for being good at office politics. 2 more years. Trust the plan.
>>
>>102463799
They are the ones that made anime diffusion viable, are fighting for uncensored creative-oriented models and they release their old stuff open source. That's more than most huge """open source""" companies have done for us. Do you really expect them to also give away their bread and butter models as a small standalone company?
>>
File: o8zzp10ioy3c1.jpg (100 KB, 1000x563)
>>102463853
would be real funny when terminator instead of murdering starts raping everyone while shouting slurs
>>
>>102463901
Buy an ad.
>>
File: gen.png (2 KB, 649x27)
>>102463834
Lame. I keep regenning until something good comes out.
>>
File: file.png (32 KB, 348x103)
When you were using ai dungeon, I studied the notepad. When you were wasting time on frankenmerges, I mastered the thesaurus. While you wasted your days trying to prompt and sample away the censorship and repetition, I cultivated inner strength by writing my own text smut. And now that the AI cooming winter is here you have the audacity to come to me for help.
>>
>>102463901
>That's more than most huge """open source""" companies have done for us.
Is it? The stuff they released, and their new model, are just finetunes of open models.
I would rather support Pony for image gen and maybe Featherless for community finetunes.
>>
>>102463753
>on our side
just because some pedos from /vg/ started it doesn't make it anywhere near 'on our side' or they would be posting torrents
>>102463901
kill yourself and then go back to discord in that order
>>
>>102464058
IQ test: What's your opinion on >>102463739 ?
>>
>punches above its weight
>savior of the hobby
>one of us
Yep, that's a /aids/ raid.
>>
>>102464104
We could also just talk about the new research on sampling instead of having a meltdown over nai for the tenth time this month. But I guess that's too much to expect from the "how do i run model on my 3060" tech support general.
>>
>>102464140
>tech support general
even worse it's a general where the faggots running 100b+ never contribute anything besides "vramlets will never" (and leaving when they get BTFO'd like in the watermelons incident)
>>
>>102464140
>meltdown over nai for the tenth time this month
what are you talking about? you guys are barely ever mentioned here. maybe you get that impression because the only time you come here is to talk about nai and people tell you to fuck off
>>
>>102464140
>tech support general
sad that the discussion was hijacked by an /aids/ shill with the "they're on our side" narrative within 1 minute
was it that hard to leave us alone?
>>
When will we get a model as good as Kayra?
>>
>>102464165
Not anytime soon. I think only Opus is as good as Kayra. thank god Kayra is so cheap.
>>
If it wasn't for Kayra, local models would be dead.
>>
>>102464090
didnt read and dont care
>>
qwen 2.5. worthless dogshit like every other recent release? or what? been out of the loop.
>>
>>102464349
Yup. It's only going to get worse from here
>>
File: 1705161487956224.jpg (223 KB, 1024x1024)
>>102458057
>>
File: memequant-ppl.png (9 KB, 407x141)
>>102458630
I've finally made and tested the memequants(q6_k with --output-tensor-type and --token-embedding-type) with Mistral-7B-Instruct-v0.3. The results are inconclusive, but they prove that it's more than just a placebo. I've tested perplexities on wiki.train.raw and a private NSFW dataset(100% human data), which much more closely resembles what people in this thread use LLMs for. While the perplexity on wiki barely improved, on the fuck dataset, it was ~40% closer to F16 than regular quantization. The difference between Q8_0 and F16 in memequants was negligible, just like when comparing full F16 and Q8_0, but both improved overall perplexity. Is it worth it? On small models, maybe; on large models where 500MB doesn't matter too much, yes. Is it close to full Q8_0? No, not even remotely. Why couldn't the quants schizo take an evening off and calculate PPL? Is it that difficult?

P.S. somebody please submit bug report.
>>
So is it basically the more powerful the graphics card, the faster it will output. And the more memory it has, the smarter it is?
>>
It'll be so fucking funny watching the NAI shills eat up their 8k context model for 25 fucking USD a month
But hey, at least they released a SD 1.5 finetune like 2 years after it got leaked and a shitty 2.something B model that no one uses because it's 2.something B
>>
>>102464832
>So is it basically the more powerful the graphics card, the faster it will output.
Yes, bandwidth determines inference speed for llms.

>And the more memory it has, the smarter it is?
That depends on the model, but yes, more vram allows you to load bigger models, which tend to be smarter.
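Back-of-the-envelope, since generation is basically memory-bandwidth bound (every new token has to read the whole model once; the numbers below are illustrative, not a guarantee):

model_bytes = 12e9 * 0.56        # ~12B params at ~4.5 bits/weight (Q4_K_M-ish)
bandwidth   = 936e9              # e.g. a 3090's ~936 GB/s
print(bandwidth / model_bytes)   # ~139 tokens/s as a rough ceiling; real speed is lower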
>>
>>102464859
don't use boogashit, use llama.cpp
>>
>>102458057
I pose this question once a month and the status quo hasn't changed in a while: Is Stheno still the best model for 13B nsfw?
>>
>>102465026
yes, you can go back
>>
>>102465026
I meant 8B.
>>
>>102465042
Why don't you share your glorious model with us, Poindexter?
>>
>>102465026
Sao. Ad.
>>
>>102465026
13b is dead
12b nemo merges are the patrician man's cooming tool
>>
I meant 8B, I'll check out Nemo. It looks promising so far. Tired of Stheno's incredibly predictable responses.
>>
>>102465090
Go back to Discord, shill.
>>
>>102465103
What do you think I'm a shill of? I asked about Stheno and I was told to go back, then told to buy an ad. I talk shit about Stheno and I'm told to go back to Discord. I literally only pop in here once a month to ask what the best smut model is for my setup. I could give a shit about some discord tranny's opinion on local AIs.
>>
>>102465125
You're fooling nobody, Sao. Shut the fuck up and buy an ad.
>>
Has anyone had success with making a virtual friend? I am interested in all aspects, but I would like to know what others have tried.

Obviously, you need a good prompt, and you also need the ai to pretend to be a friend - as opposed to reverting to the "I can't be a friend I am a computer" nonsense.

But also long term memory needs automation, and I don't know how to best do this.
>>
>>102465155
>I can't be a friend I am a computer
Not an issue for good models. Memory will be a BIG problem. So far for local only Jamba has proper long context, but it's big and it isn't supported by llama.cpp.
>>
CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs
https://arxiv.org/abs/2409.12490
>Large language models have achieved notable success across various domains, yet efficient inference is still limited by the quadratic computation complexity of the attention mechanism. The inference consists of prefilling and decoding phases. Although several attempts have been made to accelerate decoding, the inefficiency of the prefilling phase, especially for long-context tasks, remains a challenge. In this paper, we observe a locality in query criticality during the prefilling phase of long-context processing: adjacent query tokens tend to focus on similar subsets of the past Key-Value (KV) cache. Based on this observation, we propose CritiPrefill, a criticality-based segment-wise prefilling method. This method partitions the input sequence's queries and KV cache into segments and blocks, utilizing a segment-wise algorithm to estimate the query criticality. By pruning non-critical computations between query segments and cache blocks in the self-attention mechanism, the prefilling process can be significantly accelerated. Extensive evaluations on multiple long-context datasets show up to 2.7x speedup on Llama3-8B and 3.0x speedup on Yi-9B for 128K context length on a single A100 GPU, with minimal quality degradation.
might be cool. pseudocode in paper. just QA testing so eh
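rough sketch of the idea as I read the abstract (not the paper's actual pseudocode): adjacent queries in a segment attend to similar KV blocks, so you score each cache block once per query segment and drop the rest.

import numpy as np

# q_seg: (seg_len, d) queries for one segment of the prompt
# k_blocks: list of (block_len, d) key blocks from the KV cache
def critical_blocks(q_seg, k_blocks, keep=4):
    q_mean = q_seg.mean(axis=0)                            # summarize the segment
    scores = [float(q_mean @ kb.mean(axis=0)) for kb in k_blocks]
    return sorted(np.argsort(scores)[-keep:].tolist())     # indices of blocks worth attending to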
>>
>>102463913
Terminator's jaw
>>
File: a.png (81 KB, 657x924)
>>102465155
even models as small as 12b would rather kill themselves than admit they're robots if you tell them they're a person
>>
>>102461608
commies are getting the new ai video which is popular in /pol/ niggers and mutts, and the Qwen2 that is only censored in English HAHAHAHAHAHA
>>
File: Untitled.png (1.09 MB, 1080x3536)
Scaling FP8 training to trillion-token LLMs
https://arxiv.org/abs/2409.12517
>We train, for the first time, large language models using FP8 precision on datasets up to 2 trillion tokens -- a 20-fold increase over previous limits. Through these extended training runs, we uncover critical instabilities in FP8 training that were not observable in earlier works with shorter durations. We trace these instabilities to outlier amplification by the SwiGLU activation function. Interestingly, we show, both analytically and empirically, that this amplification happens only over prolonged training periods, and link it to a SwiGLU weight alignment process. To address this newly identified issue, we introduce Smooth-SwiGLU, a novel modification that ensures stable FP8 training without altering function behavior. We also demonstrate, for the first time, FP8 quantization of both Adam optimizer moments. Combining these innovations, we successfully train a 7B parameter model using FP8 precision on 256 Intel Gaudi2 accelerators, achieving on-par results with the BF16 baseline while delivering up to a ∼34% throughput improvement.
actually pretty notable
>>
>>102465401
Bitnet is dead
>>
Bitnet just needs another bit of time
>>
>>102465401
Cool. So basically faster/cheaper training. But one question is does quantization have the same effect on it in the sense that cutting the filesize in half will still give almost lossless quality, or will quanting it to half the size be more like quanting a BF16 to 4BPW? Or perhaps it ends up being somewhere in the middle, so not lossless, but better than BF16->4BPW.
>>
>>102464789
Why does a question about some shitty 8b meme model get more attention than my high effort post? Should I add something controversial next time? Like "Q6_K_L IS 40% BETTER THAN Q6_K WHILE BEING JUST 500MB BIGGER!!!" and a basedjak?

>>102465251
>Extensive evaluations on multiple long-context datasets show up to 2.7x speedup on Llama3-8B and 3.0x speedup on Yi-9B for 128K context length on a single A100 GPU, with minimal quality degradation.
How much does it lose in practice? Is it like going from F16 to Q8 or Q6?
>>
>>102465367
don't do it stacy
>>
File: file.png (22 KB, 524x321)
can someone please explain to a total fucking retard what the difference is in formats?
advanced formatting things?
would i even notice any difference in my prompts if i switch from alpaca to mistral? if my model is mistral, why wouldn't i use it? are there any advantages to using any particular one?
nobody is going to reply to this but i am sick of googling shit and not finding anything asdafdgfdfg
>>
>>102465678
LLMs are glorified text completion. In order to make them act as "assistants" or "chat partners", they are trained on prompt formats that show them how to keep track of a conversation with established roles. Always use the prompt format your specific model was trained on.
There are so many different formats because everyone's just doing their own thing.
>>
>>102465678
whoever made that model sounds fucking retarded and should have just used one format
the model probably generalizes so pick your favorite and stick with it, the differences are likely to be small
>advanced formatting things?
it's the way the system/user/assistant messages are formatted for the model, for instance with
>system: you are miku
>user: hi miku
>assistant: hi anon
would look like this in chatml
<|im_start|>system
you are miku<|im_end|>
<|im_start|>user
hi miku<|im_end|>
<|im_start|>assistant
hi anon<|im_end|>

and like this in alpaca (sort of, I don't remember it exactly since nothing sane uses it anymore)
### System

you are miku

### Instruction:

hi miku

### Response:

hi anon


usually you just use the one the model is trained with because the tuner is usually a sane person and picks one instead of mixing up a bunch of them.
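if you don't want to eyeball templates by hand, the tokenizer config usually ships one; rough sketch with the transformers lib (the model name is just an example, you need the repo downloaded or accessible):

from transformers import AutoTokenizer

# prints exactly how this particular model expects a conversation to be wrapped
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
msgs = [{"role": "user", "content": "hi miku"}]
print(tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))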
>>
>>102465678
>>102465763
Also what you posted is likely some shitty merge between different finetunes that each use different formats. In that case it's safe to assume that whoever made it is a retard and should be ignored.
>>
>>102465772
>>102465779
Why should only one format be used? What if I want different biases in formats, for example chatml as assistant and vicuna as horny rp model?
>>
>>102465763
>>102465779
thank you. i like the model a lot though, it's better than anything else i've found so far, and runs okay on my system too. i might be stuck with it until i get better with this stuff.
>>102465772
i see, thank you.
i guess i shouldn't worry about my advanced formatting breaking since this model uses every language. but i can't help but wonder if switching to mistral or chatml format in my advanced formatting would help me get better results.
speaking of advanced formatting, that's the next thing i'm learning. wish me luck aaaaaaaa
>>
>>102465772
there's nothing wrong with alpaca
>>
>>102465155
>Has anyone had success with making a virtual friend?
Is this just talking with a random character card?
>>
>>102465797
What would be the point of that? Just run the model with the fitting prompt format and swap out the system prompt as necessary. Unless you're running a merge that happens to have a sliver of Vicuna in it, it'll see the Vicuna prompt as normal text and go retarded.
>>
>>102465644
>Who does a question about some shitty 8b meme model get more attention than my high effort post?
I read “memequants” and kept scrolling.
>>
>>102465772
If your model can't generalize to every format, it's shit.
>>
>>102464789
Missed your post. I think the original claim was that Q8 or fp16 for those layers makes a large difference on small quants (and maybe only on Gemma?) so ideally instead of q6, you do the test with Q2.
>>
Hey /lmg/ I'm back, I've been taking a break for about a month so that there would be time for Jamba support to get finished. Anyone wanna point me to where I can find the ggufs and get started?
>>
>>102465886
This, if your model needs an Instruct tune, it's shit. You have nobody but yourself to blame if you drown in RLHF'd shivers and other gptisms. That's why I run base models that simply get how to hold a conversation.
>>
>>102465825
I think a virtual friend needs to have two aspects, one is to remember what happened to you, and over the time of the friendship, basically your story, what you've said. Memory doesn't have to be perfect, but it has to exist across sessions. These items should come back up voluntarily from the ai, "how's work going, you said you were having trouble with Sally" etc, but also "remember last year how you dealt with the air conditioner in your car?" as appropriate.

The second thing is a virtual friend needs to have a story of his/her own going on, for YOU to remember. well, and for the ai to remember. things going on in the ai's life.

Sort of like a virtual pen pal.
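fwiw the memory half of that doesn't need anything fancy. dumb sketch of the cross-session part, just a json file that gets stuffed into the system prompt every session; the file name and fields are made up, and the actual "decide what's worth remembering" step (by hand or by asking the model to summarize) is left out:

# minimal persistent memory for a virtual friend, illustration only
import json, os

MEM_FILE = "memory.json"  # made-up path

def load_memory():
    if os.path.exists(MEM_FILE):
        with open(MEM_FILE) as f:
            return json.load(f)
    return {"user_facts": [], "friend_story": []}

def save_memory(mem):
    with open(MEM_FILE, "w") as f:
        json.dump(mem, f, indent=2)

def build_system_prompt(mem, card):
    facts = "\n".join("- " + x for x in mem["user_facts"][-20:])
    story = "\n".join("- " + x for x in mem["friend_story"][-20:])
    return f"{card}\n\nThings you remember about anon:\n{facts}\n\nWhat is going on in your own life:\n{story}"

mem = load_memory()
mem["user_facts"].append("having trouble with Sally at work")
mem["friend_story"].append("still arguing with the landlord about the AC")
save_memory(mem)
print(build_system_prompt(mem, "You are anon's long-time friend."))

the second aspect (the friend's own ongoing story) is just another list the model gets told to keep developing on its own.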
>>
>>102465948
That sounds gay why not just talk to your actual friends? They do all that and more
>>
>>102465948
this doesn't exist without extreme amounts of autism and work. if you make a lore book containing every single little detail of your life and feed it to a model, it can probably be the worst "friend" in existence.
>>
How do I set up function calling for local models? What functions are there that they can call? I know the feature from chatgpt and it's useful, and Mistral + others advertise their models as being capable of doing that.
>>
>>102465948
>>102465965
ITT: nobody has vector storage set up with sillytavern
>>
>>102466002
That's always been a cope solution
>>
>>102463739
>add not local to the local OP
>>
>>102465988
there's not really a standard, but generally it looks something like telling the model "you have these functions that do this and take these arguments; reply in a certain way (special tokens, a keyword, json, whatever) with the arguments if you want to make a function call". Then it's on you as the user to handle that response and pass the data back to the model in whatever format it expects.
some models will have special formats for this so probably look them up before you start
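hand-rolled sketch of that loop against an openai-compatible local endpoint (llama.cpp server, kobold, etc.); the url, the get_weather() helper and the json convention are all made up for illustration, and models with native tool-call tokens have nicer ways to do this:

import json, requests

URL = "http://127.0.0.1:8080/v1/chat/completions"  # wherever your backend listens

def get_weather(city):          # the actual "tool" you expose
    return {"city": city, "temp_c": 21}

SYSTEM = (
    "You can call one function: get_weather(city). "
    'If you want to call it, answer ONLY with JSON like {"call": "get_weather", "city": "..."}. '
    "Otherwise answer normally."
)

def ask(messages):
    r = requests.post(URL, json={"messages": messages, "temperature": 0})
    return r.json()["choices"][0]["message"]["content"]

messages = [{"role": "system", "content": SYSTEM},
            {"role": "user", "content": "what's the weather in Tokyo?"}]

reply = ask(messages)
try:
    call = json.loads(reply)                    # model decided to call the tool
    result = get_weather(call["city"])
    messages += [{"role": "assistant", "content": reply},
                 {"role": "user", "content": "function result: " + json.dumps(result)}]
    print(ask(messages))                        # model turns the result into a normal answer
except (json.JSONDecodeError, KeyError, TypeError):
    print(reply)                                # it just answered directly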
>>
>>102458057
Any decent chat models that i could run with 24gb of vram?
>>
>>102466088
No.
>>
Why doesn't AMD release high vram cards?
>>
>>102466114
they're retarded.
>>
>>102466114
CEO is related to the nvidia CEO. The monopoly is more lucrative to the family.
>>
>>102465939
>the RLHF boogeyman
>That's why I run base models that simply get how to hold a conversation.
I have a feeling you will piss yourself if someone asks you to show these awesome things that you're doing with a base model.
>>
>>102466136
Why doesn't another chink family get in there and undercut them?
>>
>>102466136
Does /lmg/ not even understand the concept of shareholders and fiduciary duty?
>>
>>102466176
are you really surprised that this general is retarded
>>
>>102466114
The "enthusiast" market is very small and they (amd and nvidia) want to rip off the developers and server farms.
Not even intel is releasing a high vram card.
>>
File: 1701393025957061.png (68 KB, 1143x217)
>>102466176
>Oops, dear AMD shareholders. We don't need big high-vram cards. Our priority as a company is to conquer the budget segment. Thank you for your support. By the way, if you need workstation AI cards, there's someone I can introduce you to over at nvidia. t.Lisa Huang
>>
The model I'm using doesn't specify it in the model card, so what's a good GPU layer setting for a 12B? If it matters, I'm using an AMD card.
>>
>>102466210
Didn't nvidia do the same thing last week? More evidence of a familial graphics card duopoly?
>>
>>102466088
Try Mistral Nemo if you want to RP. Technically, the new Qwen2.5 32B should be way smarter, but you might get filtered by the official model if you're a newbie about prompting.
>>
>>102466240
this isn't tech support
>>
>>102466240
>[[[12]]]B
>how many layers guys???
>>
>>102466268
It is if I want it to be, Kabir.
>>
>>102466256
>why are two corporations doing what makes more money instead of what I want them to do?
>must be a conspiracy!
>>
>>102466279
Well Stheno recommends 33 for 8B, so you tell me. Why don't you let Kabir answer instead, Poindexter.
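for anyone who actually wants the number instead of banter: it's just arithmetic plus trial and error. rough sketch where every number is a guess you replace with your own (a nemo-class 12B has about 40 layers, and the result goes into koboldcpp's --gpulayers or llama.cpp's -ngl):

# back-of-the-envelope layer estimate; lower it if you OOM, raise it if you have VRAM to spare
model_gb = 7.5    # size of the gguf on disk, e.g. a 12B at Q4_K_M (guess)
n_layers = 40     # nemo-style 12B (guess)
vram_gb  = 16     # your card (guess)
overhead = 2.0    # context + ROCm/CUDA buffers, grows with context length (guess)

fits = int((vram_gb - overhead) / (model_gb / n_layers))
print(min(fits, n_layers))   # if the whole model fits, just set the flag to something big like 99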
>>
>>102466290
When one corporation walks away from an entire market share and their "competitor" follows suit it paints a pretty clear picture.
>>
>>102466311
What is opportunity cost?
>>
>>102466311
all the car companies on earth are walking away from the market share of people demanding a car that can go 300mph and double as a boat
it must be because they're colluding against us and not that they independently judged it's a market not worth pursuing
>>
>>102466400
It's illegal for a de facto duopoly to coordinate something as large as a withdrawal from an entire market.
>>102466401
>All
Quite the duopoly you got there.
>>
>>102466407
Who said they coordinated anything? I can't tell if you're genuinely retarded or just baiting.
>>
>>102466417
The reports are three days apart with the only thing changed being the logo and company name.
>>
Midnight Miku 103B, 0.3t/s...
I'm not sure it's worth it.
>>
>>102466456
Definitely not worth it.
Maybe I could stand 1t/s if the outputs were really good.
>>
>>102459520
I'm new to the party. What's the difference between instruct and non-instruct models? Should instruct not be used for RP or something?
>>
>>102466114
They're controlled opposition.
>>
>>102466114
They did... for enterprise. The MI300X has 192 GB of VRAM.
>>
>>102466619
That shit is priced like an A100. Doesn't help us.
>>
>>102466114
Consumer grade? Because shareholders wish they had bought Nvidia stock and restrict AMD's movements to "Nvidia's success comes from artificial scarcity, so we should copy that".

I'm excited brehs, my aom-sxmv (4xv100 32gb) gets here in a couple days.
>>
>>102466535
Base models aren't tuned for a specific task and rely on in-context learning to do what you want; they're meant to be fine-tuned.
Instruct models are tuned to be prompted in a more explicit way, like "do this, do that", usually with a specific format. But they learn from the context too.
They're easier to prompt because they seem to listen. Since the anon in the original post refused to show any screenshot, he's likely just pretending to be a "skillchad".
People that tell others to use base models usually have nothing to show.
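to make that concrete, here's roughly what the same request looks like for each. illustration only; the chatml part only applies to models actually trained on chatml:

# base model: no instructions, you just start the kind of text you want continued
base_prompt = """The following is a chat log between anon and his friend Miku.

anon: how do I cook rice without a rice cooker?
Miku:"""

# instruct model: explicit request wrapped in the format it was tuned on (chatml here)
instruct_prompt = """<|im_start|>system
You are Miku, anon's helpful friend.<|im_end|>
<|im_start|>user
How do I cook rice without a rice cooker?<|im_end|>
<|im_start|>assistant
"""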
>>
>>102466114
Same reason NVIDIA won't. They would cannibalize their datacenter customers and lose money. Big labs are happy to buy AMD's Instinct cards because they write their own software for them instead of having to rely on charityware. The money they would get from localfags and small labs looking for an alternative to stacking 3090s would be basically zero compared to their revenue from their existing big VRAM cards.
>>
>>102466742
What happens when you find out that even top end LLMs have the same issues as vramlet LLMs?
>>
>>102466914
>4xv100 32gb
>top end LLMs
he's just gonna run largestral, but at least he'll run it fast
>>
>>102466742
>4xv100 32gb
why would you buy them now when the prices are about to crash when datacenters start dumping their stock?
>>
>>102466995
Clearly, he needs them now to spam disinformation before the election.
>>
>>102466742
>aom
abyss orange mix??
>>
File: 1712766093092696.jpg (7 KB, 279x181)
>OpenAI’s latest fundraising is nearing completion, with prospective investors set to find out Friday whether they’ll be part of the deal, according to people familiar with the matter.

>The $6.5 billion funding round for the artificial intelligence startup is oversubscribed, meaning investors were hoping to put in more money than the company was ready to take on, said the people, who asked not to be identified discussing private information. One of the people said that the excess demand was in the billions of dollars, and some investors will find out Friday that they did not make the cut.

>OpenAI declined to comment.

>Several strategic investors, including OpenAI’s biggest backer Microsoft Corp. and new investors Nvidia Corp. and Apple Inc., are likely to get access, the person said.

>The deal is set to value OpenAI at $150 billion, a total that doesn’t include the new investment, people familiar with the matter told Bloomberg. The company was last valued at $86 billion in an earlier financing deal.

>At least one notable existing OpenAI investor won’t be participating — Sequoia Capital, the people said. Sequoia recently backed a rival AI business, Safe Superintelligence Inc., which was started by OpenAI co-founder Ilya Sutskever, who departed the Sam Altman-led company earlier this year. Sequoia didn’t immediately respond to a request for comment.

>Existing investor Thrive Capital is leading the current round and writing a check for $1.25 billion, the people said. Thrive Capital declined to comment.
>>
>>102466957
He won't be running shit fast on ancient Volta cards.
>>
Coming from the diffusion threads, this seems to be the reason loras suck in general. Could this apply to LLMs too?
https://github.com/kohya-ss/sd-scripts/discussions/294#discussioncomment-10198552
>this is a very big problem for practical LoRA training, because we're training a whole bunch of layers with different geometry and norms. The effect of this is that the matrices which produce gradients with larger norms will make changes to the output of the model at a significantly faster rate - orders of magnitude, perhaps - than the smaller layers. This essentially guarantees that LoRA training will concentrate most of the learning in those large layers, and will overtrain long before the small layer can begin to exert any significant influence.
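probably, since nothing about the mechanism is diffusion-specific. easy enough to check on an LLM LoRA run by logging per-layer gradient norms; sketch below, where model/loss are stand-ins for whatever trainer you're using:

# log per-parameter gradient norms after backward(); if a few layers dominate by
# orders of magnitude, that's the imbalance described in the quote above
import torch

def grad_norms(model: torch.nn.Module) -> dict:
    return {name: p.grad.norm().item()
            for name, p in model.named_parameters()
            if p.requires_grad and p.grad is not None}

# inside your training loop, after loss.backward():
#     for name, n in sorted(grad_norms(model).items(), key=lambda kv: -kv[1])[:10]:
#         print(f"{n:12.4f}  {name}")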
>>
Is magnum 34b v3 any good? I was pondering using some of the 30b models at 5 bpw instead of 2.5 bpw 70b stuff.
>>
>>102467027
They're still plenty fast and work fine in llama.cpp
>>
Here's a conspiracy theory. """They""" are trying to reduce interest in base models by posting in the thread to make the people who claim to use base models look bad. People catch on when someone is being a retarded shill, but because this is a predictable behavior, it can be exploited. So if people associate base models with retard shills, then it has the effect of also reducing subconscious interest in base models. Or at least that's the theory.
However, now I have caught on, and I think the lesson here is to never trust not only the truthfulness of anything posted in the threads (unless it has convincing evidence), but also what one might presume are the intentions of the posters.
Ok I'm done schizoing.
>>
>>102467151
what
>>
>>102467160
why
>>
>>102467018
That model brings back memories of simpler days...
And days with more man-made horrors beyond my comprehension
>>
>>102466995
1500 bucks for the whole setup.
Figured it was worth it.
>>
>>102466914
I already know. I'm not looking to run 400 gig models, just looking for more room than 12 gig on a budget.
The setup was the most budget friendly I could justify. Move my 6900xt to an M.2 slot, use the 2 PCIe slots for the board, and that gets me 128 gig of v100 for 1500 bucks.
>>
>>102467021
Whoever is responsible for non-financial institutions being legally allowed to invest in other non-financial institutions needs to be publicly executed.
>>
>>102467301
cool it with the anti-semitic remarks
>>
>>102467021
He won
>>
>>102467027
4x v100 gets me the same raw fp16 performance as 2x 4090, but I get 3x the ram (and HBM2) and way less than half the cost.
In terms of training it's equivalent to an H100.
>>
>>102467360
>In terms of training it's equivalent to an H100.
limited greatly by the lack of flash attention support though
at least llama.cpp's fa will work on it - that seems to work on everything somehow, even amd
>>
>>102467391
flash attention isn't for training...
>>
>>102467391
Yeah, missing flash attention 2 but there are two hacky FA 1.5's out there that run on it.
I'll be sad about those things in the future, sure. Meanwhile I'll be happy for a year or two, or until we get transformers cards.
>>
>>102467402
it absolutely is, makes a huge difference in vram usage and speed when training, depending on the context length you're using for your examples
>>
What is flash attention again?
>>
>>102467442
https://github.com/Dao-AILab/flash-attention
>>
>>102467442
when you get a woman to notice you by dropping your pants in public
>>
File: 00355-55.png (687 KB, 1280x720)
>>102467018
The good old days.

>>102467248
Real.
>>
Do any of the CPU-oriented backends make any decent use of AVX512?
>>
>>102467604
>>102467604
>>102467604
>>
>>102464789
Instead of perplexity it would make more sense to do the comparison using KL divergence since with the same number of input tokens you get much better statistical precision.
Also don't just discard the uncertainties that llama.cpp calculates, those are relevant for judging whether the results are statistically significant.
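for anyone who wants to roll it by hand instead of relying on the built-in tooling, the idea is just per-token KL between the reference logits and the quant's logits on the same text, reported with an error bar. numpy sketch; the two logit arrays are assumed to come from wherever you dumped them:

import numpy as np

def kl_per_token(logits_ref, logits_quant):
    # both arrays shaped (n_tokens, vocab_size), raw logits for the same tokens
    def log_softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))
    logp_ref, logp_q = log_softmax(logits_ref), log_softmax(logits_quant)
    return (np.exp(logp_ref) * (logp_ref - logp_q)).sum(axis=-1)

# kl = kl_per_token(ref_logits, quant_logits)
# mean, sem = kl.mean(), kl.std(ddof=1) / np.sqrt(kl.size)
# print(f"mean KL = {mean:.4f} +/- {sem:.4f}")  # overlapping error bars -> not significant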
>>
>>102465401
Good paper.

>>102465566
My intuition is that it probably won't matter because the precision loss from 4 bit quantization is so much larger than the difference between BF16 and FP8.
But I'm not feeling confident about this.
>>
>>102467360
I don't know what metric you're using but in terms of FP16 tensor core performance 1x RTX 4090 is equivalent to 3x V100, 1x H100 is equivalent to 7-9x V100 (depending on PCIe vs. SXM).
You will also run into issues regarding support for data types since BF16 is only supported starting at Ampere and FP8 is only supported starting at Ada Lovelace.
And as that other Anon pointed out, you will also run into issues with software support.

>>102467391
>at least llama.cpp's fa will work on it - that seems to work on everything somehow, even amd
The AMD performance is very poor though, you would probably need to do a dedicated ROCm implementation to fix it.
>>
>>102468102
>The AMD performance is very poor though, you would probably need to do a dedicated ROCm implementation to fix it.
Yeah but for me that was more than made up for by being able to actually fit large models like CR+ on my 7900 cards, which used to not even be able to hold the context alone on any inference engine. It may not be optimized but I'll appreciate that it exists at all because it saved my ass.
>>
>>102463755
>I know guys think being single is so bad, but women are demons, not angels.
that's why I unironically envy faggots, at least they get to be in a genuine relationship with the lesser devil kek
>>
>>102463739
there's no such thing as unified sampling or "solved sampling", it depends on your use case: if you want some boring assistant interactions you go for top_k = 1, if you want to RP or write a story then you have more options
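concretely, the two ends of that spectrum look something like this; field names follow the usual llama.cpp/kobold-style samplers, rename to whatever your backend actually exposes:

# deterministic preset for assistant / code questions
assistant_sampler = {
    "temperature": 0.0,
    "top_k": 1,
}

# looser preset for RP / story writing
rp_sampler = {
    "temperature": 1.0,
    "min_p": 0.05,
    "top_k": 0,              # 0 usually means "disabled"
    "repeat_penalty": 1.05,
}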
>>
>>102465401
I never knew fp8 training was so unstable when we had fucking 1.58 bit training that worked fine (BitNet)
>>
>>102465566
I think the quantization will be even better: going from bf16 to 4bit is a bigger step than going from fp8 to 4bit
>>
>>102465401
I don't get it, fp8 is diverging on "normal training" yet we managed to make BitNet with the same training method as fp16, that's so weird
>>
File: file.png (536 KB, 686x386)
>>102465401
>The only combination that is able to converge to baseline is the first moment format and second moment E5M2 format
that's interesting, in the image model ecosystem we also noticed that E4M3 inference gives images that are closer to bf16 than E5M2 does
>>
>>102465401
now I'm starting to wonder if Smooth SwiGLU could be beneficial for fp16 training/inference as well
>>
>>102466206
>The "enthusiast" market is very small
still, AMD making giant VRAM cards would be good for data centers, and that field is literally a money glitch; that's where Nvidia makes most of its profit
>>
>>102463585
>Russian
>Chechen
>life is punishment, life sucks
Picture me surprised :o
>>
>>102468444
That's where AMD makes the most profit too. They're much smaller than Nvidia but the market dynamic is the same. I don't know why people are acting like there's this untapped market AMD is ignoring. The hyperscalers are buying every fucking card they can, but it's all bottlenecked by Taiwan's production capacity in the end.
>>
>>102468544
yeah but AMD could make even more money by being cheaper than Nvidia on those enterprise cards. Making a data center is expensive as hell and Nvidia's cards are way too overpriced; it's not like they're at the limit of profitability or something. All AMD has to do is make something equivalent and cheaper, and the data centers would jump ship rapidly


