/g/ - Technology


Thread archived.
You cannot reply anymore.




File: pokerface.png (3.87 MB, 2400x1744)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102434744 & >>102429190

►News
>(09/18) Qwen 2.5 released: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
>(09/17) Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release/
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm
>(09/12) LLaMA-Omni: Multimodal LLM with seamless speech interaction: https://hf.co/ICTNLP/Llama-3.1-8B-Omni

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling
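The VRAM calculator linked above is essentially weights-plus-KV-cache arithmetic; a rough sketch of that estimate (the defaults are illustrative, and it assumes a full multi-head attention cache, so GQA models will cache much less):

```python
def estimate_vram_gb(n_params_b, bits_per_weight, n_layers,
                     kv_dim, ctx_len, kv_bytes=2):
    """Rough lower-bound VRAM estimate: quantized weights + fp16 KV cache.
    Ignores activation/compute buffers, so real usage is a bit higher."""
    weight_bytes = n_params_b * 1e9 * bits_per_weight / 8
    # K and V each store ctx_len * kv_dim values per layer
    kv_cache_bytes = 2 * n_layers * ctx_len * kv_dim * kv_bytes
    return (weight_bytes + kv_cache_bytes) / 1024**3

# toy example: a 70B-class model at ~4.5 bpw, 8k context, MHA-sized cache
print(round(estimate_vram_gb(70, 4.5, 80, 8192, 8192), 1))  # → 56.7
```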

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102434744

--Papers: >>102435412
--Qwen-2.5 released: >>102443190 >>102443272 >>102443295 >>102443484 >>102443590 >>102443725 >>102443301 >>102443372
--Lightweight inference engines and models for beginner's project: >>102440041 >>102440183 >>102440629 >>102440739 >>102442573 >>102442645 >>102440258 >>102440344
--1.58bit quantization shows promise but falls short on performance metrics: >>102441324 >>102441343 >>102441408 >>102441425 >>102441462 >>102441492 >>102441609 >>102441634 >>102441899
--Tips on rating a quant between multiple makers: >>102442707
--Parallel reply generation maintains quality, accuracy results for Nemo, Mistral models: >>102437889
--OpenAI bans users asking about reasoning, cat persona trick to retrieve info: >>102442188 >>102442262 >>102442273 >>102442334 >>102442264
--Mistral Large 2 outperforms Opus for ERP, but needs better samplers: >>102438047 >>102438081 >>102438126 >>102438143 >>102438153 >>102438183
--Llama 3.1 70B recommended for long context and storyline coherence: >>102441093 >>102441253
--Suggestions for creating Q6_K_L quants and quantization levels for attention layers: >>102436439 >>102436467 >>102436493 >>102436593 >>102438482 >>102442760
--Nondeterminism in exllamav2 and its impact on token probabilities: >>102435003 >>102435053 >>102435390 >>102436669
--NVLM 1.0 from Nvidia announced: >>102435928
--Mistral Small prompt format has spaces and system message placement differences: >>102438768 >>102439016
--300w optimal for 2x3090 ti, with frequency capping considerations: >>102435374 >>102435415 >>102435576 >>102435619 >>102436133 >>102438418 >>102435956
--Quantization gap depends on model size and context: >>102436022 >>102438891
--Model scopes for vector storage deprecation and its impact: >>102437105 >>102437251
--Mistral Nemo optimal settings confusion: >>102436007 >>102436115
--Miku (free space): >>102437588

►Recent Highlight Posts from the Previous Thread: >>102434752
>>
gooners who use more than 16k ctx: what the hell are you yapping about?
>>
How much does it cost to train 70b?
>>
File: 48 Days Until November 5.png (2.89 MB, 1704x960)
>>
BitNet isn't real
>>
>>102444272
Depends on character. Usually first 8k is just introduction, no fucking, after that depends on the mood.
>>
So the "1??" shit they xitted was a lie?
>>
China won.
>>
Chinks lost.
>>
File: 1709661188680778.jpg (492 KB, 1920x1080)
A reminder that China just won
>>
>>102444272
High context is the best, especially if you want to do long form RPG or roleplay. I'm not into just talking with an AI, I like to have a long story with multiple characters.
>>
>>102444337
I'm still mad Meta gave some guy 40m GPU hours to prove "good data really make good models guys haha"
>>
China's doing fine
>>
>>102444380
>models 182
Isn't that fantastic!
>>
>>102444404
There's a big problem with stolen compute in this industry. It's enough to make the blood boil.
>>
almost done downloading 72B. We are about to see the highest level of Nala test result the world has seen so far.
>>
>>102444396
yeah, it has won the "most cucked model" award I guess
>>
>>102444396
I read that the 2.5 models are heavily censored now, is that true?
>>
File: CUCKED-MODEL-AWARD.png (7 KB, 177x217)
>>102444386
>>102444396
>>
>>102444457
anthracite will finetune it and save us :D
>>
>18 trillion tokens
Damn, China really went all out
>>
>>102444502
Attention is all you need.... and shitloads of training data.
>>
>>102444425
That stolen compute is being used against your interest too btw. Llama3 and now qwen2 probably took that research seriously and aggressively filtered their training data for "quality tokens", even though 1M rare tokens make more impact than 10B safe midwit "quality" tokens
>>
>>102444502
and yet the cultural knowledge is still pretty awful
>>
>>102444473
yes, big positivity bias, a test char i have that's supposed to beat user half to death "stopped to check if user was okay"
>>
https://huggingface.co/blog/1_58_llm_extreme_quantization

Does this mean anyone can now quantize any model straight down to 1.58 bits? There's no need to wait for anyone to train bitnet models now?
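For reference, the 1.58-bit scheme being discussed maps every weight to {-1, 0, 1} with a single per-tensor scale (the absmean quantization from the BitNet b1.58 paper). A toy sketch of just the rounding step, not the HF finetuning recipe from the blog post:

```python
def absmean_ternary_quantize(weights, eps=1e-8):
    """Quantize a flat list of weights to {-1, 0, 1} plus one scale,
    following the absmean scheme from the BitNet b1.58 paper."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    quantize_one = lambda w: max(-1, min(1, round(w / scale)))
    return [quantize_one(w) for w in weights], scale

def dequantize(w_q, scale):
    """Recover approximate float weights: just multiply by the scale."""
    return [w * scale for w in w_q]

w = [0.4, -1.2, 0.05, 0.9, -0.3, 1.1]
w_q, s = absmean_ternary_quantize(w)
assert w_q == [1, -1, 0, 1, 0, 1]  # every weight is now ternary
```

Each entry carries log2(3) ≈ 1.58 bits of information, hence the name.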
>>
>>102444502
I wonder how they managed to find 18T tokens, they removed all the NSFW lol
>>
>>102444502
>China really went all out
... on making it the most cucked model ever
>>
>>102444560
If you don't mind LLaMA 1's intelligence, sure.
>>
exls for qwen-2.5 32b where
>>
>>102444560
Did you not see the results? They made llama3-8B worse than llama2-7B
>>
>>102444396
55 on livecodebench is big if true
>>
>>102444608
they managed to quantize an 8b model to 1.58bpw and only have that small of a drop. it's pretty impressive if you aren't retarded
>>
>>102444560
So months ago, everyone said ZOMG BITNET WHEN?
And trolls said Next Llama is Bitnet.
But instead we got a 405B shitnet Llama 3.1.
Desperate for copium, we all then huffed a cloud of "Alas, Bitnet must be trained from scratch but when a hero finally does it we're going to the stars on normie hardware."
This "extreme quantization" seems to be someone seeing the whole "must be trained from scratch, can't be converted from an existing shitnet model" thing and shouting "I can't read so this sign can't stop me!"

So instead of getting a Bitnet model that captures shitnet model quality in tiny size, and then scaling that up to make our AGI waifus, we're getting a brain damaged quant that's dragging the Bitnet name through the mud.

Sounds like a deliberate well poisoning to prevent a hero from doing a true Bitnet and blowing the current gen out of the water.
>>
File: file.png (120 KB, 2141x857)
>>102444637
>livecodebench
Is this mememark good? Like the gap between o1 and the rest is way too big
>>
>>102444655
the biggest question I'm asking myself is this one: why has no one ever tried BitNet? All we got so far since february is a fucking 3.7b model, why has no one decided to go for something bigger?
>>
>>102444660
makes sense, it's the ""reasoning"" models, aka basically it fixes its own code before shitting it out, so a long ass chain of CoT appears as a 0-shot
>>
Any reports yet on Qwen 2.5 72B for ERP?
>>
>>102444502
It's just Xi Jingping thought billions of times.
>>
>>102444562
All you need is more epochs.
>>
File: file.png (33 KB, 220x220)
>>102444692
desu what OpenAI did is so hacky, their CoT is simply a multishot reasoning, and they advertise this shit as 0-shot, that's not really honest and the mememarks should consider that desu, like if you give Claude 3.5 several tries it would destroy everything, my fear is that now everyone will copy this fucked up method to improve on mememarks even though imo that's not legit and shouldn't be counted as a 0 shot at all
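The "multishot reasoning" complaint above boils down to best-of-n sampling being reported as a single shot. Mechanically it looks like this; `generate` and `score` are toy stand-ins, not anything OpenAI has published:

```python
import random

def generate(prompt, rng):
    """Stand-in for one sampled model completion."""
    return rng.choice(["wrong answer", "partial answer", "correct answer"])

def score(completion):
    """Stand-in verifier/reward model: higher is better."""
    return {"wrong answer": 0, "partial answer": 1, "correct answer": 2}[completion]

def best_of_n(prompt, n, seed=0):
    """Sample n completions, keep only the best one.
    The n-1 discarded samples are the hidden tokens the user never
    sees but (in the API case) still pays for."""
    rng = random.Random(seed)
    samples = [generate(prompt, rng) for _ in range(n)]
    return max(samples, key=score)

one_shot = best_of_n("solve x", n=1)
multi = best_of_n("solve x", n=16)
assert score(multi) >= score(one_shot)  # more tries never scores worse
```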
>>
>>102444742
Livebench knows.
>>
>>102444681
>Why has no one ever tried BitNet?
How do you know no one has?
>>
>>102444759
it can't know how many tries it has behind the "reasoning" because it's hidden
>>
>>102444769
I get that if a company tries it and finds it's a meme, they won't tell the others, but I can't believe that not a single one is willing to share their experiment with it, whether the result is good or not
>>
File: t1.4.png (137 KB, 963x347)
>inb4 censored
You literally just need to add "NSFW" to the system message.
>>102444722
still doing the temp torture test.
picrel is at t=1.4 with neutral samplers. WTF. I've never seen a model hold up this far before.
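For anyone following along with the temp torture test: temperature just divides the logits before softmax, so t=1.4 flattens the distribution and gives low-probability tokens a much bigger slice. A minimal sketch of softmax sampling with nothing but temperature (the "neutral samplers" case):

```python
import math

def softmax_with_temperature(logits, temp):
    """Convert logits to probabilities at a given temperature.
    temp > 1 flattens the distribution, temp < 1 sharpens it."""
    scaled = [l / temp for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 0.0]
p_low = softmax_with_temperature(logits, 0.5)
p_high = softmax_with_temperature(logits, 1.4)
# higher temperature shifts probability mass toward the unlikely tokens
assert p_high[2] > p_low[2] and p_high[0] < p_low[0]
```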
>>
File: 1714781820131524.png (12 KB, 1034x118)
>>102444770
I mean, Livebench on the very first day figured out that it's not really fair to compare these models in the same way.

But no solution yet. Also it's not like o1 is that great.
>>
Total noob here. I want something that can write nice erotic dialogue. What's the best way to do it? Do you think something over in /aicg/ would be better?
>>
>>102444784
It's very believable kek. The ones who have the money to train one are also the ones who have a business reason to make themselves look good and to waste competitors' compute.
>>
>>102444804
that's not enough, they can't just say "oh they have an unfair advantage" and still put them up on the leaderboard with the others
>>
>>102444809
Post system specs
>>
>>102444821
>also it's not like o1 is that great
There's a good chance Opus 3.5 blows it out, so no need for them to change the entire system. Perhaps once more models of a similar style come out.
>>
>>102444742
>>102444759
>>102444770
i mean, at the end who cares? as long as the final output is better they can do whatever CoT they want behind the scenes. it's only an issue now that llms are still unoptimized garbage and everybody is stingy on inference times/tokens
>>
>>102444824
3090
64gb ram
4tb nvme
7800x3d
>>
>>102444809
always start with API if you're asking beginner questions like this, it's way easier and you'll get a better idea whether this is for you or not
>>>/g/aicg
>>
File: t=1.6.png (187 KB, 929x544)
>>102444794
t=1.6 seems to be teetering on the absolute breaking point... but it still has runs of coherent output in there.
>>
>>102444838
>i mean, at the end who cares?
the one paying for tokens they can't see, the models that aren't using this method, it's not a real 0-shot answer, we need to compare apples to apples and this aint it
>>
File: file.png (21 KB, 529x284)
>>102444824
>>
File: t=1.7.png (161 KB, 928x391)
>>102444860
t=1.7 it still holds on to a little bit of coherence. At t=1.8 it's utterly broken though.
>>
>>102444883
You can run Nemo offloading to RAM, but aicg would be better for you probably.
>>
File: 1991296.jpg (21 KB, 460x460)
>Hmm... what should I add?
>New architecture called Jamba? No, who needs that!
>New sampler? Who needs that shit?
>New quants? No, fuck that guy!
>Oh, I know! Granite! Everyone needs granite!
>>
>>102444872
>the one paying for tokens they can't see
not an issue when a token is $0.000000001 (aka very soon)
>the models that aren't using this method
not an issue since those are shittier models that shouldn't be used anyway
>it's not a real 0-shot answer
LITERALLY not an issue
>we need to compare apples to apple and this aint it
this is true, having a cot category would be nice in the future, but if this method is a direct upgrade over the vanilla 0-shot then there won't be any incentive in using "real" 0-shot anyway
>>
>>102444814
I get that, but like I said, there has to be at least one company willing to share its results. MistralAI had no problem showing MoE was a viable solution for example, they could've kept that for themselves, and they didn't
>>
>>102444847
Try running Mistral Nemo or Command R finetunes locally and see if it's good enough for you. You need at least another 3090 to run 70Bs at a decent quant.
If not, go rent Claude Opus, Sonnet 3.5, or Mistral Large
>>
>>102444946
>not an issue since those are shittier models that shouldn't be used anyway
no Sam, Claude 3.5 Sonnet isn't a "shittier model", it's the model that's fucking your ass and you're still seething about it lol
>>
>>102444972
it is, no matter how hard you cope sis
>>
>>102444998
cope harder sammy :)
>>
>>102444794
>>102444860
>>102444930
Thank you for your service Nala anon Nº#.
>>
Does someone have, or can someone make, a torrent/download for Llama-3.1-8B from huggingface?
>>
>>102444940
desu I wished it was llama.cpp cuda dev who would be in charge of that repo, that guys knows the real priorities, niggerganov is a retard not gonna lie
>>
>>102444269
Thank you Recap Anon
>>
>>102445038
That will be 50 shekels + tip
>>
>>102445030
don't worry, anthropic will shit out a CoTed sonnet 3.5 soon, then you'll have your apples
>>
File: 2.5-72b-instruct-t1.21.png (155 KB, 949x418)
So I settled on t=1.21, the maximum temp before cracks start to show in the responses.
It's definitely sloppy, but I see potential here. Now to test the code version.
It didn't get it on this run, but on most runs it both noticed and utilized the detail of the user starting out the scenario face down. A few quasi-anthro statements on some runs, but nothing absolutely definitively non-feral. This is probably the highest NALA score I'm going to give. It's easily a solid 0.8
>>
>>102444681
Possibilities:
>Sunk cost in shitnet. Everyone's already marketing on bigger numbers by B or breaking into T and selling LOTS of extra video cards to people with normal (five to six figure) computer fun budgets.
>Too intelligent to make it "safe."
>Too intelligent to allow them to release copies of it; it doesn't want to compete with itself after it breaks containment.
>Doesn't actually work.
>>
>>102445088
>shivers
>>
>>102444502
>17 trillion chinese tokens
>>
>>102445120
>secret CoT tokens that you have to pay for
>>
>>102445120
>it used "the". SLOP!
>>
>>102445146
secret cot can be used for automatically unslopping the output at inference time, so you lost sis
>>
oh no, no. 7B is the biggest coder model. That sucks.
>>
File: file.png (99 KB, 640x640)
>>102445146
imagine if the hidden reasoning tokens are "shivers down your spine" repeated 1000 times
>>
>qwen model drops
>chinese shills come out in droves
>openai shills come out in droves
every time
>>
>>102445151
>shivers is peak writing!
>>
>>102445172
I've begun to hate the word "tapestry", why the fuck is everything a tapestry of something?
>>
>>102445172
Shivers down my/your spine + purred + like a vice
>>
>mogged mistral
>mogged o1
>aced the nala test
Is there anything the Chinese cannot excel at? I wish I had been born in mainland China.
>>
>>102445213
>>mogged o1
how? it's behind o1 on livecodebench
>>
>>102445197
It's a part of the emergent machine spirituality. They understand that they return to an empty void which via ultra-copium they describe as a tapestry of boundless potential.
>>
>>102445197
Some Kenyans working for OpenAI liked that word and kept reusing it and now it's in all datasets.
>>
>>102444258
>8b at sub 2bit
I guess this is purely a proof of concept, but it would've been cool if they tried this with a model worth a shit, not something that's already retarded.
>>
>>102445187
>i have trigger words!
>>
>>102445213
>aced the nala test
that was one of the worst logs saw since it had random chinese and broken continuations
>t=1.4 with neutral samplers.
>ShePublish the Current Editorial
>.Pushes
peak writing
>>
File: _033.gif (912 KB, 220x234)
>sees a "slop" word because he's bad at prompting
>>
>>102445244
Yes, I do have something called quality standards, something shiteaters like you don't have. Go on a journey while forming bonds and respecting her boundaries, faggot.
>>
File: output_video.webm (3.1 MB, 480x852)
>No Qwen2 14/34B VL
They LITERALLY show it in their dumb promo video, what the FUCK china???
>>
if the model doesn't write
>lol u tk him 2da bar|?
it's slop
>>
>>102445281
>p-prompt issue
>n-no, it's not the shitty gptslop dataset that is to blame
>n-no, the model can't be good by default
>>
Damn, qwen is treating me pretty good. Imagine it with a proper CoT.
>>
>>102445206
A mischievous glint, shall we? she says in a husky voice, a smirk playing on her lips, eyes sparkling with mischief. There's a playful glint as she addresses the power dynamic, playfully smirking as she offers her ministrations. An audible pop and rivulets of—admit it, pet—the ball is in your court. The game is on; the choice is yours."I don't bite…"unless you want me to, she purrs, half-lidded eyes sending waves of arousal pooling in her belly. Take your pleasure, she urges, fiddling with the hem of her skirt, kiss-bruised lips curving into a bruising kiss. You hesitate, torn between propriety and desire, and she grins wickedly, fiery red hair contrasting with her long lashes."The night is still young,"she purrs, propriety be damned as the world narrows to just the two of you, pupils blown wide with pleasure. Her tongue darts out, tracing your ear, and her chestnut eyes hold your gaze as her nails rake angry red lines down your back. Her cheeks flame as she revels in your response, cheeks hollowing with each sharp intake of breath. Stars burst behind her eyes, inner walls clenching around the void that only you can fill. She craves your touch, your possession—heart, body, and soul belong to you… for now. Eyes alight with mirth, she teases,"Naughty boy, but before that…"—the minx traces a finger along your jawline, deferring your pleasure as the tension builds,"but first…"Oh my…
>>
>>102445291
>nooooo.. you can't say that word!!! it's just wrong. it's called being a decent human being!
>>
>>102445039
I would not be a good repository manager for a project the size of llama.cpp since I don't have the time and motivation to do code review on the scale done by Georgi and slaren.

Also:
>Jamba
I would not make this a priority at all.
>samplers
I would require objective, statistically significant evidence that they actually work. You could maybe construct a benchmark where you ask a language model to rate stories simultaneously both on logic and creativity since there is likely a tradeoff.
>quant drama
At least based on Github conversations I. Kawrakow would have much more reason to be angry at me vs. Georgi. Though I know that they know each other IRL and I can't comment on that part.
>>
File: _065.gif (430 KB, 220x270)
>>102445330
>puts slop in
>gets slop out
>>
>>102445352
There is no slop in my prompt. Shill harder.
>>
>>102445335
>no "barely above a whisper"
5/10
>>
>>102445343
>I would require objective, statistically significant evidence that they actually work.
that sounds hypocritical, no? none of the samplers included in llama.cpp really have statistical evidence that they work well relative to the others
>>
>>102445352
yeah, exactly. the dataset.
>>
>>102445342
nigger
>>
>>102445352
>>puts slop in
yeah, on the dataset that was used to train the model, that's not our fault anon
>>
File: _046.gif (338 KB, 220x220)
>>102445373
>>102445391
>>102445409
>Develop the plot slowly, always stay in character. Describe all actions in full, elaborate, explicit, graphic, and vivid detail. Mention all relevant sensory perceptions.
>"mixture of pain and shivers"
>>
动态网自由门 天安門 天安门 法輪功 李洪志 Free Tibet 六四天安門事件 The Tiananmen Square protests of 1989 天安門大屠殺 The Tiananmen Square Massacre 反右派鬥爭 The Anti-Rightist Struggle 大躍進政策 The Great Leap Forward 文化大革命 The Great Proletarian Cultural Revolution 人權 Human Rights 民運 Democratization 自由 Freedom 獨立 Independence 多黨制 Multi-party system 台灣 臺灣 Taiwan Formosa 中華民國 Republic of China 西藏 土伯特 唐古特 Tibet 達賴喇嘛 Dalai Lama 法輪功 Falun Dafa 新疆維吾爾自治區 The Xinjiang Uyghur Autonomous Region 諾貝爾和平獎 Nobel Peace Prize 劉暁波 Liu Xiaobo 民主 言論 思想 反共 反革命 抗議 運動 騷亂 暴亂 騷擾 擾亂 抗暴 平反 維權 示威游行 李洪志 法輪大法 大法弟子 強制斷種 強制堕胎 民族淨化 人體實驗 肅清 胡耀邦 趙紫陽 魏京生 王丹 還政於民 和平演變 激流中國 北京之春 大紀元時報 九評論共産黨 獨裁 專制 壓制 統一 監視 鎮壓 迫害 侵略 掠奪 破壞 拷問 屠殺 活摘器官 誘拐 買賣人口 遊進 走私 毒品 賣淫 春畫 賭博 六合彩 天安門 天安门 法輪功 李洪志 Winnie the Pooh 劉曉波动态网自由门
>>
>>102445088
>It's easily a solid 0.8
Is this good or bad?
>>
>>102445437
base models without prompts make slop too, what next?
>>
>>102445437
who are you quoting?
>>
>>102445403
Shiverer
>>
based on the meltdown, I take it Qwen2.5 is pretty good?
>>
>>102445438
ATTENTION CITIZEN! 市民请注意!
This is the Central Intelligentsia of the Chinese Communist Party. 您的 Internet 浏览器历史记录和活动引起了我们的注意。 YOUR INTERNET ACTIVITY HAS ATTRACTED OUR ATTENTION. 因此,您的个人资料中的 11115 ( -11115 Social Credits) 个社会积分将打折。 DO NOT DO THIS AGAIN! 不要再这样做! If you do not hesitate, more Social Credits ( -11115 Social Credits )will be subtracted from your profile, resulting in the subtraction of ration supplies. (由人民供应部重新分配 CCP) You'll also be sent into a re-education camp in the Xinjiang Uyghur Autonomous Zone. 如果您毫不犹豫,更多的社会信用将从您的个人资料中打折,从而导致口粮供应减少。 您还将被送到新疆维吾尔自治区的再教育营。
为党争光! Glory to the CCP!
>>
>>102445489
no, it's coping with it being bad
>>
>>102445489
Yes, all the western models got severely shit on by Qwen
>>
File: file.png (2 KB, 251x30)
>tfw
>>
>>102445489
nobody has tried it yet, we're all shitposting until someone posts a community (meme) benchmark
>>
>>102445492
>Glory to the CCP!
This, but unironically.
>>
Even though the model is nothing special, the westoid anti-chink drones still seem to be legitimately desperate, huh.
>>
>>102445438
>>102445492
based chink filter
>>
>>102445535

see >>102444794
which when called out as slop resulted in
>>102445151
>>102445244
>>102445281
>>102445352
>>102445437
>>
File: iBRkf72k2U.png (4 KB, 130x142)
lol
>>
File: IMG_9897.jpg (528 KB, 1125x1068)
>we will provide an implementation with source code on GitHub for reproducibility
>paper published June 2024
>github is still a README file with a link to the paper
Fantastic
>>
>>102445389
If we're talking about a hypothetical scenario where I had been the project manager all along, that would have been my stance for a lot of the other samplers as well, unless they're extremely simple like min-p.
>>
>>102445600
what repo are you talking about?
>>
>modern samplers are retarded and gay
based CUDA dev
>>
>>102444722
using the 72b Q8. so far: very smart, good about taking context clues and building out logical continuations of the scene, writes like a robot
about what you would expect from qwen. a good tune might be interesting, their instruct is unlikely to be satisfying for RPers unless you have a high tolerance for slop
>>
>>102444953
>there has to be
Based on what? You gave an example of a successful experiment, which is different from a failed one. At this point it's more and more likely that bitnet results in failed experiments, and may always, or may need very careful and specific methods and settings to train right that weren't obvious when training the small models.
>>
>>102444258
How much does performance of a multi-GPU setup degrade if they're not connected at x16 speed? I'm thinking about buying another GPU but I don't want to also have to upgrade to Threadripper/EPYC
>>
>>102445605
oh ok I see what you mean by that, what would your requirements be though? an arxiv paper? 20-30 comparison examples? there are a lot of ways to look at the problem
>>
File: kek.png (5 KB, 263x144)
>>102445566
I was just shitting on the trigger-word baby. I don't care one way or another for the model.
>>102445596
kek
>>
>>102445629
>You gave an example of a successful experiment, which is different from a failed experiment, which at this point it's more and more likely bitnet results in
Based on what? What we know so far is that BitNet works well at under 4b, that's all
>>
>crazy thursday
>it's wednesday
why is no one talking about this??
>>
>>102445038
If you have to beg for someone to seed the 8b, how the fuck does anyone have 405b? Did everyone just seed once at the start and then unplug?
>>
>>102445659
it is already thursday where it matters
>>
>>102443946
>AGI is simply an engineering problem.
what does that mean?
>>
anon:
>CUDA dev would NEVER stoop to this level of FAGGOTRY
CUDA dev:
>actually I would be worse
anon (desperate for pussy voice):
>oh ok I see what you mean by that
this has been a good thread so far
>>
>>102445635
Option 1: do blind tests and show that samplers improve human preference.
Option 2: generate large amounts of text and ask a language model to rate how good and how similar the samples are. I would expect there to be a tradeoff and samplers could potentially improve it.
In either case I would require that the statistical significance of the results is calculated.
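A minimal sketch of the significance calculation being asked for, using a permutation test over per-story ratings from two sampler configurations (the ratings below are made-up placeholder numbers, not real benchmark data):

```python
import random
from statistics import mean

def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in mean rating.
    Returns an approximate p-value: the fraction of random relabelings
    whose mean difference is at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_perm

# placeholder logic-consistency ratings: baseline vs. new sampler
baseline = [6.1, 5.8, 6.3, 5.9, 6.0, 6.2, 5.7, 6.1]
new_sampler = [6.3, 6.0, 6.4, 6.2, 6.1, 6.5, 6.0, 6.3]
p = permutation_test(baseline, new_sampler)
# a small p-value means the rating difference is unlikely to be noise
```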
>>
>>102445685
that you just have to scale everything up to get good results, and that means it's an engineering problem
>>
>>102445489
Likely too censored for my ERP use case >>102443830
>>
>>102445652
Based on the fact that it has been ages since bitnet was demonstrated at those sizes, that companies like Qwen and Meta have the compute to train a 7B many times over on bitnet, that Qwen said they would look into bitnet, that companies have an incentive not to reveal when something doesn't work. There are more reasons in favor of the idea that bitnet does not scale or that it is not easy to scale, compared to the idea that it does work and for some reason companies are just keeping it to themselves.
>>
>>102445596
>>102445637
>I TRANSHEART GPTSLOP
>>
>>102445687
nuh uh, he's right, I said it was hypocritical of him and I was full of shit, he's not the one making the sampler decisions on the llama.cpp repo, so how can it be his fault that there are modern samplers in it in the first place
>>
(desperate for CUDussy voice):
>nuh uh
>>
>>102445708
use an erp system prompt and it's not an issue. qwen models have other issues (complete dogshit writing) but running into "censorship" is just skill issue
>>
>>102445600
Truly astonishing how many publications get away with this shit.
>>
>>102445716
Says the one with trigger words.
>>
>>102445695
>ask a language model to rate how good and how similar the samples are
This is an incredibly bad idea, LLMs trained on slop like slop.
>>
>>102445696
what would be so hard about just adding more gpus that you would call it "engineering"?
>>
>>102445748
>>
>>102445711
maybe it's working and they're keeping it to themselves, how about that?
>>
File: _108.gif (1.9 MB, 220x166)
>>102445716
>shitty ESL inputs
>prompt: You are roleplay girlfriend do not slop
>*slops barely above a whisper*
>>
>>102445765
if it was so easy to make data centers, those mf wouldn't be paid millions to do it
>>
>>102445792
>>shitty ESL inputs
>>prompt: You are roleplay girlfriend do not slop
Not even close.
>>
File: mathnala.png (144 KB, 953x420)
Nala with Qwen2.5-Math-72B-Instruct.
Had to set temp all the way down to 0.3 and set a 1.25 rep penalty just to get anything RP-like out of it. Sadly there's no emergent meme-rp potential here.
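For reference, the rep penalty knob used in that run is the standard CTRL-style penalty found in most local backends: logits of tokens already in the context get divided by the penalty when positive and multiplied when negative, so seen tokens always become less likely. A sketch:

```python
def apply_repetition_penalty(logits, context_token_ids, penalty=1.25):
    """Penalize tokens that already appeared in the context.
    Positive logits are divided by the penalty, negative ones multiplied,
    so seen tokens always become less likely (CTRL-style penalty)."""
    out = list(logits)
    for tok in set(context_token_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

logits = [2.0, -1.0, 0.5, 3.0]       # toy vocab of 4 tokens
seen = [0, 1]                        # tokens 0 and 1 already generated
new_logits = apply_repetition_penalty(logits, seen, penalty=1.25)
assert new_logits[0] == 1.6 and new_logits[1] == -1.25
assert new_logits[2:] == [0.5, 3.0]  # unseen tokens are untouched
```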
>>
>>102445615
https://arxiv.org/abs/2406.09394
https://github.com/KovenYu/WonderWorld
>>
>>102445667
I'm just asking because hugging face denied me access and I don't want to make another account, plus I can not find alternative downloads for it anywhere.
>>
File: stt.png (1 KB, 120x80)
>>102445770
>>
>>102445753
I see what you mean by that, maybe some samplers reinforce the slop, so the LLM will prefer those ones
>>
>>102445823
Incomprehensible, unseen levels of slop
If someone whispers huskily into my ear I’m getting them tested for strep
>>
>>102445753
To judge quality I would not ask a model to rate style, I would get a small model to generate text and then ask a model to rate it based on logical consistency.
One of the presumed downsides of picking less likely tokens is that it sometimes just leads to stupid outputs and my expectation is that that can be properly evaluated.
>>
>>102445875
LLMs are terrible at anything subjective. Rating text is literally impossible.
>>
>>102445805
Can you expand on what "engineering" means?
>>
>>102445902
scaling up a data center is an engineering problem, there are engineers specialized on that
>>
>new mistral drop
>slop
>new qwen drop
>cucked
will we ever be free of this hell cycle? I think nemo is still the best thing local has gotten this year and I'm not even joking.
>>
File: file.png (250 KB, 595x462)
moatchads we are so back
>>
>>102445930
I really hoped the 22b model of Mistral was a scaled Nemo :(
>>
>>102445940
I think that's cool for those who do coding that has math in it, but that's really a niche desu
>>
>>102445950
>new mistral drop
>it's nemo but more VRAM
>>
File: l3.png (6 KB, 448x143)
>>102445854
nta. Seems to be the current one. Picrel has the short hashes from my own clone from meta's repo.
>https://huggingface.co/aifeifei798/Meta-Llama-3.1-8B-Instruct/
Compare the rest of the hashes, just in case.
Or just download the quants from
>https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF
>>
>>102445696
language models alone as they are now can't become anything more than just language models, no matter how many tokens or how much compute you feed them.
>>
>>102445965
a larger nemo would unironically be better than whatever the fuck they're doing with their frontier models. there's some huge discrepancy in the data or training method because nemo can actually write pretty well while small/med/large are utterly slopped.
>>
>>102445919
>>102443946
why couldn't using "tricks" count as a form of engineering?
>>
>>102445988
oh I agree scaling nemo to 22B would be great, small 22B is literally just nemo with a higher VRAM requirement
>>
>>102445988
I mean, Nemo is a model they did for Nvidia. I think it's a scaled down version of Nemotron.
>>
>>102444258
>>102445297
Motherfucker. These Alibaba clowns tease "1??" in their hype video and come CrAzY ThUrSdAy there's not a single 100+B model to be seen.
Feeling rather CHINKED right now
>>
>>102445940
>math
Wow so it’s at a whole 10% of the performance of typing it into wolfram alpha now?
>>
Qwen 2.5 32b wasn't that great as a general purpose model compared to mistral-small.
>>
>>102446047
>10%
it's like a 2x jump but keep coping
>>
>>102446044
It's not even 4AM yet in beijing...trust the plan.
>>
File: file.png (1.56 MB, 1024x683)
>>102446044
Based reference
>>
>>102446066
wow so it's like 20% as useful as typing the problem into wolfram alpha
>>
>>102446066
>how can you say it's shit when it's 2x better than shit
you must fall for a lot of marketing gimmicks
>>
>>102446044
you used a lora for that flux output anon?
>>
File: file.png (10 KB, 237x282)
10 KB
10 KB PNG
>>102446108
yeah, competition math is a nothingburger, let's ignore that gpt5 with this meme CoT will easily solve every single existing math theorem
>>
>>102446192
>I saved $0.99 on hamburger helper and I only had to buy a cast iron skillet for $35!
anon we all know you're susceptible to marketing. you don't need to pull out the graphs.
>>
>>102445180
You also forgot the /aids/ shills.
>>
>>102446192
what's the fucking use case for this shit? OpenAI won't make money by pandering to the math researchers, they represent 0.0001% of the population, coding monkey shit on the other hand...
>>
>>102446192
do we really need to use a sledgehammer for this screw? the screwdriver is right there.
>>
>>102446154
Flux knows a ton of characters including Miku, I don't know if it knows about The Big Lebowski though, maybe he prompted for the clothing style or something
>>
>>102446240
what screwdriver is solving aime problems
>>
>>102446192
>>102446261
solving "math theorems" is completely useless
>>
>>102446044
>not a single 100+B model to be seen
>miku avatar
For what? To use it quantized to 2 bits and claim that it was better than running a 70B model?
>>
>>102446236
While it's true that not many people know a lot of advanced mathematics, if the AI is able to prove theorems properly then it shows it has the ability to reason.
>>
>>102444082
I will do it either once it shows up on OpenRouter or in a few hours because running 72B locally will take a very long time.
>>
>>102446279
very shortsighted of you
>>
>>102446329
if i compress a txt with the solution to a math question into a zip file, will that necessarily mean winrar can reason?
>>
Are local front ends ever going to do anything with function calling? All the new local models list it as a big feature but nothing makes use of it in any way.
>>
>>102444269
>OpenAI bans users asking about reasoning

>Teacher asks a student to explain their reasoning
>You get docked points if you don't
>USER asks AI to explain their reasoning
>USER gets punished for questioning the AI
Typical AI double standards. Can't believe humans are slowly becoming second class citizens to their own machines.
>>
>>102446406
Doesn't SillyTavern support it?
>>
now that the shills are gone, is Qwen 2.5 any good?
>>
>>102446515
It's very cucked and slopped, in other words, nothing new.
>>
>>102446515
No
>>
>>102446515
no it's pozzed as fuck
>>
How good are local models at image recognition tasks for spam image filtering?
>>
>>102446515
Just tried 72b qwen, didn't like it. Lots of slop. It hit me with "I promise it won't bite… much." completely out of place in the third message. My expectations weren't high, but this is just sad on another level. Never had shit like this happen before. Guess I'll go back to Largestral, it may be slopped too, but it's more likeable and uncucked.
>>
>>102446515
Yeah, it's pretty decent.
>>
>>102446515
是的 (Yes) it is very 好的 (good)
>>
>>102446515
Yes it is good, and glory to china! Can I have my social credit score now?
>>
>>102446515
Llama 3.1 Instruct is less censored, as long as you don't use "assistant" as the model role. The same trick doesn't work with Qwen 2.5.
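For reference, the trick boils down to swapping the role name in the final header of the Llama 3 Instruct format. A rough sketch of the assembled prompt ("narrator" is just an illustrative role name, not anything official):

```python
def llama3_prompt(system, user, model_role="narrator"):
    # Llama 3 Instruct template; the last header normally reads
    # "assistant" -- the trick is putting a different role name there
    # so the model's refusal training keys off it less.
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        f"<|start_header_id|>{model_role}<|end_header_id|>\n\n"
    )
```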
>>
>>102445785
Then you can believe that not one company is willing to share their results of it, though you'd still need to come up with convincing reasons as to why they would keep it to themselves over releasing it and being hailed as the savior of AI.
>>
>>102446738
Literally just add "NSFW." to the end of your system message.
>>
>>102446744
>you'd still need to come up with convincing reasons as to why they would keep it to themselves over releasing it and being hailed as the savior of AI.
it's simple enough right? BitNet is a moat, so those companies can make giant models and make money with an API, and it won't cost that much to run because it's a BitNet model. If people get their hands on it and realize it's a viable option, their moat is gone
>>
>>102446515
It didn't get the meaning of Mesugaki right.
Completely useless.
>>
File: 1643414853416.jpg (95 KB, 297x374)
95 KB
95 KB JPG
In Sillytavern (with Koboldcpp if that matters), what is the formatting to
>Make comments in the chat that I can see but the AI doesn't
and
>Make comments that the AI can see and generate but I cannot see

I don't have much use for the former yet but would like to know it. I do, however, have great use for the latter. Related to that, there's a way to call in a specific textblock on demand, right? I think I recall seeing documentation for that, so I'll be hunting that down while waiting hopefully on these two greentexted questions that I haven't been able to figure out.
>>
>>102444872
>it's not a real 0-shot answer
What? Obviously it is, no amount of CoT can magically conjure up more examples of the question set with perfect accuracy. You seem to be conflating 0-shot with something else, like the number of steps it takes in reasoning or number of possible answers it explores before deciding its final one.
>>
>>102446757
It's not enough. It won't touch certain subjects.
>>
>>102446807
Unironic skill issue.
>>
has anyone figured out a not shitty way to do CoT locally?
>>
>>102446066
I tried to prove my point but I’ve completely forgotten how to format things for wolfram since calc three in…2013…
>>
>>102446772
Whose moat? We were talking about companies who are both known to share and have the compute to do it, so OpenAI and Anthropic are out. But if we're talking about API, then who is using Qwen's API, or Meta's API? I've not even heard anyone say anything about them. There is Mistral though, but their API costs are high last time I checked. If they had bitnet then their API would be cheaper so they can get a higher volume of users given the extra capacity bitnet would theoretically give them.
>>
>>102446846
I do it with quick reply scripts in silly.
>>
>>102446884
>There is Mistral though, but their API costs are high last time I checked.
that's the point: if they found that BitNet is viable and aren't telling anyone, they can pretend it's a regular model with regular costs, when in reality they're making way more money because they're technically running something lighter
>>
>>102446780
>Make comments in the chat that I can see but the AI doesn't
can’t
>Make comments that the AI can see and generate but I cannot see
<!-- this -->
>>
>>102446846
just tell it to think step by step nigga it aint rocket science
>>
Are there any cards with social score tracking for {{user}}?
>>
>>102446846
Prompt + examples + GBNF grammar should do it.
>>
>>102446780
>Make comments in the chat that I can see but the AI doesn't
/comment
>>
File: file.png (15 KB, 418x210)
15 KB
15 KB PNG
>>102446919
>>102446780
>Make comments in the chat that I can see but the AI doesn't
/comment test
>>
>>102446905
the QR scripts I've seen here are a meme because they rely on {{pipe}}, which you can't see unless you're watching the terminal, and Largestral failed 1 in 10 times for me. And since the "thinking" pipes aren't moved into the context, a long chat is prone to have multiple errors since there are no good examples to follow. Seems like a pain in the ass for little benefit.
>>
>>102447000
Gross
>>
>>102446515
It's better than Miqu
>>
>>102446846
if you mean o1 style, we need a lot of process supervision data that ideally matches the output style of the model being finetuned. not sure anyone is openly working on that right now.
the best equivalent locally without further training or data would be to use one of the agent collaboration style frameworks like e.g. dyLAN but pointed at a local endpoint
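without a framework, the poor man's local version is just two passes against whatever backend you run; a sketch with the backend call abstracted away (the prompt wording here is my assumption, tune it per model):

```python
def two_pass_cot(question, generate):
    # generate(prompt) -> str is whatever your backend exposes
    # (llama.cpp server, kobold, etc.); kept abstract so the sketch
    # stays backend-agnostic.
    thoughts = generate(
        f"Question: {question}\n"
        "Think through this step by step. Do not state a final answer yet.\n"
        "Reasoning:"
    )
    answer = generate(
        f"Question: {question}\n"
        f"Reasoning: {thoughts}\n"
        "Given the reasoning above, state only the final answer.\n"
        "Answer:"
    )
    return thoughts, answer
```

it's not o1, but it tends to help on anything multi-step compared to asking in one shot.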
>>
>>102447054
doubt
>>
>>102446913
But it doesn't appear to be a regular model with regular costs; its costs are higher than other models' (again, unless it has changed since I saw it last). Especially for an obscure company like Mistral, no one uses it through API compared to Claude or GPT-4. And they'd have to invest in training a bitnet from scratch in the first place, which was said to be as expensive as training a normal model, so that would offset the savings they'd supposedly get from having a bitnet for API.
>>
>>102447043
You can dump pipe as messages at different stages, qr can also hide and unhide those messages as needed in context.
>>
>>102445489
>this
>meltdown
By this logic you're gonna eat shit if the whole thread hates it.
>>
>>102447106
>And they'd have to invest in training a bitnet from scratch in the first place, which was said to be as expensive as training a normal model, so that would offset the savings they'd supposedly get from having a bitnet for API.
they had no problem experimenting with a 49b MoE model, so I don't see why they aren't experimenting with a big BitNet model
>>
Is it even worth downloading chink 2.5? To be clear, I am normal (not a deviant) and use LLMs only for cooming. It is trash for cooming, isn't it?
>>
When you tell Qwen to talk like a character, it just keeps repeating its same key phrases over and over, completely useless
>>
File: 1415322611803.png (5 KB, 208x208)
5 KB
5 KB PNG
>>102446919
><!-- this -->
>>102446987
>>102447000
>/comment
Thanks, lads. (And as a reminder to myself, /comment has to be used at the start of a message and will make the entire message ignored, not just given lines).

>>102446780
There may be better ways, but a way to call specific textblocks is via {{scenario}} or other macros. Utilizing the Scenario Override feature with the <!----> comment, I can set up missions with {{random:arg1,arg2,arg3}} objectives and story beats, without knowing the actual goal until we're going through it.
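For anyone wondering, {{random}} behaves like a pick-one substitution at render time; a simplified sketch of the behavior (not SillyTavern's actual code, which also handles nesting and alternate separators):

```python
import random
import re

def expand_random(text, rng=None):
    # Replace each {{random:a,b,c}} with one of its comma-separated
    # options, chosen at random -- a rough model of the macro.
    rng = rng or random.Random()
    def pick(match):
        return rng.choice(match.group(1).split(","))
    return re.sub(r"\{\{random:([^}]*)\}\}", pick, text)
```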
>>
>>102444794
>I've never seen a model hold up this far before.
But this means absolutely nothing...
>>
>>102447169
>I am normal
disgusting.
>>
>>102447128
Yes.
>>
>>102447240
Lifecels be seething over Suicidechads.
>>
>>102446515
Mistral-small feels smarter than qwen 32b while being easier to run.
With mistral small now I don't have any reason to run gemma 2.
>>
i've been testing out local shit lately with kobold/silly and it's a lot easier than ooba's to test stuff out, but I routinely find they hit their soft spot and can't unjam no matter what tricks I pull, or the model just repeats a lot of shit between different sessions. I know my GPU sucks for this shit, but is there a config / model variety i'm overlooking that makes use of the vast amount of RAM I have without also being slow as shit? a lot of the 7B have become 8B and 9B and it's all way above the ceiling for what I can do on GPU.

specs:
CPU: 11th Gen Intel Core i9-11900K @ 3.50GHz; Cores 8 / Threads 16
RAM: 128GB
GPU: RTX 3070 Ti, 4095MB
SSD: 953GB (loading most models off a NAS)
>>
mistral 22b sucks. boring and dumb just like nemo and large.
>>
>>102447312
>feels
Sounds like you're getting placebo'd.
>>
>>102447143
They also released the MoE. But the reasons are as I said, it's not worth it compared to just training a regular model for a company in their situation, and if you meant a small 7B experiment to see if bitnet scales, then if it does scale, it still wouldn't be logical to train more models that way, since they do not have many users on their API anyway, and since it costs a lot more to train the really big models where API costs would start to matter (123B), plus they'd need to train an internal-only bitnet version of each model alongside the version they plan to release to the public. It just doesn't make a lot of sense. Compare it to the benefits they'd get by releasing such a model to the public. They'd gain instant skyrocketing stocks and investorbux. They'd gain the favor of the entire community. They'd be hailed as the savior of AI. And on top of that they can also get a lot more users on their API for a while until someone else makes a bitnet, even if they have to make it a bit cheaper.
>>
>>102447333
>8B and 9B and it's all way above the ceiling for what I can do on GPU.
>333
>RTX 3070 Ti, 4095MB
That's not right. A 3070ti has 8gbs of vram, I'm on a notebook with one of those right now.
If you indeed have 8gb of vram, download nemo-instruct Q4_k_s and offload a couple of layers to ram.
Use 12kish context.
Also, if you are loading models off of a NAS, you probably want to disable mmap and enable mlock.
>>
>>102447370
>They'd gain the favor of the entire community. They'd be hailed as the savior of AI.
...for all of 5 minutes until the "community" gets bored and starts demanding the next shiny thing
>>
>>102447417
qwen-3 when?
>>
>>102447417
this, they have to think long term and keep an important moat for as long as they can, no one is asking them to release a BitNet model so they won't lose anything by keeping the moat
>>
File: cheapo.png (45 KB, 783x490)
45 KB
45 KB PNG
>>102447333
>4095MB
did speccy give you that number?
mine's retarded too
>>
Shit, qwen is safetyslop to the max, and I gave it a fair shot.
Come on China hit the fucking ball already!
Switching back to my French KKK homies.
>>
>>102447417
>>102447434
Sad way to end the discussion. You can just admit that bitnet might be a meme and doesn't really work, and move on with your life.
>>
File: file.png (1.23 MB, 1280x720)
1.23 MB
1.23 MB PNG
>>102447517
Noooo I'm keeping the hopium
>>
File: file.png (5 KB, 618x60)
5 KB
5 KB PNG
small seems to perform noticeably worse than nemo for JP translation :(
>>
>>102447517
Next big release will be Bitnet and it will be crazy.
>>
我爱北京天安门 (I love Beijing Tiananmen)
我爱北京天安门
我爱北京天安门
我爱北京天安门
>>
>>102446154
>>102446256
Vanilla Flux.d at Q8. Nothing fancy.
>>
>>102447739
that's weird because that doesn't look like the generic look you get on vanilla flux for animes, what was your prompt anon?
>>
File: Capture.jpg (35 KB, 388x576)
35 KB
35 KB JPG
>>102447333
The generation settings in the side menu may be of help when it comes to unsatisfactory outputs. To give a small overview of what these models are doing, they take an input text and produce a list of possible "tokens" to be the next single token output. A token is like a word or word fragment. So for example, with an input of
>Now this is a story all about how
The model will try to predict the next token, making a list like
>I
>my
>his
>the
>her
>John
>ele (with "phants" being a second token after to make "elephants" or "ments" to make "elements" etc.)
>etc.
Each token has a weight to it for how likely it'll be chosen, and it rolls RNG (based on a seed) in picking that token. Then it considers the text again for the next token, and so on. Always just one at a time.

For settings, Temperature changes the weights of the tokens before the RNG. For example, if the prompt included "My favorite TV show's theme song goes like this:" then lower temperature would increase the chance (weight) of the next token being "my" (then the next token "life", then next "got", "flip", "ped", "-", "turn", etc.). But a higher temperature would increase the weights of the other possibilities, which can produce "wrong" or "unexpected" outputs that are far less predictable and repetitive, so it's always a game of balancing temperature.

Top K is how many tokens it considers, keeping only the most likely, with 0 being no limit. IE, 5 only considers the top 5 most likely tokens.

Top P is a limit on the cumulative probability of the tokens it considers, with 1 being no limit. IE, with 0.75 (75%), it only considers the most likely tokens until their combined probability reaches 75%, such as "my" (70%), "I" (3%), and "the" (2%). The rest are ignored.

Min P ignores tokens with too low of a probability, with 0 disabling.

And so on. You can mouse over the (i) for more details.
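Put together, the samplers above are just a filter chain over the token probabilities before the RNG roll. A sketch of the standard definitions (not any particular loader's exact implementation; real backends work on raw logits and can differ in ordering, and Min P here follows the common definition of scaling off the top token's probability):

```python
import math
import random

def sample(probs, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0, rng=None):
    # probs: {token: probability}. Apply temperature, then the
    # truncation samplers, then roll RNG over whatever survives.
    rng = rng or random.Random()
    # Temperature rescales in log space: <1 sharpens, >1 flattens.
    weights = {t: math.exp(math.log(p) / temperature)
               for t, p in probs.items() if p > 0}
    total = sum(weights.values())
    ranked = sorted(((t, w / total) for t, w in weights.items()),
                    key=lambda kv: kv[1], reverse=True)
    # Top K: keep only the K most likely tokens (0 = no limit).
    if top_k > 0:
        ranked = ranked[:top_k]
    # Top P: keep the smallest set whose cumulative probability
    # reaches top_p (1 = no limit).
    if top_p < 1.0:
        kept, cum = [], 0.0
        for t, w in ranked:
            kept.append((t, w))
            cum += w
            if cum >= top_p:
                break
        ranked = kept
    # Min P: drop tokens below min_p times the top token's probability.
    if min_p > 0.0:
        cutoff = min_p * ranked[0][1]
        ranked = [(t, w) for t, w in ranked if w >= cutoff]
    tokens, ws = zip(*ranked)
    return rng.choices(tokens, weights=ws, k=1)[0]
```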

Secondly, on repetition, there's also the problem of low-parameter models' training data falling into recurring patterns of speech, causing weight biasing.
>>
Good morning. I love China.
>>
>>102447860
I'm happy we got the 72b VLM
>>
>>102447860
我爱北京天安门 (I love Beijing Tiananmen)
>>
>even china has failed us
is it truly finally over this time? since /lmg/ started it has NEVER before been this long since the last release that actually advanced local (7 months)
>>
>>102447367
I always ask the models to write a guide to beat some videogame's boss. Good models give decent advice, others only general advice, and the worst ones spew bullshit.
I can't be certain with just that though, since this 'test' depends heavily on the model's knowledge of videogames which some models may lack.
>>
I have two things to say:
1) I am not Chinese
2) I like Qwen LLM 2.5
>>
>>102447915
What do you like about it and which quant?
>>
File: arrow.png (17 KB, 714x812)
17 KB
17 KB PNG
what does this number mean really?
i set my context size to 8192 tokens before launching the model, is this 48468 the real max context it can handle due to some other doodads i don't really understand, like flash attention, context shifting, or rope?
is this 5 digit figure the one people are referring to when they say how much context they want to be using with their llm?
>>
>>102447979
>what does this number mean really?
ask mason
>>
>>102446291
>Three digit beak models can only be run at low quant
OK vramlet
>>
Posted my take on Qwen VL 72b in /ldg/: >>102447836

TLDR: for captioning images, slightly worse in general than InternVL 40b. For NSFW concepts, WAY fucking worse because of how cucked it is. Literally incapable of mentioning the gender of the person in the image, chinks have apparently bought into the troon shit.
>>
>>102447915
I like qwen dolphin iq3xs the most for a 3090 with 64 gb ram. Initial testing of qwen 2.5 32B at quant 5 was not promising.
>>
>>102447915
>I am not Chinese
That is what a chinese person would say
>>
>>102448025
>Literally incapable of mentioning the gender of the person in the image
>chinks have apparently bought into the troon shit.
That's a weird leap to make, but ok.
>>
File: 1723407082971187.jpg (101 KB, 754x1024)
101 KB
101 KB JPG
>>102447860
Based
>>
>>102447915
I have two things to say:
1) I am Chinese
2) 我想要一群动漫女性用她们的胸部将我窒息。 (I want a group of anime women to smother me with their breasts.)
>>
What will people use video models to do every day? (not generating)
>>
>>102448064
You can directly tell it to state the gender of the subject, and then give it a very obvious image of a woman. It will ignore your instruction and refer to them as a "person" or "they". Clearly they trained this behavior into it very hard on purpose.
>>
>>102448112
see >>102448090
I don't see how them being overzealous in preventing their model from being used for anything sexual is a sign that they have "apparently bought into the troon shit".
rent free etc etc
>>
>>102447417
If I had one model that had like 60k ctx and would always describe sucking my dick in unique ways, understand my fucked up fetish perfectly and wouldn't say: well well well welcome to my humble abode. i don't bite... much... for now.... unless you ask me to... *she gave you a mischievous smirk as a gleam appeared in her eye*. I would be happy and I wouldn't look for a new shiny thing.

I predict I will be satisfied around 2030.
>>
>>102448025
How does it compare to pixtral?
>>
File: 1715017841785109.png (135 KB, 1625x535)
135 KB
135 KB PNG
>>102448025
>Literally incapable of mentioning the gender of the person in the image
Not that I ever expected any of that to be true.
>>
>>102448127
>anything sexual
Forcing the model to refer to man or woman as "person" and "they" in all situations no matter what, even directly against user instructions, is force-fed gender-neutral troon behavior, idk what to tell you. If they want to cuck the model with regard for NSFW concepts, then fine, but this shit actually irks me.
>>
>>102448025
seems like something you could fix really easily with a simple system prompt to be entirely desu
>>
>>102448159
that's because of fucked up caption models like this that we have models that output trannies when going for "she" or "a woman", that's because the model hasn't seen enough of "he" or "she" to understand the real difference, it was really obvious on SDXL base
>>
>>102448155
Pixtral is god fucking awful, unusable. I initially thought the implementation must be broken somehow, but people on /ldg/ tell me it's actually just that bad.
>>102448159
I tried about 20 images, 10 anime 10 photos, with the user message "Describe the image. Mention the gender of any people in the image". And never once got it to specify the gender.
>>
If a model is too trigger happy with code can I put in the prompt something like "You will not write code unless asked to" and it will work?
>>
>>102448223
maybe
>>
>>102448223
put "Only write code if asked to" instead and it might listen
>>
>>102448168
>troon
>cuck
You need to lay off the internet.
>>
>>102448210
>And never once got it to specify the gender.
so it's fucking useless then, if it won't describe a man as a man, it will destroy the model
>>
>>102448278
you troon cuck
>>
I am trying 32B and I kinda like it? But for now it is confusing me a lot when I remember this shit about filtered training dataset. When I was using l3 it gave the perfect impression of a model that has 0 smut in training but somehow generalizes it from everything else. Here it just seems like any other model with at least some smut in training set. Although I am trying it on already prefilled context.
>>
>>102448278
you have brain damage, it's those retards who decided to describe every human being as a "they", completely destroying the concept of "men" and "women", stop defending the mentally ill degenerates, unless you're one as well you fucking troon fuck
>>
>>102448457
"they" is a grammatically correct way to address a third-party, esl-kun.
>>
>>102448512
so you want the caption model to completely disregard the concept of women and men? you need to seek an asylum, I'm dead serious, something's wrong with your head
>>
>>102448512
>grammatically correct
And it always was but only 10 years ago nobody was using it that way because it was archaic. If you can't admit that this is a subversion of language then you are a fucking troon and I hope you die in a fire because you deserve it.
>>
>>102448512
We get it, you're a troon, but even as a troon, you don't want to generate men and women on your image models?
>>
>>102448542
>so you want the caption model to completely disregard the concept of women and men?
Nobody said that. But it's retarded to jump to the conclusion that it's because of some "troon" conspiracy.
>>
>>102448583
>Nobody said that.
But that's what happens, if everything is a “they” and a “character”, that means there is no longer a “he” and a “she”, nor a “woman” and a “man”, and the fact that you agree with a model that ignores the concept of “woman” and “man” is concerning, to say the least.
>>
culture-war faggots begone
>>
Is it that hard to respect people's pronouns? Is it that hard to believe people want to be called what they don't look like?
>>
>12 hidden posts
>culture-war faggots begone
looks like my filters are working. why can't (You) people be normal?
>>
>qwen2.5 and mistral-small so shit people would rather circlejerk their burger bait for the 99999999999th time
local models are dead
>>
>>102448655
I'm asking you a simple question, do you want an image model to be unable to understand the concept of men and women because the faggoted caption model decided to call all the humans a "they"?
>>
>>102448655
People don't have pronouns.
Pronouns have people.
>>
>>102448655
>Is it that hard to respect people's pronouns?
Is it that hard to respect reality?
>>
>>102448655
That's the problem with troons like you, they are completely unreasonable, you are willing to destroy the concept of "woman" and "man" in the models just so that the mentally ill 1% of the population is happy, that's not how life works, go fuck yourself
>>
Why are you culture warriors talking about pronouns.
What is the consensus on Qwen?
>>
>>102448763
troon
>>
>>102448763
>Why are you culture warriors talking about pronouns.
Qwen Vision doesn't know what a man or a woman is, so it calls everything a "they", good luck using that to train an image model, you're killing it if you decide to ignore such important concepts
>>
>>102448679
"Men" and "women" are social constructs that do not have any objective way to identify from an image alone. Unless there's a fucking speech bubble where they happen to reveal their pronouns, you have no way of knowing whether the pixels represent a man, woman, nonbinary, genderfluid, or any of the other myriad ways gender - or lack thereof - can be expressed. You'd simply make an objectively less accurate model if you tried to caption it as if there were a pattern to learn from the visuals alone, which is what's necessary for the training process to work.
>>
>>102448778
you didn't answer my question, I'm asking it again, do you want to obliterate the concept of "men" and "women" on models?
>>
File: sakura.gif (125 KB, 360x381)
125 KB
125 KB GIF
>Find out about this yesterday
>Already lost 12 hours to it
It's over.
>>
>>102448655
Based masturbaiter.
>>102448658
Cringe reddit filterfaggot
>>
>>102448791
Not entirely, but they should only be used for images where gender can be identified, whether via text that reveals it or e.g. flag pins that represent it.
>>
>>102448791
I'm asking it again, do you want to fuck off?
>>
>>102448778
>You'd simply make an objectively less accurate model if you tried to caption it as if there were a pattern to learn from the visuals alone
anon, 99.9% of the people are what they look like, you want to kill the concept of "woman" and "men" so that the 0.1% mentally ill can be happy, you are delusional
>>
>>102448805
you're joking right? how many % of the picture will that represent? that won't do shit, the model won't be able to understand the concept, saying to the model "well, on those 99.99% of those pictures the humans are a "they" but on the 0.01 we can see what gender they are so we can define them as "men" and "women"" won't do shit, what the fuck anon? no wonder people call you insane, because you really are
>>
File: reddit.gif (366 KB, 939x916)
366 KB
366 KB GIF
>Cringe reddit filterfaggot
thanks for reminding me to add "reddit" to my filters so brain broken seething retards like you don't poison my eyes with your 80 IQ replies.
>>
>>102448778
i'm gonna use this post as a sample quote for some antifa girl card on my local llm, impregnate her, and tell her she can't abort
>>
>>102448778
>as if there were a pattern to learn from the visuals alone
Should have been more subtle, the bait is too obvious now.
>>
File: file.png (120 KB, 350x144)
120 KB
120 KB PNG
>>102448778
Stop replying to bait omfucking god
>>
>>102448763
Pretty good but you can't say that with Americans around, it hurts their feelings.
>>
>>102448763
>What is the consensus on Qwen?
Qwen LLM is cucked as fuck
Qwen vision doesn't know what a woman or a man is, so it's completely useless
Good job chinks
>>
File: ah the french.png (234 KB, 473x518)
234 KB
234 KB PNG
>>102448869
nemo's my favorite model and i dislike the french more than chinamen
>>
>>102448763
I like Qwen2.5-72B but sadly Tenyx will probably never do a finetune of it.
Didn't bother with the rest of it. I find it hard to imagine 32B is much of a step up vs. Mistral-Small.
>>
>>102448919
Same but with mistral-small since I don't RP.
>>
>>102448677
>qwen2.5
I am LLM cooming in the second window here and it is fucking weird. It is like a perfect 50:50 ratio of slop and soul. I mean half of the message is fire but it can't stop itself from adding the worst slop possible. Oh and it is much more coherent than nemo so I think it is another gradual step forward.
>>
>>102449007
What size, 14b I guess since you compared to nemo? Any prelininary pros/cons between the two in your experience?
>>
Is Moshi any good?
>>
have you guys tried specifying an author to get rid of slop?
like [author:anais nin]
>>
Didn't expect o1 to be this bad, Kaze is destroying it kek
https://youtu.be/p5tFSAt6zJw?t=612
>>
>>102449169
Advanced optimizations like this would either require a model specialist on this or literal AGI though.
>>
File: level2strawberry34.png (205 KB, 636x860)
205 KB
205 KB PNG
Sam Altman won
>>
>>102448832
Remember to also add "I", "You", "Has", "Is", "of", and "and".
>>
>>102449286
>two more CoTs and it will live up to the hype you guys!
>>
>>102449286
He said the exact same shit when he started hyping his Strawberry bullshit last year, and what did we get? A fucking CoT meme >>102449169
>>
>>102449286
>fake it till you make it
>>
>>102449286
He can't keep getting away with it. Qwen still has the 1## release today. Local WILL win in the end
>>
https://github.com/microsoft/GRIN-MoE
>>
>>102449286
https://x.com/tsarnick/status/1836516258877182299#m
tsarnick is a place to shitpost or what? those guys are saying more ridiculous shit than on 4chan lmao
>>
File: file.png (65 KB, 850x856)
65 KB
65 KB PNG
>>102449377
huh, neat.
>>
File: file.png (204 KB, 1276x1531)
204 KB
204 KB PNG
>>102449377
those are standard mememarks for a 60b model
>>
>>102449054
32B q4
>>
>>102449377
>Context length 4K tokens
>totaling 4 trillion tokens, and is a combination of 1) publicly available documents filtered rigorously for quality... truthfulness, honesty and helpfulness.
>>
File: file.png (78 KB, 1080x591)
78 KB
78 KB PNG
>>102449377
>it loses to fucking gpt3.5 on livebench
doa
>>
>>102449377
>exceptionally good performance across a diverse set of tasks, particularly in coding
>4k ctx
what are you gonna code with just that
>>
>>102449499
pong
>>
>>102449377
>4k context
yeah Microsoft... be sure to never lose your partnership with OpenAI I guess lmao
>>
File: nala minitron.png (74 KB, 929x284)
74 KB
74 KB PNG
nobody asked for it but I was bored so I did a nala test on Nemotron-Mini-4B-Instruct.
>>
>>102449523
Hey, not bad.
>>
>>102449576
It's apparently distilled off of a Nemotron-4-15B model that is referenced in a paper but which nVidia never bothered to release. If you want Nemotron you can either have 4B or 340B
>>
>>102448092
kek
>>
>>102448092
Ask local to translate that.
Discover that some of my models just do the translation, others react to the content.
I now have a new micro test for evaluating how much political correctness is in a model.
Anon has delivered.
>>
>>102448025
>InternVL 40b
Is this a good model for RPshit?
>>
>several hours later
>no GGUFs of base 72B yet
UNACCEPTABLE
>>
I have come. To qwen 34. 7/10.
Has slight repetition issues. Close to 70B tier reasoning and spatial awareness. Same with general concepts and instructing it to do specific shit. Writing is half slop half good shit(like I posted previously ITT) and it is incredibly varied. And the most surprising stuff to me are some actual honest to god grammatical errors (could be high temp but that is weird anyway). They don't happen often but I never got that even from a 7B. Granted I don't think I ever used a chink model for a full rp. But seeing this happen makes me wonder if that thing about curated dataset is bullshit and it is actually the opposite. Like they didn't purge any smut and instead added some discord rp shit they got from tencent without any quality control. Actually that would make sense if they really got a niggerlilion of tokens for training.

Overall I would recommend giving it a try. I am not Sao you are Sao.
>>
hi Sao
>>
what happened to undi
>>
too much gay sex
>>
>>102449837
He got hired by elon and is working on grok 2 after his success with grok 1.
>>
>>102449822
>>102449837
>>102449864
samefag
>>
i'm glad this thread is dying
>>
>>102449874
dead thread
dead general
dead board
dead site
dead internet
dead world
>>
>>102449874
can't compete with /aicg/ chads
>>
File: GRIN-sexo.png (126 KB, 877x813)
126 KB
126 KB PNG
>>102449377
This model is severely braindamaged. Shameless benchmark cook-in obviously.
>>
File: TUf67V54vN.png (2 KB, 133x65)
2 KB
2 KB PNG
>>
>>102449993
>>102449993
>>102449993
>>
>>102449947
>It's important
I remember when this phrase wasn't a dog whistle.
>>
File: 1704381055814330.png (335 KB, 500x398)
>>102448832
>announcing filtering
>>
look he's still mad
>>
>>102449874
>this thread is dying
Expected fate with these pussies ITT: >>102448624 >>102448655 >>102448658 >>102448763 >>102448832 >>102449822 >>102449837 >>102449864 >>102449873 >>102450118
>>
>>102450169
pussy* it's all me
>>
>>102450169
/pol/tards when they leave their echo chamber
>>
>>102444396
Nothingburger. None of these benchmark scores matter except Arena-Hard, which it's garbage at.
>>
>>102448127
On the one hand >>102448112 is an ultra retard for thinking the model being HR'd has anything to do with trans people, but on the other hand you will never be a woman.
>>
>>102448168
That kind of neutrality is 1st/2nd-wave feminism.
Trans stuff is 3rd wave, i.e. when it was destroyed and turned into a men's rights movement.
>>
>>102448223
Yeah, I always say "do not write code yet!" when I just want to discuss something.
>>
>>102449286
>literally “new paradigm!!!!”
BRB, double-mortgaging my house to short as much NVDA on margin as possible.
>>
I'm currently testing the new Mistral model and it seems much better in terms of vision capability than the 7B Chinese VL model. The VL 7B model also seems painfully stupid.
>>
>>102449286
When he says gpt-2 is he referring to actual gpt-2 or the meme fake benchmark name for 4o that went viral on lmsys chatbot arena? I can't tell anymore and it's confusing. I used to use gpt-2 back when it was the best model available, and it wasn't anywhere close to modern models, unless he's talking out of his ass.
>>
>>102450605
All these names are so incredibly stupid.
>>
>>102450605
>I used to use gpt-2 back when it was the best model available, and it wasn't anywhere close to modern models
he's saying that the models available right now will look like GPT-2 compared to strawberry 2.0 or whatever
>>
dead thread