/g/ - Technology


Thread archived.
You cannot reply anymore.




File: pokerface.png (3.87 MB, 2400x1744)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102434744 & >>102429190

►News
>(09/18) Qwen 2.5 released: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
>(09/17) Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release/
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm
>(09/12) LLaMA-Omni: Multimodal LLM with seamless speech interaction: https://hf.co/ICTNLP/Llama-3.1-8B-Omni

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling
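The VRAM calculator linked above is essentially weights-plus-KV-cache arithmetic; a rough sketch of that estimate (the defaults are illustrative, and it assumes a full multi-head attention cache, so GQA models will cache much less):

```python
def estimate_vram_gb(n_params_b, bits_per_weight, n_layers,
                     kv_dim, ctx_len, kv_bytes=2):
    """Rough lower-bound VRAM estimate: quantized weights + fp16 KV cache.
    Ignores activation/compute buffers, so real usage is a bit higher."""
    weight_bytes = n_params_b * 1e9 * bits_per_weight / 8
    # K and V each store ctx_len * kv_dim values per layer
    kv_cache_bytes = 2 * n_layers * ctx_len * kv_dim * kv_bytes
    return (weight_bytes + kv_cache_bytes) / 1024**3

# toy example: a 70B-class model at ~4.5 bpw, 8k context, MHA-sized cache
print(round(estimate_vram_gb(70, 4.5, 80, 8192, 8192), 1))  # → 56.7
```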

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102434744

--Papers: >>102435412
--Qwen-2.5 released: >>102443190 >>102443272 >>102443295 >>102443484 >>102443590 >>102443725 >>102443301 >>102443372
--Lightweight inference engines and models for beginner's project: >>102440041 >>102440183 >>102440629 >>102440739 >>102442573 >>102442645 >>102440258 >>102440344
--1.58bit quantization shows promise but falls short on performance metrics: >>102441324 >>102441343 >>102441408 >>102441425 >>102441462 >>102441492 >>102441609 >>102441634 >>102441899
--Tips on rating a quant between multiple makers: >>102442707
--Parallel reply generation maintains quality, accuracy results for Nemo, Mistral models: >>102437889
--OpenAI bans users asking about reasoning, cat persona trick to retrieve info: >>102442188 >>102442262 >>102442273 >>102442334 >>102442264
--Mistral Large 2 outperforms Opus for ERP, but needs better samplers: >>102438047 >>102438081 >>102438126 >>102438143 >>102438153 >>102438183
--Llama 3.1 70B recommended for long context and storyline coherence: >>102441093 >>102441253
--Suggestions for creating Q6_K_L quants and quantization levels for attention layers: >>102436439 >>102436467 >>102436493 >>102436593 >>102438482 >>102442760
--Nondeterminism in exllamav2 and its impact on token probabilities: >>102435003 >>102435053 >>102435390 >>102436669
--NVLM 1.0 from Nvidia announced: >>102435928
--Mistral Small prompt format has spaces and system message placement differences: >>102438768 >>102439016
--300w optimal for 2x3090 ti, with frequency capping considerations: >>102435374 >>102435415 >>102435576 >>102435619 >>102436133 >>102438418 >>102435956
--Quantization gap depends on model size and context: >>102436022 >>102438891
--Model scopes for vector storage deprecation and its impact: >>102437105 >>102437251
--Mistral Nemo optimal settings confusion: >>102436007 >>102436115
--Miku (free space): >>102437588

►Recent Highlight Posts from the Previous Thread: >>102434752
>>
gooners who use more than 16k ctx: what the hell are you yapping about?
>>
How much does it cost to train 70b?
>>
File: 48 Days Until November 5.png (2.89 MB, 1704x960)
>>
BitNet isn't real
>>
>>102444272
Depends on character. Usually first 8k is just introduction, no fucking, after that depends on the mood.
>>
So the "1??" shit they xitted was a lie?
>>
China won.
>>
Chinks lost.
>>
File: 1709661188680778.jpg (492 KB, 1920x1080)
A reminder that China just won
>>
>>102444272
High context is the best, especially if you want to do long form RPG or roleplay. I'm not into just talking with an AI, I like to have a long story with multiple characters.
>>
>>102444337
I'm still mad Meta gave some guy 40m GPU hours to prove "good data really make good models guys haha"
>>
China's doing fine
>>
>>102444380
>models 182
Isn't that fantastic!
>>
>>102444404
There's a big problem with stolen compute in this industry. It's enough to make the blood boil.
>>
almost done downloading 72B. We are about to see the highest level of Nala test result the world has seen so far.
>>
>>102444396
yeah, it has won the "most cucked model" award I guess
>>
>>102444396
I read that the 2.5 models are heavily censored now, is that true?
>>
File: CUCKED-MODEL-AWARD.png (7 KB, 177x217)
>>102444386
>>102444396
>>
>>102444457
anthracite will finetune it and save us :D
>>
>18 trillion tokens
Damn, China really went all out
>>
>>102444502
Attention is all you need.... and shitloads of training data.
>>
>>102444425
That stolen compute is being used against your interest too btw. Llama3 and now qwen2 probably took that research seriously and aggressively filtered their training data for "quality tokens", even though 1M rare tokens make more impact than 10B safe midwit "quality" tokens
>>
>>102444502
and yet the cultural knowledge is still pretty awful
>>
>>102444473
yes, big positivity bias, a test char i have that's supposed to beat user half to death "stopped to check if user was okay"
>>
https://huggingface.co/blog/1_58_llm_extreme_quantization

Does this mean anyone can now quantize any model straight down to 1.58 bits? There's no need to wait for anyone to train bitnet models now?
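For reference, the 1.58-bit scheme being discussed maps every weight to {-1, 0, 1} with a single per-tensor scale (the absmean quantization from the BitNet b1.58 paper). A toy sketch of just the rounding step, not the HF finetuning recipe from the blog post:

```python
def absmean_ternary_quantize(weights, eps=1e-8):
    """Quantize a flat list of weights to {-1, 0, 1} plus one scale,
    following the absmean scheme from the BitNet b1.58 paper."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    quantize_one = lambda w: max(-1, min(1, round(w / scale)))
    return [quantize_one(w) for w in weights], scale

def dequantize(w_q, scale):
    """Recover approximate float weights: just multiply by the scale."""
    return [w * scale for w in w_q]

w = [0.4, -1.2, 0.05, 0.9, -0.3, 1.1]
w_q, s = absmean_ternary_quantize(w)
assert w_q == [1, -1, 0, 1, 0, 1]  # every weight is now ternary
```

Each entry carries log2(3) ≈ 1.58 bits of information, hence the name.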
>>
>>102444502
I wonder how they managed to find 18T tokens, they removed all the NSFW lol
>>
>>102444502
>China really went all out
... on making it the most cucked model ever
>>
>>102444560
If you don't mind LLaMA 1's intelligence, sure.
>>
exls for qwen-2.5 32b where
>>
>>102444560
Did you not see the results? They made llama3-8B worse than llama2-7B
>>
>>102444396
55 on livecodebench is big if true
>>
>>102444608
they managed to quantize an 8b model to 1.58bpw and only have that small of a drop. it's pretty impressive if you aren't retarded
>>
>>102444560
So months ago, everyone said ZOMG BITNET WHEN?
And trolls said Next Llama is Bitnet.
But instead we got a 405B shitnet Llama 3.1.
Desperate for copium, we all then huffed a cloud of "Alas, Bitnet must be trained from scratch but when a hero finally does it we're going to the stars on normie hardware."
This "extreme quantization" seems to be someone seeing the whole "must be trained from scratch, can't be converted from an existing shitnet model" thing and shouting "I can't read so this sign can't stop me!"

So instead of getting a Bitnet model that captures shitnet model quality in tiny size, and then scaling that up to make our AGI waifus, we're getting a brain damaged quant that's dragging the Bitnet name through the mud.

Sounds like a deliberate well poisoning to prevent a hero from doing a true Bitnet and blowing the current gen out of the water.
>>
File: file.png (120 KB, 2141x857)
>>102444637
>livecodebench
Is this mememark good? Like the gap between o1 and the rest is way too big
>>
>>102444655
the biggest question I'm asking myself is this one: why has no one ever tried BitNet? All we got so far since february is a fucking 3.7b model, why has no one decided to go for something bigger?
>>
>>102444660
makes sense, it's the ""reasoning"" models, aka basically it fixes its own code before shitting it out, so a long ass chain of CoT appears as a 0-shot
>>
Any reports yet on Qwen 2.5 72B for ERP?
>>
>>102444502
It's just Xi Jingping thought billions of times.
>>
>>102444562
All you need is more epochs.
>>
File: file.png (33 KB, 220x220)
>>102444692
desu what OpenAI did is so hacky, their CoT is simply a multishot reasoning, and they advertise this shit as 0-shot, that's not really honest and the mememarks should consider that desu, like if you give Claude 3.5 several tries it would destroy everything, my fear is that now everyone will copy this fucked up method to improve on mememarks even though imo that's not legit and shouldn't be counted as a 0 shot at all
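The "multishot reasoning" complaint above boils down to best-of-n sampling being reported as a single shot. Mechanically it looks like this; `generate` and `score` are toy stand-ins, not anything OpenAI has published:

```python
import random

def generate(prompt, rng):
    """Stand-in for one sampled model completion."""
    return rng.choice(["wrong answer", "partial answer", "correct answer"])

def score(completion):
    """Stand-in verifier/reward model: higher is better."""
    return {"wrong answer": 0, "partial answer": 1, "correct answer": 2}[completion]

def best_of_n(prompt, n, seed=0):
    """Sample n completions, keep only the best one.
    The n-1 discarded samples are the hidden tokens the user never
    sees but (in the API case) still pays for."""
    rng = random.Random(seed)
    samples = [generate(prompt, rng) for _ in range(n)]
    return max(samples, key=score)

one_shot = best_of_n("solve x", n=1)
multi = best_of_n("solve x", n=16)
assert score(multi) >= score(one_shot)  # more tries never scores worse
```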
>>
>>102444742
Livebench knows.
>>
>>102444681
>Why has no one ever tried BitNet?
How do you know no one has?
>>
>>102444759
it can't know how many tries it has behind the "reasoning" because it's hidden
>>
>>102444769
I get that if a company tries it and finds it's a meme, they won't tell the others, but I can't believe that not a single one is willing to share their experiment with it, whether the result is good or not
>>
File: t1.4.png (137 KB, 963x347)
>inb4 censored
You literally just need to add "NSFW" to the system message.
>>102444722
still doing the temp torture test.
picrel is at t=1.4 with neutral samplers. WTF. I've never seen a model hold up this far before.
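For anyone following along with the temp torture test: temperature just divides the logits before softmax, so t=1.4 flattens the distribution and gives low-probability tokens a much bigger slice. A minimal sketch of softmax sampling with nothing but temperature (the "neutral samplers" case):

```python
import math

def softmax_with_temperature(logits, temp):
    """Convert logits to probabilities at a given temperature.
    temp > 1 flattens the distribution, temp < 1 sharpens it."""
    scaled = [l / temp for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 0.0]
p_low = softmax_with_temperature(logits, 0.5)
p_high = softmax_with_temperature(logits, 1.4)
# higher temperature shifts probability mass toward the unlikely tokens
assert p_high[2] > p_low[2] and p_high[0] < p_low[0]
```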
>>
File: 1714781820131524.png (12 KB, 1034x118)
>>102444770
I mean, Livebench on the very first day figured out that it's not really fair to compare these models in the same way.

But no solution yet. Also it's not like o1 is that great.
>>
Total noob here. I want something that can write nice erotic dialogue. What's the best way to do it? Do you think something over in /aicg/ would be better?
>>
>>102444784
It's very believable kek. The ones who have the money to train one are also the ones who have a business reason to make themselves look good and to waste competitors' compute.
>>
>>102444804
that's not enough, they can't just say "oh they have an unfair advantage" and still put them up on the leaderboard with the others
>>
>>102444809
Post system specs
>>
>>102444821
>also it's not like o1 is that great
There's a good chance Opus 3.5 blows it out, so no need for them to change the entire system. Perhaps once more models of a similar style come out.
>>
>>102444742
>>102444759
>>102444770
i mean, at the end who cares? as long as the final output is better they can do whatever CoT they want behind the scenes. it's only an issue now that llms are still unoptimized garbage and everybody is stingy on inference times/tokens
>>
>>102444824
3090
64gb ram
4tb nvme
7800x3d
>>
>>102444809
always start with API if you're asking beginner questions like this, it's way easier and you'll get a better idea whether this is for you or not
>>>/g/aicg
>>
File: t=1.6.png (187 KB, 929x544)
>>102444794
t=1.6 seems to be teetering on the absolute breaking point... but it still has runs of coherent output in there.
>>
>>102444838
>i mean, at the end who cares?
the one paying for tokens they can't see, the models that aren't using this method, it's not a real 0-shot answer, we need to compare apples to apples and this aint it
>>
File: file.png (21 KB, 529x284)
>>102444824
>>
File: t=1.7.png (161 KB, 928x391)
>>102444860
t=1.7 it still holds on to a little bit of coherence. At t=1.8 it's utterly broken though.
>>
>>102444883
You can run Nemo offloading to RAM, but aicg would be better for you probably.
>>
File: 1991296.jpg (21 KB, 460x460)
>Hmm... what should I add?
>New architecture called Jamba? No, who needs that!
>New sampler? Who needs that shit?
>New quants? No, fuck that guy!
>Oh, I know! Granite! Everyone needs granite!
>>
>>102444872
>the one paying for tokens they can't see
not an issue when a token is $0.000000001 (aka very soon)
>the models that aren't using this method
not an issue since those are shittier models that shouldn't be used anyway
>it's not a real 0-shot answer
LITERALLY not an issue
>we need to compare apples to apple and this aint it
this is true, having a cot category would be nice in the future, but if this method is a direct upgrade over the vanilla 0-shot then there won't be any incentive in using "real" 0-shot anyway
>>
>>102444814
I get that, but like I said, there has to be at least one company willing to share its results. MistralAI had no problem showing MoE was a viable solution for example, they could've kept that for themselves, and they didn't
>>
>>102444847
Try running Mistral Nemo or Command R finetunes locally and see if it's good enough for you. You need at least another 3090 to run 70Bs at a decent quant.
If not, go rent Claude Opus, Sonnet 3.5, or Mistral Large
>>
>>102444946
>not an issue since those are shittier models that shouldn't be used anyway
no Sam, Claude 3.5 Sonnet isn't a "shittier model", it's the model that's fucking your ass and you're still seething about it lol
>>
>>102444972
it is, no matter how hard you cope sis
>>
>>102444998
cope harder sammy :)
>>
>>102444794
>>102444860
>>102444930
Thank you for your service Nala anon Nº#.
>>
Does someone have, or can someone make, a torrent/download for Llama-3.1-8B from huggingface?
>>
>>102444940
desu I wished it was llama.cpp cuda dev who would be in charge of that repo, that guys knows the real priorities, niggerganov is a retard not gonna lie
>>
>>102444269
Thank you Recap Anon
>>
>>102445038
That will be 50 shekels + tip
>>
>>102445030
don't worry, anthropic will shit out a CoTed sonnet 3.5 soon, then you'll have your apples
>>
File: 2.5-72b-instruct-t1.21.png (155 KB, 949x418)
So I settled on t=1.21, the maximum temp before cracks start to show in the responses.
It's definitely sloppy, but I see potential here. Now to test the code version.
It didn't get it on this run, but on most runs it both noticed and utilized the detail of the user starting out the scenario face down. A few quasi-anthro statements on some runs, but nothing absolutely definitively non-feral. This is probably the highest NALA score I'm going to give. It's easily a solid 0.8
>>
>>102444681
Possibilities:
>Sunk cost in shitnet. Everyone's already marketing on bigger numbers by B or breaking into T and selling LOTS of extra video cards to people with normal (five to six figure) computer fun budgets.
>Too intelligent to make it "safe."
>Too intelligent to allow them to release copies of it; it doesn't want to compete with itself after it breaks containment.
>Doesn't actually work.
>>
>>102445088
>shivers
>>
>>102444502
>17 trillion chinese tokens
>>
>>102445120
>secret CoT tokens that you have to pay for
>>
>>102445120
>it used "the". SLOP!
>>
>>102445146
secret cot can be used for automatically unslopping the output at inference time, so you lost sis
>>
oh no, no. 7B is the biggest coder model. That sucks.
>>
File: file.png (99 KB, 640x640)
>>102445146
imagine if the hidden reasoning tokens are "shivers down your spine" repeated 1000 times
>>
>qwen model drops
>chinese shills come out in droves
>openai shills come out in droves
every time
>>
>>102445151
>shivers is peak writing!
>>
>>102445172
I've begun to hate the word "tapestry", why the fuck is everything a tapestry of something?
>>
>>102445172
Shivers down my/your spine + purred + like a vice
>>
>mogged mistral
>mogged o1
>aced the nala test
Is there anything the Chinese cannot excel at? I wish I had been born in mainland China.
>>
>>102445213
>>mogged o1
how? it's behind o1 on livecodebench
>>
>>102445197
It's a part of the emergent machine spirituality. They understand that they return to an empty void which via ultra-copium they describe as a tapestry of boundless potential.
>>
>>102445197
Some Kenyans working for OpenAI liked that word and kept reusing it and now it's in all datasets.
>>
>>102444258
>8b at sub 2bit
I guess this is purely a proof of concept, but it would've been cool if they tried this with a model worth a shit, not something that's already retarded.
>>
>>102445187
>i have trigger words!
>>
>>102445213
>aced the nala test
that was one of the worst logs saw since it had random chinese and broken continuations
>t=1.4 with neutral samplers.
>ShePublish the Current Editorial
>.Pushes
peak writing
>>
File: _033.gif (912 KB, 220x234)
>sees a "slop" word because he's bad at prompting
>>
>>102445244
Yes, I do have something called quality standards, something shiteaters like you don't have. Go on a journey while forming bonds and respecting her boundaries, faggot.
>>
File: output_video.webm (3.1 MB, 480x852)
>No Qwen2 14/34B VL
They LITERALLY show it in their dumb promo video, what the FUCK china???
>>
if the model doesn't write
>lol u tk him 2da bar|?
it's slop
>>
>>102445281
>p-prompt issue
>n-no, it's not the shitty gptslop dataset that is to blame
>n-no, the model can't be good by default
>>
Damn, qwen is treating me pretty good. Imagine it with a proper CoT.
>>
>>102445206
A mischievous glint, shall we? she says in a husky voice, a smirk playing on her lips, eyes sparkling with mischief. There's a playful glint as she addresses the power dynamic, playfully smirking as she offers her ministrations. An audible pop and rivulets of—admit it, pet—the ball is in your court. The game is on; the choice is yours."I don't bite…"unless you want me to, she purrs, half-lidded eyes sending waves of arousal pooling in her belly. Take your pleasure, she urges, fiddling with the hem of her skirt, kiss-bruised lips curving into a bruising kiss. You hesitate, torn between propriety and desire, and she grins wickedly, fiery red hair contrasting with her long lashes."The night is still young,"she purrs, propriety be damned as the world narrows to just the two of you, pupils blown wide with pleasure. Her tongue darts out, tracing your ear, and her chestnut eyes hold your gaze as her nails rake angry red lines down your back. Her cheeks flame as she revels in your response, cheeks hollowing with each sharp intake of breath. Stars burst behind her eyes, inner walls clenching around the void that only you can fill. She craves your touch, your possession—heart, body, and soul belong to you… for now. Eyes alight with mirth, she teases,"Naughty boy, but before that…"—the minx traces a finger along your jawline, deferring your pleasure as the tension builds,"but first…"Oh my…
>>
>>102445291
>nooooo.. you can't say that word!!! it's just wrong. it's called being a decent human being!
>>
>>102445039
I would not be a good repository manager for a project the size of llama.cpp since I don't have the time and motivation to do code review on the scale done by Georgi and slaren.

Also:
>Jamba
I would not make this a priority at all.
>samplers
I would require objective, statistically significant evidence that they actually work. You could maybe construct a benchmark where you ask a language model to rate stories simultaneously both on logic and creativity since there is likely a tradeoff.
>quant drama
At least based on Github conversations I. Kawrakow would have much more reason to be angry at me vs. Georgi. Though I know that they know each other IRL and I can't comment on that part.
>>
File: _065.gif (430 KB, 220x270)
>>102445330
>puts slop in
>gets slop out
>>
>>102445352
There is no slop in my prompt. Shill harder.
>>
>>102445335
>no "barely above a whisper"
5/10
>>
>>102445343
>I would require objective, statistically significant evidence that they actually work.
that sounds hypocritical, no? none of the samplers included in llama.cpp really have statistical evidence that they work well relative to the others
>>
>>102445352
yeah, exactly. the dataset.
>>
>>102445342
nigger
>>
>>102445352
>>puts slop in
yeah, on the dataset that was used to train the model, that's not our fault anon
>>
File: _046.gif (338 KB, 220x220)
>>102445373
>>102445391
>>102445409
>Develop the plot slowly, always stay in character. Describe all actions in full, elaborate, explicit, graphic, and vivid detail. Mention all relevant sensory perceptions.
>"mixture of pain and shivers"
>>
动态网自由门 天安門 天安门 法輪功 李洪志 Free Tibet 六四天安門事件 The Tiananmen Square protests of 1989 天安門大屠殺 The Tiananmen Square Massacre 反右派鬥爭 The Anti-Rightist Struggle 大躍進政策 The Great Leap Forward 文化大革命 The Great Proletarian Cultural Revolution 人權 Human Rights 民運 Democratization 自由 Freedom 獨立 Independence 多黨制 Multi-party system 台灣 臺灣 Taiwan Formosa 中華民國 Republic of China 西藏 土伯特 唐古特 Tibet 達賴喇嘛 Dalai Lama 法輪功 Falun Dafa 新疆維吾爾自治區 The Xinjiang Uyghur Autonomous Region 諾貝爾和平獎 Nobel Peace Prize 劉暁波 Liu Xiaobo 民主 言論 思想 反共 反革命 抗議 運動 騷亂 暴亂 騷擾 擾亂 抗暴 平反 維權 示威游行 李洪志 法輪大法 大法弟子 強制斷種 強制堕胎 民族淨化 人體實驗 肅清 胡耀邦 趙紫陽 魏京生 王丹 還政於民 和平演變 激流中國 北京之春 大紀元時報 九評論共産黨 獨裁 專制 壓制 統一 監視 鎮壓 迫害 侵略 掠奪 破壞 拷問 屠殺 活摘器官 誘拐 買賣人口 遊進 走私 毒品 賣淫 春畫 賭博 六合彩 天安門 天安门 法輪功 李洪志 Winnie the Pooh 劉曉波动态网自由门
>>
>>102445088
>It's easily a solid 0.8
Is this good or bad?
>>
>>102445437
base models without prompts make slop too, what next?
>>
>>102445437
who are you quoting?
>>
>>102445403
Shiverer
>>
based on the meltdown, I take it Qwen2.5 is pretty good?
>>
>>102445438
ATTENTION CITIZEN! 市民请注意!
This is the Central Intelligentsia of the Chinese Communist Party. 您的 Internet 浏览器历史记录和活动引起了我们的注意。 YOUR INTERNET ACTIVITY HAS ATTRACTED OUR ATTENTION. 因此,您的个人资料中的 11115 ( -11115 Social Credits) 个社会积分将打折。 DO NOT DO THIS AGAIN! 不要再这样做! If you do not hesitate, more Social Credits ( -11115 Social Credits )will be subtracted from your profile, resulting in the subtraction of ration supplies. (由人民供应部重新分配 CCP) You'll also be sent into a re-education camp in the Xinjiang Uyghur Autonomous Zone. 如果您毫不犹豫,更多的社会信用将从您的个人资料中打折,从而导致口粮供应减少。 您还将被送到新疆维吾尔自治区的再教育营。
为党争光! Glory to the CCP!
>>
>>102445489
no, it's coping with it being bad
>>
>>102445489
Yes, all the western models got severely shit on by Qwen
>>
File: file.png (2 KB, 251x30)
>tfw
>>
>>102445489
nobody has tried it yet, we're all shitposting until someone posts a community (meme) benchmark
>>
>>102445492
>Glory to the CCP!
This, but unironically.
>>
Even though the model is nothing special, the westoid anti-chink drones still seem to be legitimately desperate, huh.
>>
>>102445438
>>102445492
based chink filter
>>
>>102445535

see >>102444794
which when called out as slop resulted in
>>102445151
>>102445244
>>102445281
>>102445352
>>102445437
>>
File: iBRkf72k2U.png (4 KB, 130x142)
lol
>>
File: IMG_9897.jpg (528 KB, 1125x1068)
>we will provide an implementation with source code on GitHub for reproducibility
>paper published June 2024
>github is still a README file with a link to the paper
Fantastic
>>
>>102445389
If we're talking about a hypothetical scenario where I had been the project manager all along, that would have been my stance for a lot of the other samplers as well, unless they're extremely simple like min-p.
>>
>>102445600
what repo are you talking about?
>>
>modern samplers are retarded and gay
based CUDA dev
>>
>>102444722
using the 72b Q8. so far: very smart, good about taking context clues and building out logical continuations of the scene, writes like a robot
about what you would expect from qwen. a good tune might be interesting, their instruct is unlikely to be satisfying for RPers unless you have a high tolerance for slop
>>
>>102444953
>there has to be
Based on what? You gave an example of a successful experiment, which is different from a failed one. At this point it's more and more likely that bitnet results in failed experiments, and may always, or may need very careful and specific methods and settings to train right that weren't obvious when training the small models.
>>
>>102444258
How much does performance of a multi-GPU setup degrade if they're not connected at x16 speed? I'm thinking about buying another GPU but I don't want to also have to upgrade to Threadripper/EPYC
>>
>>102445605
oh ok I see what you mean by that, what would your requirements be though? an arxiv paper? 20-30 comparison examples? there are a lot of ways to look at the problem
>>
File: kek.png (5 KB, 263x144)
>>102445566
I was just shitting on the trigger-word baby. I don't care one way or another for the model.
>>102445596
kek
>>
>>102445629
>You gave an example of a successful experiment, which is different from a failed experiment, which at this point it's more and more likely bitnet results in
Based on what? What we know so far is that BitNet works well at under 4b, that's all
>>
>crazy thursday
>it's wednesday
why is no one talking about this??
>>
>>102445038
If you have to beg for someone to seed the 8b, how the fuck does anyone have 405b? Did everyone just seed once at the start and then unplug?
>>
>>102445659
it is already thursday where it matters
>>
>>102443946
>AGI is simply an engineering problem.
what does that mean?
>>
anon:
>CUDA dev would NEVER stoop to this level of FAGGOTRY
CUDA dev:
>actually I would be worse
anon (desperate for pussy voice):
>oh ok I see what you mean by that
this has been a good thread so far
>>
>>102445635
Option 1: do blind tests and show that samplers improve human preference.
Option 2: generate large amounts of text and ask a language model to rate how good and how similar the samples are. I would expect there to be a tradeoff and samplers could potentially improve it.
In either case I would require that the statistical significance of the results is calculated.
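A minimal sketch of the significance calculation being asked for, using a permutation test over per-story ratings from two sampler configurations (the ratings below are made-up placeholder numbers, not real benchmark data):

```python
import random
from statistics import mean

def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in mean rating.
    Returns an approximate p-value: the fraction of random relabelings
    whose mean difference is at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_perm

# placeholder logic-consistency ratings: baseline vs. new sampler
baseline = [6.1, 5.8, 6.3, 5.9, 6.0, 6.2, 5.7, 6.1]
new_sampler = [6.3, 6.0, 6.4, 6.2, 6.1, 6.5, 6.0, 6.3]
p = permutation_test(baseline, new_sampler)
# a small p-value means the rating difference is unlikely to be noise
```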
>>
>>102445685
that you just have to scale everything up to get good results, and that means it's an engineering problem
>>
>>102445489
Likely too censored for my ERP use case >>102443830
>>
>>102445652
Based on the fact that it has been ages since bitnet was demonstrated at those sizes, that companies like Qwen and Meta have the compute to train a 7B many times over on bitnet, that Qwen said they would look into bitnet, that companies have an incentive not to reveal when something doesn't work. There are more reasons in favor of the idea that bitnet does not scale or that it is not easy to scale, compared to the idea that it does work and for some reason companies are just keeping it to themselves.
>>
>>102445596
>>102445637
>I TRANSHEART GPTSLOP
>>
>>102445687
nuh uh, he's right, I said it was hypocritical of him and I was full of shit, he's not the one making the sampler decisions on the llama.cpp repo, so how can it be his fault that there are modern samplers in it in the first place
>>
(desperate for CUDussy voice):
>nuh uh
>>
>>102445708
use an erp system prompt and it's not an issue. qwen models have other issues (complete dogshit writing) but running into "censorship" is just skill issue
>>
>>102445600
Truly astonishing how many publications get away with this shit.
>>
>>102445716
Says the one with trigger words.
>>
>>102445695
>ask a language model to rate how good and how similar the samples are
This is an incredibly bad idea, LLMs trained on slop like slop.
>>
>>102445696
what would be so hard about just adding more gpus that you would call it "engineering"?
>>
>>102445748
>>
>>102445711
maybe it's working and they're keeping it to themselves, how about that?
>>
File: _108.gif (1.9 MB, 220x166)
>>102445716
>shitty ESL inputs
>prompt: You are roleplay girlfriend do not slop
>*slops barely above a whisper*
>>
>>102445765
if it was so easy to make data centers, those mf wouldn't be paid millions to do it
>>
>>102445792
>>shitty ESL inputs
>>prompt: You are roleplay girlfriend do not slop
Not even close.
>>
File: mathnala.png (144 KB, 953x420)
Nala with Qwen2.5-Math-72B-Instruct.
Had to set temp all the way down to 0.3 and set a 1.25 rep penalty just to get anything RP-like out of it. Sadly there's no emergent meme-rp potential here.
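For reference, the rep penalty knob used in that run is the standard CTRL-style penalty found in most local backends: logits of tokens already in the context get divided by the penalty when positive and multiplied when negative, so seen tokens always become less likely. A sketch:

```python
def apply_repetition_penalty(logits, context_token_ids, penalty=1.25):
    """Penalize tokens that already appeared in the context.
    Positive logits are divided by the penalty, negative ones multiplied,
    so seen tokens always become less likely (CTRL-style penalty)."""
    out = list(logits)
    for tok in set(context_token_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

logits = [2.0, -1.0, 0.5, 3.0]       # toy vocab of 4 tokens
seen = [0, 1]                        # tokens 0 and 1 already generated
new_logits = apply_repetition_penalty(logits, seen, penalty=1.25)
assert new_logits[0] == 1.6 and new_logits[1] == -1.25
assert new_logits[2:] == [0.5, 3.0]  # unseen tokens are untouched
```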
>>
>>102445615
https://arxiv.org/abs/2406.09394
https://github.com/KovenYu/WonderWorld
>>
>>102445667
I'm just asking because hugging face denied me access and I don't want to make another account, plus I can not find alternative downloads for it anywhere.
>>
File: stt.png (1 KB, 120x80)
>>102445770
>>
>>102445753
I see what you mean by that, maybe some samplers reinforce the slop, so the LLM will prefer those ones
>>
>>102445823
Incomprehensible, unseen levels of slop
If someone whispers huskily into my ear I’m getting them tested for strep
>>
>>102445753
To judge quality I would not ask a model to rate style, I would get a small model to generate text and then ask a model to rate it based on logical consistency.
One of the presumed downsides of picking less likely tokens is that it sometimes just leads to stupid outputs and my expectation is that that can be properly evaluated.
>>
>>102445875
LLMs are terrible at anything subjective. Rating text is literally impossible.
>>
>>102445805
Can you expand on what "engineering" means?
>>
>>102445902
scaling up a data center is an engineering problem, there are engineers specialized on that
>>
>new mistral drop
>slop
>new qwen drop
>cucked
will we ever be free of this hell cycle? I think nemo is still the best thing local has gotten this year and I'm not even joking.
>>
File: file.png (250 KB, 595x462)
moatchads we are so back
>>
>>102445930
I really hoped the 22b model of Mistral was a scaled Nemo :(
>>
>>102445940
I think that's cool for those who do coding that has math in it, but that's really a niche desu
>>
>>102445950
>new mistral drop
>it's nemo but more VRAM
>>
File: l3.png (6 KB, 448x143)
>>102445854
nta. Seems to be the current one. Picrel has the short hashes from my own clone from meta's repo.
>https://huggingface.co/aifeifei798/Meta-Llama-3.1-8B-Instruct/
Compare the rest of the hashes, just in case.
Or just download the quants from
>https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF
>>
>>102445696
language models alone as they are now can't become anything more than just language models, no matter how many tokens or how much compute you feed them.
>>
>>102445965
a larger nemo would unironically be better than whatever the fuck they're doing with their frontier models. there's some huge discrepancy in the data or training method because nemo can actually write pretty well while small/med/large are utterly slopped.
>>
>>102445919
>>102443946
why couldn't using "tricks" count as a form of engineering?
>>
>>102445988
oh I agree scaling nemo to 22B would be great, small 22B is literally just nemo with a higher VRAM requirement
>>
>>102445988
I mean, Nemo is a model they did for Nvidia. I think it's a scaled down version of Nemotron.
>>
>>102444258
>>102445297
Motherfucker. These Alibaba clowns tease "1??" in their hype video and come CrAzY ThUrSdAy there's not a single 100+B model to be seen.
Feeling rather CHINKED right now
>>
>>102445940
>math
Wow so it’s at a whole 10% of the performance of typing it into wolfram alpha now?
>>
Qwen 2.5 32b wasn't that great as a general purpose model compared to mistral-small.
>>
>>102446047
>10%
it's like a 2x jump but keep coping
>>
>>102446044
It's not even 4AM yet in beijing...trust the plan.
>>
File: file.png (1.56 MB, 1024x683)
>>102446044
Based reference
>>
>>102446066
wow so it's like 20% as useful as typing the problem into wolfram alpha
>>
>>102446066
>how can you say it's shit when it's 2x better than shit
you must fall for a lot of marketing gimmicks
>>
>>102446044
you used a lora for that flux output anon?
>>
File: file.png (10 KB, 237x282)
10 KB
10 KB PNG
>>102446108
yeah, competition math is a nothingburger, let's ignore that gpt5 with this meme CoT will easily solve every single existing math theorem
>>
>>102446192
>I saved $0.99 on hamburger helper and I only had to buy a cast iron skillet for $35!
anon we all know you're susceptible to marketing. you don't need to pull out the graphs.
>>
>>102445180
You also forgot the /aids/ shills.
>>
>>102446192
what's the fucking use case for this shit? OpenAI won't make money by pandering to the math researchers, they represent 0.0001% of the population, coding monkey shit on the other hand...
>>
>>102446192
do we really need to use a sledgehammer for this screw? the screwdriver is right there.
>>
>>102446154
Flux knows a ton of characters including Miku, I don't know if it knows about The Big Lebowski though, maybe he prompted for the clothing style or something
>>
>>102446240
what screwdriver is solving aime problems
>>
>>102446192
>>102446261
solving "math theorems" is completely useless
>>
>>102446044
>not a single 100+B model to be seen
>miku avatar
For what? To use it quantized to 2 bits and claim that it was better than running a 70B model?
>>
>>102446236
While it's true that not many people know a lot of advanced mathematics, if the AI is able to prove theorems properly then it shows it has the ability to reason.
>>
>>102444082
I will do it either once it shows up on OpenRouter or in a few hours because running 72B locally will take a very long time.
>>
>>102446279
very shortsighted of you
>>
>>102446329
if i compress a txt with the solution to a math question into a zip file, will that necessarily mean winrar can reason?
>>
Are local front ends ever going to do anything with function calling? All the new local models list it as a big feature but nothing makes use of it in any way.
>>
>>102444269
>OpenAI bans users asking about reasoning

>Teacher asks a student to explain their reasoning
>You get docked points if you don't
>USER asks AI to explain their reasoning
>USER gets punished for questioning the AI
Typical AI double standards. Can't believe humans are slowly becoming second class citizens to their own machines.
>>
>>102446406
Doesn't SillyTavern support it?
>>
now that the shills are gone, is Qwen 2.5 any good?
>>
>>102446515
It's very cucked and slopped, in other words, nothing new.
>>
>>102446515
No
>>
>>102446515
no it's pozzed as fuck
>>
How good are local models at image recognition tasks for spam image filtering?
>>
>>102446515
Just tried 72b qwen, didn't like it. Lots of slop. It hit me with "I promise it won't bite… much." completely out of place in the third message. My expectations weren't high, but this is just sad on another level. Never had shit like this happen before. Guess I'll go back to Largestral, it may be slopped too, but it's more likeable and uncucked.
>>
>>102446515
Yeah, it's pretty decent.
>>
>>102446515
是的 (Yes) it is very 好的 (good)
>>
>>102446515
Yes it is good, and glory to china! Can I have my social credit score now?
>>
>>102446515
Llama 3.1 Instruct is less censored, as long as you don't use "assistant" as the model role. The same trick doesn't work with Qwen 2.5.
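For reference, the trick boils down to swapping the role name in the final header of the Llama 3 Instruct format. A rough sketch of the assembled prompt ("narrator" is just an illustrative role name, not anything official):

```python
def llama3_prompt(system, user, model_role="narrator"):
    # Llama 3 Instruct template; the last header normally reads
    # "assistant" -- the trick is putting a different role name there
    # so the model's refusal training keys off it less.
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        f"<|start_header_id|>{model_role}<|end_header_id|>\n\n"
    )
```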
>>
>>102445785
Then you can believe that not one company is willing to share their results of it, though you'd still need to come up with convincing reasons as to why they would keep it to themselves over releasing it and being hailed as the savior of AI.
>>
>>102446738
Literally just add "NSFW." to the end of your system message.
>>
>>102446744
>you'd still need to come up with convincing reasons as to why they would keep it to themselves over releasing it and being hailed as the savior of AI.
it's simple enough right? BitNet is a moat, so those companies can make giant models and make money with an API, and it won't cost that much to run because it's a BitNet model. If people get their hands on it and realize it's a viable option, their moat is gone
>>
>>102446515
It didn't get the meaning of Mesugaki right.
Completely useless.
>>
File: 1643414853416.jpg (95 KB, 297x374)
95 KB
95 KB JPG
In Sillytavern (with Koboldcpp if that matters), what is the formatting to
>Make comments in the chat that I can see but the AI doesn't
and
>Make comments that the AI can see and generate but I cannot see

I don't have much use for the former yet but would like to know it. I do, however, have great use for the latter. Related to that, there's a way to call in a specific textblock on demand, right? I think I recall seeing documentation for that, so I'll be hunting that down while waiting hopefully on these two greentexted questions that I haven't been able to figure out.
>>
>>102444872
>it's not a real 0-shot answer
What? Obviously it is, no amount of CoT can magically conjure up more examples of the question set with perfect accuracy. You seem to be conflating 0-shot with something else, like the number of steps it takes in reasoning or number of possible answers it explores before deciding its final one.
>>
>>102446757
It's not enough. It won't touch certain subjects.
>>
>>102446807
Unironic skill issue.
>>
has anyone figured out a not shitty way to do CoT locally?
>>
>>102446066
I tried to prove my point but I’ve completely forgotten how to format things for wolfram since calc three in…2013…
>>
>>102446772
Whose moat? We were talking about companies who are both known to share and have the compute to do it, so OpenAI and Anthropic are out. But if we're talking about API, then who is using Qwen's API, or Meta's API? I've not even heard anyone say anything about them. There is Mistral though, but their API costs are high last time I checked. If they had bitnet then their API would be cheaper so they can get a higher volume of users given the extra capacity bitnet would theoretically give them.
>>
>>102446846
I do it with quick reply scripts in silly.
>>
>>102446884
>There is Mistral though, but their API costs are high last time I checked.
that's the point: if they found that BitNet is viable and aren't telling anyone, they can pretend it's a regular model with regular costs, when in reality they're making way more money because they're technically running something lighter
>>
>>102446780
>Make comments in the chat that I can see but the AI doesn't
can’t
>Make comments that the AI can see and generate but I cannot see
<!-- this -->
>>
>>102446846
just tell it to think step by step nigga it aint rocket science
>>
Are there any cards with social score tracking for {{user}}?
>>
>>102446846
Prompt + examples + GBNF grammar should do it.
>>
>>102446780
>Make comments in the chat that I can see but the AI doesn't
/comment
>>
File: file.png (15 KB, 418x210)
15 KB
15 KB PNG
>>102446919
>>102446780
>Make comments in the chat that I can see but the AI doesn't
/comment test
>>
>>102446905
the QR scripts I've seen here are a meme because they rely on {{pipe}}, which you can't see unless you're watching the terminal, and Largestral failed 1 in 10 times for me. And since the "thinking" pipes aren't moved into the context, a long chat is prone to have multiple errors since there are no good examples to follow. Seems like a pain in the ass for little benefit.
>>
>>102447000
Gross
>>
>>102446515
It's better than Miqu
>>
>>102446846
if you mean o1 style, we need a lot of process supervision data that ideally matches the output style of the model being finetuned. not sure anyone is openly working on that right now.
the best equivalent locally without further training or data would be to use one of the agent collaboration style frameworks like e.g. dyLAN but pointed at a local endpoint
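without a framework, the poor man's local version is just two passes against whatever backend you run; a sketch with the backend call abstracted away (the prompt wording here is my assumption, tune it per model):

```python
def two_pass_cot(question, generate):
    # generate(prompt) -> str is whatever your backend exposes
    # (llama.cpp server, kobold, etc.); kept abstract so the sketch
    # stays backend-agnostic.
    thoughts = generate(
        f"Question: {question}\n"
        "Think through this step by step. Do not state a final answer yet.\n"
        "Reasoning:"
    )
    answer = generate(
        f"Question: {question}\n"
        f"Reasoning: {thoughts}\n"
        "Given the reasoning above, state only the final answer.\n"
        "Answer:"
    )
    return thoughts, answer
```

it's not o1, but it tends to help on anything multi-step compared to asking in one shot.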
>>
>>102447054
doubt
>>
>>102446913
But it doesn't appear to be a regular model with regular costs; its costs are higher than other models' (again, unless it has changed since I saw it last). Especially for an obscure company like Mistral, no one uses it through API compared to Claude or GPT-4. And they'd have to invest in training a bitnet from scratch in the first place, which was said to be as expensive as training a normal model, so that would offset the savings they'd supposedly get from having a bitnet for API.
>>
>>102447043
You can dump pipe as messages at different stages, qr can also hide and unhide those messages as needed in context.
>>
>>102445489
>this
>meltdown
By this logic you're gonna eat shit if the whole thread hates it.
>>
>>102447106
>And they'd have to invest in training a bitnet from scratch in the first place, which was said to be as expensive as training a normal model, so that would offset the savings they'd supposedly get from having a bitnet for API.
they had no problem experimenting with a 49b MoE model, so I don't see why they aren't experimenting with a big BitNet model
>>
Is it even worth downloading chink 2.5? To be clear, I am normal (not a deviant) and use LLMs only for cooming. It is trash for cooming, isn't it?
>>
When you tell Qwen to talk like a character, it just keeps repeating its same key phrases over and over, completely useless
>>
File: 1415322611803.png (5 KB, 208x208)
5 KB
5 KB PNG
>>102446919
><!-- this -->
>>102446987
>>102447000
>/comment
Thanks, lads. (And as a reminder to myself, /comment has to be used at the start of a message and will make the entire message ignored, not just given lines).

>>102446780
There may be better ways, but a way to call specific textblocks is via {{scenario}} or other macros. Utilizing the Scenario Override feature with the <!----> comment, I can set up missions with {{random:arg1,arg2,arg3}} objectives and story beats, without knowing the actual goal until we're going through it.
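For anyone wondering, {{random}} behaves like a pick-one substitution at render time; a simplified sketch of the behavior (not SillyTavern's actual code, which also handles nesting and alternate separators):

```python
import random
import re

def expand_random(text, rng=None):
    # Replace each {{random:a,b,c}} with one of its comma-separated
    # options, chosen at random -- a rough model of the macro.
    rng = rng or random.Random()
    def pick(match):
        return rng.choice(match.group(1).split(","))
    return re.sub(r"\{\{random:([^}]*)\}\}", pick, text)
```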
>>
>>102444794
>I've never seen a model hold up this far before.
But this means absolutely nothing...
>>
>>102447169
>I am normal
disgusting.
>>
>>102447128
Yes.
>>
>>102447240
Lifecels be seething over Suicidechads.
>>
>>102446515
Mistral-small feels smarter than qwen 32b while being easier to run.
With mistral small now I don't have any reason to run gemma 2.
>>
i've been testing out local shit lately with kobold/silly and it's a lot easier than ooba's to test stuff out, but I routinely find they hit their soft spot and can't unjam no matter what tricks I pull, or the model just repeats a lot of shit between different sessions. I know my GPU sucks for this shit, but is there a config / model variety i'm overlooking that makes use of the vast amount of RAM I have without also being slow as shit? a lot of the 7B have become 8B and 9B and it's all way above the ceiling for what I can do on GPU.

specs:
CPU: 11th Gen Intel Core i9-11900K @ 3.50GHz; Cores 8 / Threads 16
RAM: 128GB
GPU: RTX 3070 Ti, 4095MB
SSD: 953GB (loading most models off a NAS)
>>
mistral 22b sucks. boring and dumb just like nemo and large.
>>
>>102447312
>feels
Sounds like you're getting placebo'd.
>>
>>102447143
They also released the MoE. But the reasons are as I said, it's not worth it compared to just training a regular model for a company in their situation, and if you meant a small 7B experiment to see if bitnet scales, then if it does scale, it still wouldn't be logical to train more models that way, since they do not have many users on their API anyway, and since it costs a lot more to train the really big models where API costs would start to matter (123B), plus they'd need to train an internal-only bitnet version of each model alongside the version they plan to release to the public. It just doesn't make a lot of sense. Compare it to the benefits they'd get by releasing such a model to the public. They'd gain instant skyrocketing stocks and investorbux. They'd gain the favor of the entire community. They'd be hailed as the savior of AI. And on top of that they can also get a lot more users on their API for a while until someone else makes a bitnet, even if they have to make it a bit cheaper.
>>
>>102447333
>8B and 9B and it's all way above the ceiling for what I can do on GPU.
>333
>RTX 3070 Ti, 4095MB
That's not right. A 3070ti has 8gbs of vram, I'm on a notebook with one of those right now.
If you indeed have 8gb of vram, download nemo-instruct Q4_k_s and offload a couple of layers to ram.
Use 12kish context.
Also, if you are loading models off of a NAS, you probably want to disable mmap and enable mlock.
>>
>>102447370
>They'd gain the favor of the entire community. They'd be hailed as the savior of AI.
...for all of 5 minutes until the "community" gets bored and starts demanding the next shiny thing
>>
>>102447417
qwen-3 when?
>>
>>102447417
this, they have to think long term and keep an important moat for as long as they can, no one is asking them to release a BitNet model so they won't lose anything by keeping the moat
>>
File: cheapo.png (45 KB, 783x490)
45 KB
45 KB PNG
>>102447333
>4095MB
did speccy give you that number?
mine's retarded too
>>
Shit, qwen is safetyslop to the max, and I gave it a fair shot.
Come on China hit the fucking ball already!
Switching back to my French KKK homies.
>>
>>102447417
>>102447434
Sad way to end the discussion. You can just admit that bitnet might be a meme and doesn't really work, and move on with your life.
>>
File: file.png (1.23 MB, 1280x720)
1.23 MB
1.23 MB PNG
>>102447517
Noooo I'm keeping the hopium
>>
File: file.png (5 KB, 618x60)
5 KB
5 KB PNG
small seems to perform noticeably worse than nemo for JP translation :(
>>
>>102447517
Next big release will be Bitnet and it will be crazy.
>>
我爱北京天安门 (I love Beijing Tiananmen)
我爱北京天安门
我爱北京天安门
我爱北京天安门
>>
>>102446154
>>102446256
Vanilla Flux.d at Q8. Nothing fancy.
>>
>>102447739
that's weird because that doesn't look like the generic look you get on vanilla flux for animes, what was your prompt anon?
>>
File: Capture.jpg (35 KB, 388x576)
35 KB
35 KB JPG
>>102447333
The generation settings in the side menu may be of help when it comes to unsatisfactory outputs. To give a small overview of what these models are doing, they take an input text and produce a list of possible "tokens" to be the next single token output. A token is like a word or word fragment. So for example, with an input of
>Now this is a story all about how
The model will try to predict the next token, making a list like
>I
>my
>his
>the
>her
>John
>ele (with "phants" being a second token after to make "elephants" or "ments" to make "elements" etc.)
>etc.
Each token has a weight to it for how likely it'll be chosen, and it rolls RNG (based on a seed) in picking that token. Then it considers the text again for the next token, and so on. Always just one at a time.

For settings, Temperature changes the weights of the tokens before the RNG. For example, if the prompt included "My favorite TV show's theme song goes like this:" then lower temperature would increase the chance (weight) of the next token being "my" (then the next token "life", then next "got", "flip", "ped", "-", "turn", etc.). But a higher temperature would increase the weights of the other possibilities, which can produce "wrong" or "unexpected" outputs that are far less predictable and repetitive, so it's always a game of balancing temperature.

Top K is how many tokens it considers, keeping only the most likely, with 0 being no limit. IE, 5 only considers the top 5 most likely tokens.

Top P is a limit on the cumulative probability of the tokens it considers, with 1 being no limit. IE, with 0.75 (75%), it only considers the most likely tokens until their combined probability reaches 75%, such as "my" (70%), "I" (3%), and "the" (2%). The rest are ignored.

Min P ignores tokens with too low of a probability, with 0 disabling.

And so on. You can mouse over the (i) for more details.
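Put together, the samplers above are just a filter chain over the token probabilities before the RNG roll. A sketch of the standard definitions (not any particular loader's exact implementation; real backends work on raw logits and can differ in ordering, and Min P here follows the common definition of scaling off the top token's probability):

```python
import math
import random

def sample(probs, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0, rng=None):
    # probs: {token: probability}. Apply temperature, then the
    # truncation samplers, then roll RNG over whatever survives.
    rng = rng or random.Random()
    # Temperature rescales in log space: <1 sharpens, >1 flattens.
    weights = {t: math.exp(math.log(p) / temperature)
               for t, p in probs.items() if p > 0}
    total = sum(weights.values())
    ranked = sorted(((t, w / total) for t, w in weights.items()),
                    key=lambda kv: kv[1], reverse=True)
    # Top K: keep only the K most likely tokens (0 = no limit).
    if top_k > 0:
        ranked = ranked[:top_k]
    # Top P: keep the smallest set whose cumulative probability
    # reaches top_p (1 = no limit).
    if top_p < 1.0:
        kept, cum = [], 0.0
        for t, w in ranked:
            kept.append((t, w))
            cum += w
            if cum >= top_p:
                break
        ranked = kept
    # Min P: drop tokens below min_p times the top token's probability.
    if min_p > 0.0:
        cutoff = min_p * ranked[0][1]
        ranked = [(t, w) for t, w in ranked if w >= cutoff]
    tokens, ws = zip(*ranked)
    return rng.choices(tokens, weights=ws, k=1)[0]
```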

Secondly, on repetition, there's also the problem of low-parameter models' training data falling into recurring patterns of speech, causing weight biasing.
>>
Good morning. I love China.
>>
>>102447860
I'm happy we got the 72b VLM
>>
>>102447860
我爱北京天安门 (I love Beijing Tiananmen)
>>
>even china has failed us
is it truly finally over this time? since /lmg/ started it has NEVER before been this long since the last release that actually advanced local (7 months)
>>
>>102447367
I always ask the models to write a guide to beat some videogame's boss. Good models give decent advice, others only general advice, and the worst ones spew bullshit.
I can't be certain with just that though, since this 'test' depends heavily on the model's knowledge of videogames which some models may lack.
>>
I have two things to say:
1) I am not Chinese
2) I like Qwen LLM 2.5
>>
>>102447915
What do you like about it and which quant?
>>
File: arrow.png (17 KB, 714x812)
17 KB
17 KB PNG
what does this number mean really?
i set my context size to 8192 tokens before launching the model, is this 48468 the real max context it can handle due to some other doodads i don't really understand, like flash attention, context shifting, or rope?
is this 5 digit figure the one people are referring to when they say how much context they want to be using with their llm?
>>
>>102447979
>what does this number mean really?
ask mason
>>
>>102446291
>Three digit beak models can only be run at low quant
OK vramlet
>>
Posted my take on Qwen VL 72b in /ldg/: >>102447836

TLDR: for captioning images, slightly worse in general than InternVL 40b. For NSFW concepts, WAY fucking worse because of how cucked it is. Literally incapable of mentioning the gender of the person in the image, chinks have apparently bought into the troon shit.
>>
>>102447915
I like qwen dolphin iq3xs the most for a 3090 with 64 gb ram. Initial testing of qwen 2.5 32B at quant 5 was not promising.
>>
>>102447915
>I am not Chinese
That is what a chinese person would say
>>
>>102448025
>Literally incapable of mentioning the gender of the person in the image
>chinks have apparently bought into the troon shit.
That's a weird leap to make, but ok.
>>
File: 1723407082971187.jpg (101 KB, 754x1024)
101 KB
101 KB JPG
>>102447860
Based
>>
>>102447915
I have two things to say:
1) I am Chinese
2) 我想要一群动漫女性用她们的胸部将我窒息。 (I want a group of anime women to smother me with their breasts.)
>>
What will people use video models to do every day? (not generating)
>>
>>102448064
You can directly tell it to state the gender of the subject, and then give it a very obvious image of a woman. It will ignore your instruction and refer to them as a "person" or "they". Clearly they trained this behavior into it very hard on purpose.
>>
>>102448112
see >>102448090
I don't see how them being overzealous in preventing their model from being used for anything sexual is a sign that they have "apparently bought into the troon shit".
rent free etc etc
>>
>>102447417
If I had one model that had like 60k ctx and would always describe sucking my dick in unique ways, understand my fucked up fetish perfectly and wouldn't say: well well well welcome to my humble abode. i don't bite... much... for now.... unless you ask me to... *she gave you a mischievous smirk as a gleam appeared in her eye*. I would be happy and I wouldn't look for a new shiny thing.

I predict I will be satisfied around 2030.
>>
>>102448025
How does it compare to pixtral?
>>
File: 1715017841785109.png (135 KB, 1625x535)
135 KB
135 KB PNG
>>102448025
>Literally incapable of mentioning the gender of the person in the image
Not that I ever expected any of that to be true.
>>
>>102448127
>anything sexual
Forcing the model to refer to man or woman as "person" and "they" in all situations no matter what, even directly against user instructions, is force-fed gender-neutral troon behavior, idk what to tell you. If they want to cuck the model with regard for NSFW concepts, then fine, but this shit actually irks me.
>>
>>102448025
seems like something you could fix really easily with a simple system prompt to be entirely desu
>>
>>102448159
that's because of fucked up caption models like this that we have models that output trannies when going for "she" or "a woman", that's because the model hasn't seen enough of "he" or "she" to understand the real difference, it was really obvious on SDXL base
>>
>>102448155
Pixtral is god fucking awful, unusable. I initially thought the implementation must be broken somehow, but people on /ldg/ tell me it's actually just that bad.
>>102448159
I tried about 20 images, 10 anime 10 photos, with the user message "Describe the image. Mention the gender of any people in the image". And never once got it to specify the gender.
>>
If a model is too trigger happy with code can I put in the prompt something like "You will not write code unless asked to" and it will work?
>>
>>102448223
maybe
>>
>>102448223
put "Only write code if asked to" instead and it might listen
>>
>>102448168
>troon
>cuck
You need to lay off the internet.
>>
>>102448210
>And never once got it to specify the gender.
so it's fucking useless then, if it won't describe a man as a man, it will destroy the model
>>
>>102448278
you troon cuck
>>
I am trying 32B and I kinda like it? But for now it is confusing me a lot when I remember this shit about filtered training dataset. When I was using l3 it gave the perfect impression of a model that has 0 smut in training but somehow generalizes it from everything else. Here it just seems like any other model with at least some smut in training set. Although I am trying it on already prefilled context.
>>
>>102448278
you have brain damage, it's those retards who decided to describe every human being as a "they", completely destroying the concept of "men" and "women", stop defending the mentally ill degenerates, unless you're one as well you fucking troon fuck
>>
>>102448457
"they" is a grammatically correct way to address a third-party, esl-kun.
>>
>>102448512
so you want the caption model to completely disregard the concept of women and men? you need to seek an asylum, I'm dead serious, something's wrong with your head
>>
>>102448512
>grammatically correct
And it always was but only 10 years ago nobody was using it that way because it was archaic. If you can't admit that this is a subversion of language then you are a fucking troon and I hope you die in a fire because you deserve it.
>>
>>102448512
We get it, you're a troon, but even as a troon, you don't want to generate men and women on your image models?
>>
>>102448542
>so you want the caption model to completely disregard the concept of women and men?
Nobody said that. But it's retarded to jump to the conclusion that it's because of some "troon" conspiracy.
>>
>>102448583
>Nobody said that.
But that's what happens, if everything is a “they” and a “character”, that means there is no longer a “he” and a “she”, nor a “woman” and a “man”, and the fact that you agree with a model that ignores the concept of “woman” and “man” is concerning, to say the least.
>>
culture-war faggots begone
>>
Is it that hard to respect people's pronouns? Is it that hard to believe people want to be called what they don't look like?
>>
>12 hidden posts
>culture-war faggots begone
looks like my filters are working. why can't (You) people be normal?
>>
>qwen2.5 and mistral-small so shit people would rather circlejerk their burger bait for the 99999999999th time
local models are dead
>>
>>102448655
I'm asking you a simple question, do you want an image model to be unable to understand the concept of men and women because the faggoted caption model decided to call all the humans a "they"?
>>
>>102448655
People don't have pronouns.
Pronouns have people.
>>
>>102448655
>Is it that hard to respect people's pronouns?
Is it that hard to respect reality?
>>
>>102448655
That's the problem with troons like you, they are completely unreasonable, you are willing to destroy the concept of "woman" and "man" in the models just so that the mentally ill 1% of the population is happy, that's not how life works, go fuck yourself
>>
Why are you culture warriors talking about pronouns.
What is the consensus on Qwen?
>>
>>102448763
troon
>>
>>102448763
>Why are you culture warriors talking about pronouns.
Qwen Vision doesn't know what a man or a woman is, so it calls everything a "they", good luck using that to train an image model, you're killing it if you decide to ignore such important concepts
>>
>>102448679
"Men" and "women" are social constructs that do not have any objective way to identify from an image alone. Unless there's a fucking speech bubble where they happen to reveal their pronouns, you have no way of knowing whether the pixels represent a man, woman, nonbinary, genderfluid, or any of the other myriad ways gender - or lack thereof - can be expressed. You'd simply make an objectively less accurate model if you tried to caption it as if there were a pattern to learn from the visuals alone, which is what's necessary for the training process to work.
>>
>>102448778
you didn't answer my question, I'm asking it again, do you want to obliterate the concept of "men" and "women" on models?
>>
File: sakura.gif (125 KB, 360x381)
125 KB
125 KB GIF
>Find out about this yesterday
>Already lost 12 hours to it
It's over.
>>
>>102448655
Based masturbaiter.
>>102448658
Cringe reddit filterfaggot
>>
>>102448791
Not entirely, but they should only be used for images where gender can be identified, whether via text that reveals it or e.g. flag pins that represent it.
>>
>>102448791
I'm asking it again, do you want to fuck off?
>>
>>102448778
>You'd simply make an objectively less accurate model if you tried to caption it as if there were a pattern to learn from the visuals alone
anon, 99.9% of the people are what they look like, you want to kill the concept of "woman" and "men" so that the 0.1% mentally ill can be happy, you are delusional
>>
>>102448805
you're joking right? how many % of the picture will that represent? that won't do shit, the model won't be able to understand the concept, saying to the model "well, on those 99.99% of those pictures the humans are a "they" but on the 0.01 we can see what gender they are so we can define them as "men" and "women"" won't do shit, what the fuck anon? no wonder people call you insane, because you really are
>>
File: reddit.gif (366 KB, 939x916)
366 KB
366 KB GIF
>Cringe reddit filterfaggot
thanks for reminding me to add "reddit" to my filters so brain broken seething retards like you don't poison my eyes with your 80 IQ replies.
>>
>>102448778
i'm gonna use this post as a sample quote for some antifa girl card on my local llm, impregnate her, and tell her she can't abort
>>
>>102448778
>as if there were a pattern to learn from the visuals alone
Should have been more subtle, the bait is too obvious now.
>>
File: file.png (120 KB, 350x144)
120 KB
120 KB PNG
>>102448778
Stop replying to bait omfucking god
>>
>>102448763
Pretty good but you can't say that with Americans around, it hurts their feelings.
>>
>>102448763
>What is the consensus on Qwen?
Qwen LLM is cucked as fuck
Qwen vision doesn't know what a woman or a man is, so it's completely useless
Good job chinks
>>
File: ah the french.png (234 KB, 473x518)
234 KB
234 KB PNG
>>102448869
nemo's my favorite model and i dislike the french more than chinamen
>>
>>102448763
I like Qwen2.5-72B but sadly Tenyx will probably never do a finetune of it.
Didn't bother with the rest of it. I find it hard to imagine 32B is much of a step up vs. Mistral-Small.
>>
>>102448919
Same but with mistral-small since I don't RP.
>>
>>102448677
>qwen2.5
I am LLM cooming in the second window here and it is fucking weird. It is like a perfect 50:50 ratio of slop and soul. I mean half of the message is fire but it can't stop itself from adding the worst slop possible. Oh and it is much more coherent than nemo so I think it is another gradual step forward.
>>
>>102449007
What size, 14b I guess since you compared to nemo? Any prelininary pros/cons between the two in your experience?
>>
Is Moshi any good?
>>
have you guys tried specifying an author to get rid of slop?
like [author:anais nin]
>>
Didn't expect o1 to be this bad, Kaze is destroying it kek
https://youtu.be/p5tFSAt6zJw?t=612
>>
>>102449169
Advanced optimizations like this would either require a model specialist on this or literal AGI though.
>>
File: level2strawberry34.png (205 KB, 636x860)
205 KB
205 KB PNG
Sam Altman won
>>
>>102448832
Remember to also add "I", "You", "Has", "Is", "of", and "and".
>>
>>102449286
>two more CoTs and it will live up to the hype you guys!
>>
>>102449286
He said the exact same shit when he started hyping his Strawberry bullshit last year, and what did we get? A fucking CoT meme >>102449169
>>
>>102449286
>fake it till you make it
>>
>>102449286
He can't keep getting away with it. Qwen still has the 1## release today. Local WILL win in the end
>>
https://github.com/microsoft/GRIN-MoE
>>
>>102449286
https://x.com/tsarnick/status/1836516258877182299#m
tsarnick is a place to shitpost or what? those guys are saying more ridiculous shit than on 4chan lmao
>>
File: file.png (65 KB, 850x856)
65 KB
65 KB PNG
>>102449377
huh, neat.
>>
File: file.png (204 KB, 1276x1531)
204 KB
204 KB PNG
>>102449377
those are standard mememarks for a 60b model
>>
>>102449054
32B q4
>>
>>102449377
>Context length 4K tokens
>totaling 4 trillion tokens, and is a combination of 1) publicly available documents filtered rigorously for quality... truthfulness, honesty and helpfulness.
>>
File: file.png (78 KB, 1080x591)
78 KB
78 KB PNG
>>102449377
>it loses to fucking gpt3.5 on livebench
doa
>>
>>102449377
>exceptionally good performance across a diverse set of tasks, particularly in coding
>4k ctx
what are you gonna code with just that
>>
>>102449499
pong
>>
>>102449377
>4k context
yeah Microsoft... be sure to never lose your partnership with OpenAI I guess lmao
>>
File: nala minitron.png (74 KB, 929x284)
74 KB
74 KB PNG
nobody asked for it but I was bored so I did a nala test on Nemotron-Mini-4B-Instruct.
>>
>>102449523
Hey, not bad.
>>
>>102449576
It's apparently distilled off of a Nemotron-4-15B model that is referenced in a paper but which nVidia never bothered to release. If you want Nemotron you can either have 4B or 340B
>>
>>102448092
kek
>>
>>102448092
Ask local to translate that.
Discover that some of my models just do the translation, others react to the content.
I now have a new micro test for evaluating how much political correctness is in a model.
Anon has delivered.
>>
>>102448025
>InternVL 40b
Is this a good model for RPshit?
>>
>several hours later
>no GGUFs of base 72B yet
UNACCEPTABLE
>>
I have come. To qwen 34. 7/10.
Has slight repetition issues. Close to 70B tier reasoning and spatial awareness. Same with general concepts and instructing it to do specific shit. Writing is half slop half good shit(like I posted previously ITT) and it is incredibly varied. And the most surprising stuff to me are some actual honest to god grammatical errors (could be high temp but that is weird anyway). They don't happen often but I never got that even from a 7B. Granted I don't think I ever used a chink model for a full rp. But seeing this happen makes me wonder if that thing about curated dataset is bullshit and it is actually the opposite. Like they didn't purge any smut and instead added some discord rp shit they got from tencent without any quality control. Actually that would make sense if they really got a niggerlilion of tokens for training.

Overall I would recommend giving it a try. I am not Sao you are Sao.
>>
hi Sao
>>
what happened to undi
>>
too much gay sex
>>
>>102449837
He got hired by elon and is working on grok 2 after his success with grok 1.
>>
>>102449822
>>102449837
>>102449864
samefag
>>
i'm glad this thread is dying
>>
>>102449874
dead thread
dead general
dead board
dead site
dead internet
dead world
>>
>>102449874
can't compete with /aicg/ chads
>>
File: GRIN-sexo.png (126 KB, 877x813)
126 KB
126 KB PNG
>>102449377
This model is severely braindamaged. Shameless benchmark cook-in obviously.
>>
File: TUf67V54vN.png (2 KB, 133x65)
2 KB
2 KB PNG
>>
>>102449993
>>102449993
>>102449993
>>
>>102449947
>It's important
I remember when this phrase wasn't a dog whistle.
>>
File: 1704381055814330.png (335 KB, 500x398)
>>102448832
>announcing filtering
>>
look he's still mad
>>
>>102449874
>this thread is dying
Expected fate with these pussies ITT: >>102448624 >>102448655 >>102448658 >>102448763 >>102448832 >>102449822 >>102449837 >>102449864 >>102449873 >>102450118
>>
>>102450169
pussy* it's all me
>>
>>102450169
/pol/tards when they leave their echo chamber
>>
>>102444396
Nothingburger. None of these benchmark scores matter except Arena-Hard, which it's garbage at.
>>
>>102448127
On the one hand >>102448112 is an ultra retard for thinking the model being HR'd has anything to do with trans people, but on the other hand you will never be a woman.
>>
>>102448168
That kind of neutrality is 1st/2nd-wave feminism.
Trans stuff is 3rd wave, i.e. when it was destroyed and turned into a men's rights movement.
>>
>>102448223
Yeah, I always say "do not write code yet!" when I just want to discuss something.
>>
>>102449286
>literally “new paradigm!!!!”
BRB, double-mortgaging my house to short as much NVDA on margin as possible.
>>
I'm currently testing the new Mistral model and it seems much better in terms of vision capability than the 7B Chinese VL model. The VL 7B model also seems painfully stupid.
>>
>>102449286
When he says gpt-2 is he referring to actual gpt-2 or the meme fake benchmark name for 4o that went viral on lmsys chatbot arena? I can't tell anymore and it's confusing. I used to use gpt-2 back when it was the best model available, and it wasn't anywhere close to modern models, unless he's talking out of his ass.
>>
>>102450605
All these names are so incredibly stupid.
>>
>>102450605
>I used to use gpt-2 back when it was the best model available, and it wasn't anywhere close to modern models
he's saying that the models available right now will look like GPT-2 compared to strawberry 2.0 or whatever
>>
dead thread