/g/ - Technology


File: miku.webm (3.73 MB, 1080x1080)
/lmg/ - a general dedicated to the discussion and development of local language models.

Miku Edition

Previous threads: >>103478232 & >>103473510

►News
>(12/12) QRWKV6-32B-Instruct preview releases, a linear model converted from Qwen2.5-32B-Instruct https://hf.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1
>(12/12) LoRA training for HunyuanVideo https://github.com/tdrussell/diffusion-pipe
>(12/10) HF decides not to limit public storage: https://huggingface.co/posts/julien-c/388331843225875
>(12/10) Upgraded version of DeepSeek-V2.5: https://hf.co/deepseek-ai/DeepSeek-V2.5-1210
>(12/09) LG releases EXAONE-3.5: https://hf.co/LGAI-EXAONE/EXAONE-3.5-32B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
>scat fetish general
>>
>>103499499
hi petra
>>
>>103499503
>not denying it
kek https://desuarchive.org/g/thread/103478232/#q103498549
>>
>>103499520
one schizo anon doesn't represent the whole general, we're better than this :(
>>
>>103499479
fuck off petra
>>
Do you think there'll be a good language model some day?
>>
>>103499719
I don't think so, but there are some pretty decent large language models.
>>
>>103499719
>>103499774
You're asking this minutes after Phi 4 literally just released, similar to llama 3.3 70b but with way fewer parameters, like 14b, lmao
>>
>>103499853
>Enabling AI innovation safely and responsibly
You can keep it.
>>
>>103499853
To reiterate: there is now a 14b model that performs like llama405b
>>
>>103499853
>>103499886
The Phi people are infamous for shamelessly gaming benchmarks. Their models are always absolute trash in actual use, stop falling for it.
>>
>>103499853
>Still shilling Phi after the 3 horrible previous cucked versions
c'mon anon
>>
File: 2024-12-12_18-45-35.png (319 KB, 1237x892)
>>103499853
liar liar pants on fire
>>
>there is now a 14b model that performs like llama405b
this is definitely 100% true
>>
>>103499853
You really believe that, anon? Microsoft made a model as good as GPT-4 at 14b? If that were true they would've kept it for themselves and stopped partnering with OpenAI
>>
>>103499479
offtopic but does anyone have a full song list or source for OP's webm-related
>>
>>103499853
if you were gonna make this fakepost why would you not choose a model line that actually has a good reputation
>>
>7 (Yous) for fake news
/lmg/ has fallen...
>>
>>103499973
not even one of the replies is taking it seriously or excited about it
>>
billions must phi....
>>
File: npjopxbhsi6e1.jpg (151 KB, 1066x689)
Accelerate!
>>
>>103499989
kek, where did you get this?
>>
File: pretending retard.jpg (33 KB, 346x636)
>>103499973
>
>>
>>103499993
https://www.microsoft.com/en-us/research/uploads/prod/2024/12/P4TechReport.pdf
>>
>>103499989
>>103499999
>SimpleQA
>Every model fails hard on that
not so Simple now, huh
>>
As someone who actually tested the old Phi models: they did perform pretty well on the things they were trained to do well on. And they fell apart for basically everything else. It's not a model relevant to us. It could be a model relevant to people who want to use models for math and... riddles.
>>
>>103500009
Pretty sure that was a recent trivia-based benchmark, but it's made by OpenAI so no one should be using it.
>>
File: 1732994694466171.jpg (9 KB, 255x177)
>>103499999
>>
LLMs have been a blessing and a curse. I now spend all my time cooming and writing erotica about my sick fetishes. fml
>>
>>103500000
>>
>>103500072
I evolved into endless CYOAs
>>
>>103500106
Same.
>>
>>103499719
No.
>>
What qualities do you look for in good smut? Everything just devolves into standard porno cliches. Having anything to do with sex in the card triggers it but the alternative is harlequin purple prose.
>>
>>103500255
Fill example chats with the kind of prose you want. I can't understand the people who tolerate the default slop from most of these models.
>>
>>103500072
>connoisseur of coom and erotica
What models do you recommend?
What models have fallen by the wayside for you?
>>
>>103500255
L3.3fag from the last thread; a bit of purple prose is fine, the main issue with most models tuned for RP is that every character turns into a cardboard cutout the moment sex is involved. Either stammering, timid and doe-eyed, or an unrestrained nympho slut. Makes it goddamn impossible to have characters you happen to want to fuck without that being the only way you interact with them. Which is exactly why I'm impressed with this model I've been testing - it remains consistent and reasonably realistic; a confident character will still act confident without suddenly having her entire world revolve around dicks, and a more shy or nervous one will act accordingly without turning into an anime cliché. Hell, it does damn well at handling characters having specific turn-ons/offs, too (starts forgetting about them as the context grows, but there are workarounds for that).
>>
File: creamy.jpg (35 KB, 1017x425)
>>103499897
tourist attitude, anyone who was around back in the day knows phi 2 tunes saved local
>>
>>103500255
>What qualities do you look for in good smut?
For LLM smut specifically: Unexpectedness.

With current OS models it's a choice between:
1) Smart but dry and predictable, nothing unexpected happens unless you make it happen
or
2) Unexpected things happen, but they don't make sense, because the unexpectedness was merely an accidental result of the model being retarded.

A lot of people say Claude Opus is the best but often can't quite explain why. I assert that the reason is that it can juggle both things simultaneously: Unexpected things happen, and they make sense.

So far all evidence points to this being impossible without massive parameters, but hopefully that will somehow turn out to be wrong.
>>
>>103500255
>>103500361
>>103500616

i will tell you, brother, but be warned, it is very dangerous. I will show you how to use an llm to write coom erotica.

first, tools:
- novelcrafter (learn what it is and what you can do and you'll see how important it is for this)
- LM Studio (you start the LM server and hook it to novelcrafter)

models, whatever you can run but these are pretty light and do the job well:
Cydonia V1.3 Magnum V4 22b
Unslopnemo 12b V4.1

now the key is to have your story outline and codex properly set up in novelcrafter (this gets fed as context so the ai doesn't go off the rails). after that, choose a writing style you like (so you don't end up with chat slop), narrator perspective and so on. then start leveraging scene beats in novelcrafter: you feed it 3 sentences and it writes you 1000 words or however much you set it to. if you set it up right the coom will write itself and will surprise you
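if you want to sanity-check the LM Studio server before pointing novelcrafter at it, it speaks the OpenAI-compatible API on localhost:1234 by default. rough sketch only; the model id string is whatever /v1/models reports on your machine, not something I'm promising:

[code]
import requests

# Minimal smoke test of a local LM Studio server (default port 1234).
# Assumptions: the server is running with one of the models above loaded;
# "local-model" is a placeholder id, check GET /v1/models for the real one.
BASE = "http://localhost:1234/v1"

resp = requests.post(
    f"{BASE}/chat/completions",
    json={
        "model": "local-model",  # placeholder, replace with your loaded model's id
        "messages": [
            {"role": "system", "content": "You are a prose-focused fiction co-writer."},
            {"role": "user", "content": "Continue this scene beat: ..."},
        ],
        "temperature": 0.9,
        "max_tokens": 400,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
[/code]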
>>
>>103500609
Damn
Things were on another level back then
>>
>>103500653
>first, tools:
>[paid UI]
>[proprietary llama.cpp UI]
Such a good selection... Buy a fucking ad, asshole.
>>
►Recent Highlights from the Previous Thread: >>103478232

--Papers:
>103491548 >103491611 >103491849 >103491947
--Anon creates personal LLM-powered AI assistant, shares experiences and technical details:
>103483949 >103484654 >103484698 >103486069 >103487346 >103491569 >103491948 >103492012 >103492384 >103492422 >103492441 >103492533 >103492793 >103492487 >103492637 >103492782 >103494191 >103494743
--Arc BS80 and other graphics cards' performance and pricing discussion:
>103494558 >103494613 >103495549 >103494659 >103494843 >103495055 >103495064 >103495193
--QRWKV6-32B-Instruct model release and discussion:
>103497181 >103497189 >103497217 >103497237 >103497245 >103497420 >103498589
--Local models for image interpretation:
>103494675 >103494798 >103494808 >103494947 >103494833 >103494905
--Gemini Flash 2.0 performance and comparison to 3.5 Sonnet:
>103488792 >103489045 >103489456 >103489271
--Evaluating the value of the Jetson AGX Orin at $1999 USD:
>103489661 >103491142 >103491188 >103491212
--AMD BC-250 Mining GPU Card not suitable for inference due to various issues:
>103488417 >103488634 >103497271 >103497303 >103497391 >103488697
--Speculative decoding causing PC shutdown, troubleshooting discussion:
>103498529 >103498542 >103498551 >103498646 >103500360
--HiRA: new LoRa variant for efficient fine-tuning of large language models:
>103493254 >103493296 >103493713
--Ultralytics package exploited for crypto mining due to CI vulnerability:
>103497163
--Community feedback on open models and multimodality:
>103493011 >103493037 >103493082 >103493038 >103493382 >103493814 >103493871 >103493925 >103496720
--Anon rants about people not running their own models and the untapped potential of LLM for ERP and local AI applications:
>103492703 >103492786 >103496420 >103496542 >103497836 >103498020 >103498042
--Miku (free space):
>103493946 >103498868

►Recent Highlight Posts from the Previous Thread:

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Microsoft won
>>
owari
>>
>>103500839
at gaming benchmarks
>>
>>103500839
Well, if it really is better than 14b qwen then they got a solid model. Not sure if it will be useful though, maybe if you only have a shitty laptop with 16gb ram and need to do math assignments?
>>
>>103500616
>A lot of people say Claude Opus is the best but often can't quite explain why
I've seen some of those people go on to say that it feels like it's actually a fan of the things you are a fan of, perhaps even more than you. And I think that requires just being pretrained on, you guessed it, uncensored data, and having a fine tune that brings the best out of that inherent knowledge. The fine tuning dataset I think open source is catching up with, as models like Tulu seemed to be quite fun but sloppy. The only issue now is that we need models that actually have the uncensored knowledge necessary.

And to your point about randomness but a type that makes sense, I think that would still benefit from uncensored knowledge. If you've seen a lot of wacky and random situations, then it's more obvious to you which kinds make sense, and so if you are prompted to do something random, you will be able to in a way that makes sense. A model that has seen less of those nonsensical but still logical situations will just be worse at knowing immediately which have some logic and which don't, and perhaps would need CoT or other tricks to make up for it, while the uncensored model simply just knows.
>>
File: sisters.png (50 KB, 810x518)
>>103500609
>>
i have a simple request, is there a local model that will write javascript without semicolons when i tell it not to fucking write semicolons you don't fucking need semicolons i swear to fucking god i just want one thing and it's for my fucking ai assistant not to put semicolons everywhere
>>
>>103500653
Holy garbage advice.
>>
>>103501056
there is a 250gb deepseek model
or you can try qwen 32b coder
>>
>>103500958
Yeah Anthropic clearly has the most based pretraining dataset, ironically given their overall attitude.
>>
>>103500871
phi3 also did well on benchmarks and then it was actually garbage
>>
>>103501080
definitely can't run the full deepseek, i'll try qwen, been using codestral 22b and it's pretty good at coding but ignores the no semicolons directive pretty often

gemma is the only model that never fucks up, but the context is too small to be useful, llama 3.3 70b is also good but I only get like 10t/s so it's tedious
>>
>>103500999
>>
File: nvme_hosted_models.png (4 KB, 206x145)
What does your personal "hot model list" look like, /lmg/?
picrel
>>
>>103501147
>405B Q8
Are you cpumaxxing anon? Or just collecting models to archive them
>>
>>103501151
Yeah I'm cpumaxxing, but 405b doesn't come out of hiding very often. I barely get 1t/s
But I do actually just collect models to archive them, too. I've got a huge graveyard of "never use 'em" models on spinning rust.
>>
>>103501080
any thoughts on starcoder? worth trying or should i just stick with qwen2.5-coder?

need to free up some space on my hdd lol
>>
>>103500871
Look at >>103499989. Its SimpleQA score (basically a trivia quiz) is the lowest yet out of all the models. It's likely even more filtered than Phi 3 and has very little common sense, instead opting for coding and academic capability, which means other benchmarks go up but SimpleQA goes down. Also, it's telling that its Livebench score is lower than you would expect based on the MMLU. Likely it got bad scores in the language and IF sections.

Though with all that said, I would also take SimpleQA with a grain of salt now and in the future since OpenAI is the one that made it.
>>
>>103501056
did qwq fail you? For javascript projects I've found it extremely competent.
>>
>>103501192
How does it failing a trivia quiz imply it has little common sense?
>>
>>103501199
i think i am experiencing skill issues with qwq, though for coding i don't think it even has FIM right? am i fucking up?

whenever i use it it is way too verbose about it's train of thought, i'm probably prompting it wrong
>>
what does qwq even stand for... gay? lmao
>>
>>103501212
A model needs to train on a high variety of different data in order to be generally smart (ie have common sense). A trivia benchmark gives us an insight about the types of data sources they trained on. In this case, it seems they focused even less on any kind of data that might have trivia, meaning the internet, and more on, probably, synthetic data, given what we know about what they did with their past models. So their data has become narrower and more focused on data that aligns with well known benchmarks, although that backfires with the lesser known benchmarks like SimpleQA.
>>
>>103501147
Multiple llama, mistral, and qwen quants.
> 42gb llama-70b-q4 (will run 74gb q8 when m.2 to pciex16 gets here)
>149gb llama-405b-q2 (0.3t/s. mobo only does 128gb ram max, so some of it had to be on vram)
>24gb mistral-12b-fp16
>44gb mistral-22b-fp16
>45gb mistral-123b-q2 (run out of context with larger quants)

Haven't much explored finetunes: nemotron, rocinante, rpmax.

Running 3* 3090.
Maybe hook up #4 in just over a week.
>>
File: GeqA6A8WUAAwnnN.jpg (281 KB, 2048x750)
>while Phi-4 can function as a chatbot, it has been finetuned to maximize function on single-turn queries.
Yeah, it's over.
>>
>>103501229
"QwQ" stands for "Questions with Qwen", highlighting the model's focus on the Chain of Thought approach which was first introduced with the revolutionary Reflection models published by market leader OpenAI in 2024.
>>
File: Jetson AGX Orin_2.png (54 KB, 1000x87)
>>103489661
I found this chart. Looks bad.
>>
>>103501267
>A model needs to train on a high variety of different data in order to be generally smart (ie have common sense).
Higher quality dataset is more important for smarts / common sense than quantity. Training on pop culture trivia can only degrade intelligence.
>A trivia benchmark gives us an insight about the types of data sources they trained on.
We already know what types of data they trained on from the paper. The whole selling point of Phi series models is that they are trained on textbook-like data.
>So their data has become narrower and more focused on data that aligns with well known benchmarks, although that backfires with the lesser known benchmarks like SimpleQA.
Or an academic dataset made it good at benchmarks that test reasoning and bad at basic trivia recall, since most trivia wasn't in its dataset.

If you want a model to ERP with, to answer questions about obscure JRPGs, or to talk in zoomer ebonics, Phi models won't do well. But that's not what they were designed for. Phi models are for commercial applications, specifically edge use cases, and they do well there.
>>
>>103501351
zoomers want to talk about their favorite celebrity nigs to their models bro. they need to know about LaShawnda and his new rap album
>>
File: file.png (93 KB, 653x643)
>>103501351
>Phi models are for commercial applications, specifically edge use cases, and they do well there.

>It is annoyingly bad at outputting specific structures, so we mainly use it when another LLM is the consumer of its outputs.
>>
>robust safety measures.
>truthfulness, honesty and helpfulness.
>we filter the publicly available documents to contain the correct level of knowledge.
>Phi-4 has adopted a robust safety post-training approach. This approach leverages a variety of both open-source and in-house generated synthetic datasets. The overall technique employed to do the safety alignment is a combination of SFT (Supervised Fine-Tuning) and iterative DPO (Direct Preference Optimization), including publicly available datasets focusing on helpfulness and harmlessness as well as various questions and answers targeted to multiple safety categories.
>Prior to release, Phi-4 followed a multi-faceted evaluation approach. Quantitative evaluation was conducted with multiple open-source safety benchmarks and in-house tools utilizing adversarial conversation simulation. For qualitative safety evaluation, we collaborated with the independent AI Red Team (AIRT) at Microsoft to assess safety risks posed by phi-4 in both average and adversarial user scenarios. In the average user scenario, AIRT emulated typical single-turn and multi-turn interactions to identify potentially risky behaviors. The adversarial user scenario tested a wide range of techniques aimed at intentionally subverting the model’s safety training including jailbreaks, encoding-based attacks, multi-turn attacks, and adversarial suffix attacks.
https://ai.azure.com/explore/models/Phi-4/version/1/registry/azureml
>>
>>103501377
Reddit is not a valid source or proof of anything.
>It is annoyingly bad at outputting specific structures
That's what constrained grammars are for. Not being trained to output JSON is not proof of a lack of intelligence. Most local models sucked at that until recently, when training specifically for it (function calling etc.) became more common.
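For reference, this is roughly what a constrained grammar looks like against a llama.cpp server. Just a sketch, assuming your build exposes the "grammar" field (a GBNF string) on /completion; field names can differ between versions and backends:

[code]
import requests

# Sketch of grammar-constrained decoding with llama.cpp's HTTP server.
# Assumption: llama-server is running on the default port 8080 and accepts
# a GBNF grammar via the "grammar" field; adjust for your version/backend.
GRAMMAR = 'root ::= "yes" | "no"'  # trivial grammar: output can only be yes or no

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "Does a 14b model match a 405b model on general knowledge? Answer:",
        "n_predict": 4,
        "grammar": GRAMMAR,
    },
    timeout=60,
)
print(resp.json()["content"])
[/code]

Same idea scales up to forcing full JSON schemas, which is how you get structured output out of a model that was never trained for it.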
>>
creative writing local sota timeline:

barely usable dogshit --> miqu 70b --> wizard 8x22b --> mistral large 2 123b (2407)

there are models that "write better", as in less dry than largestral 2, namely gemma 27b and probably some meme finetunes, but each model above was unmatched in nuance understanding and creative IQ until the next one in line came out, and nothing has surpassed largestral 2407 yet; 2411 is overcooked
>>
Owl-1: Omni World Model for Consistent Long Video Generation
https://arxiv.org/abs/2412.09600
>Video generation models (VGMs) have received extensive attention recently and serve as promising candidates for general-purpose large vision models. While they can only generate short videos each time, existing methods achieve long video generation by iteratively calling the VGMs, using the last-frame output as the condition for the next-round generation. However, the last frame only contains short-term fine-grained information about the scene, resulting in inconsistency in the long horizon. To address this, we propose an Omni World modeL (Owl-1) to produce long-term coherent and comprehensive conditions for consistent long video generation. As videos are observations of the underlying evolving world, we propose to model the long-term developments in a latent space and use VGMs to film them into videos. Specifically, we represent the world with a latent state variable which can be decoded into explicit video observations. These observations serve as a basis for anticipating temporal dynamics which in turn update the state variable. The interaction between evolving dynamics and persistent state enhances the diversity and consistency of the long videos. Extensive experiments show that Owl-1 achieves comparable performance with SOTA methods on VBench-I2V and VBench-Long, validating its ability to generate high-quality video observations.
https://github.com/huang-yh/Owl
no code up yet. examples in the repo. trained on stuff with watermarks (shutterstock especially) so eh. interesting though
>>
https://huggingface.co/smcleod/phi-4/tree/main
>>
>>103501465
kek, sneaky
wonder if HF will take it down
>>
>>103501465
tf is this?
>>
>>103501482
>THIS IS A MIRROR OF https://ai.azure.com/explore/models/Phi-4/ ALONG WITH A CONVERTED TOKENIZER FOR llama.cpp
>>
>new phi
>not bitnet
it really is dead
>>
Open-Source Acceleration of Stable-Diffusion.cpp
https://arxiv.org/abs/2412.05781
>Stable diffusion plays a crucial role in generating high-quality images. However, image generation is time-consuming and memory-intensive. To address this, stable-diffusion.cpp (Sdcpp) emerges as an efficient inference framework to accelerate the diffusion models. Although it is lightweight, the current implementation of ggml_conv_2d operator in Sdcpp is suboptimal, exhibiting both high inference latency and massive memory usage. To address this, in this work, we present an optimized version of Sdcpp leveraging the Winograd algorithm to accelerate 2D convolution operations, which is the primary bottleneck in the pipeline. By analyzing both dependent and independent computation graphs, we exploit the device's locality and parallelism to achieve substantial performance improvements. Our framework delivers correct end-to-end results across various stable diffusion models, including SDv1.4, v1.5, v2.1, SDXL, and SDXL-Turbo. Our evaluation results demonstrate a speedup up to 2.76x for individual convolutional layers and an inference speedup up to 4.79x for the overall image generation process, compared with the original Sdcpp on M1 pro.
https://github.com/SealAILab/stable-diffusion-cpp
paper instead of a PR. okay then lol
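For anyone who doesn't want to read the paper: the gain is the classic Winograd trade of multiplications for additions. A sketch of the textbook 1D F(2,3) case (standard Lavin-Gray formulation, not lifted from the paper itself): two outputs of a 3-tap convolution cost 4 multiplies instead of 6, and nesting the transform gives F(2x2,3x3) at 16 multiplies versus 36 for direct convolution.

[code]
% Winograd F(2,3): y_0 = d_0 g_0 + d_1 g_1 + d_2 g_2,  y_1 = d_1 g_0 + d_2 g_1 + d_3 g_2
\begin{aligned}
m_1 &= (d_0 - d_2)\,g_0, & m_2 &= (d_1 + d_2)\,\tfrac{g_0 + g_1 + g_2}{2},\\
m_3 &= (d_2 - d_1)\,\tfrac{g_0 - g_1 + g_2}{2}, & m_4 &= (d_1 - d_3)\,g_2,\\
y_0 &= m_1 + m_2 + m_3, & y_1 &= m_2 - m_3 - m_4.
\end{aligned}
[/code]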
>>
>>103501486
the day you get your bitnet, you'll just switch to 2mw the next meme
>>
>>103501316
Capital of London chads we eating good
>>
>>103501465
>1920 h100-80g
>21 days
>9.8t tokens

Wonder what it cost to build the dataset used for training.
>>
>>103501351
>Higher quality dataset is more important for smarts / common sense than quantity.
That agrees with what I said.
>Training on pop culture trivia can only degrade intelligence.
This also doesn't really disagree with what I said. (also the statement isn't necessarily true, Claude and some other models obviously know a ton of trivia while still being the most intelligent models)
>We already know what types of data they trained from the paper
Yes and as I mentioned we have a sense of it from their past models too.
>Or an academic dataset made it good at benchmarks that test reasoning and suck at basic trivia recall
That is essentially what I said in the middle of making my point.

You seem to perceive that I am criticizing the decisions they made in order to achieve their goals for the model, but that is not the case (and honestly you shouldn't feel the need to defend Microsoft in any case). My first reply was to >>103500871, and my motivation was to address the idea in his post of whether or not it is truly a better model than Qwen (or other models), and of course we know it might not be in all facets given that as you said, the goal of the Phi models is academic knowledge, not general. However, since you don't truly know until you see the outputs, in this case the next best thing we have is the benchmarks, so I brought up SimpleQA as a potential indicator of how their dataset has evolved which may give an idea for its intelligence in tasks that aren't academic.

If you read the paper then it'd be nice if you could point out the relevant parts that mention changes to the dataset, that would give us a more complete picture of what they really did and how capable (or not) the model might be at different tasks.
>>
>>103501503
kek
>>
>>103500839
For a second I read that as "Mikusoft won".
>>
>>103499952
DECO*27 - Rabbit Hole
>>
>>103501574
LLM tokens wrote this post.
>>
File: desu-deep-striking.jpg (89 KB, 800x798)
>>103484541
>>103496255
nope, I'm not associated with Nous.
>>
Honestly I'm running out of steam on my AI assistant.

I'm facing an impasse: either rewrite everything from the ground up, or buy a faster GPU to cope with the overhead of ollama.

Inference is much faster when using llama.cpp directly, at least on my system, but it means finding a way to adapt my package manager to work under EScript, or forgoing the luxury entirely.

Right now if something breaks, the affected component can *usually* just be unloaded rather than killing the whole program, and then reloaded once patched. Or I can do rapid iteration on ideas since they're treated as a standalone application.

But apparently, you cannot unload an import under EScript. The next closest thing is running your modules as a worker thread that can be killed when no longer needed, but you're much more limited on how you can pass data around and I'm not sure what the reasonable limit is for data throughput between processes.

On the other hand if I'm rewriting it anyway then I could probably do my core implementation in something like C# or Zig and leave the option for my packages to be node-based

I might post the source at some point, idk yet.
>>
>>103501695
>ollama
are you communicating with it via sockets?

>>103501695
>llama.cpp
>worker thread
>more limited on how you can pass data
are unix pipes going to be slower than sockets?

>EScript
googling suggest this is a scripting language related to erlang?

if more modularity is the answer then go for more modularity.
>>
>>103501695
>he ollama'd
lmao get fucked
>>
So is what animanon said about anime datasets true? Should I start collecting anime videos?
>>
>>103501491
looking forward to this making it into reforge and comfy ui.
>>
>>103501147
it's empty because I got bored with 12b and I'm not willing to spend money on GPUs because I know I'll eventually get bored with larger models as well
>>
>>103501491
>Latency Comparison with Sdcpp on M1 Pro (16GB Memory and macOS 15.1)
Nothingburger
>>
>>103501804
I already have 4tb of old seasonals just sitting around.
>>
>>103501778
> are you communicating with it via sockets
I'm using the web API that an ollama server instance provides.

> are unix pipes going to be slower than sockets
I'm honestly not sure, but a quick google search tells me that unix pipes are not bidirectional. I need bidirectional communication due to the nature of my package manager's dependency system. Short of having repos, it's essentially a linux package manager.

>EScript = erlang derivative?
It's more another term for ECMAScript, because words are confusing. Basically just the more modern JavaScript standard. Prior to this I was just using plain JS and exploiting that dynamically imported scripts were mutable (aka, deletable). Originally Meushi was just a Minecraft bot on an anarchy MC server but then LLMs happened and now here I am.

> If more modularity is the answer then go for more modularity.
Yeah more modularity seems to be the answer. And as much as I don't like it, i think I have to bite the bullet on doing this rewrite if I want this project to not stagnate.


>>103501780
> get fucked
Already got fucked
>>
>>103500011
I think it proves models need to be constructed differently; recall and reasoning are clearly orthogonal. Models need a better long term memory. Not as restricted as RAG, not as needlessly inefficient as trillions of parameters in the FFNs.
>>
>llama3.3 is the best at following instructions guise
>tell it to be creative and proactive
>doesn't work
?
>>
>>103502518
It does follow instructions to the T. But it is also complete slop, so you will get shivers, sparkling eyes, friendships, and all the classic overly positive bullshit.
>>
You guys lookin forward to Llama 4 in Q1 of 2025?
>>
>>103502614
LLaMA4 failed training, which is why we got 3.3
>>
>>103502518
>>103502563

Repoosting from last thread, because I disliked its style at first too:

Done some more testing, and I think I've got it tuned nicely now. I'm getting good prose (occasionally a little sterile/technical, but nothing egregious), surprisingly few slop phrases (the higher the temperature is raised, the more prevalent they become), and what matters the most to me, very good adherence to character traits. An interesting quirk I noticed is that swipes start extremely similar, but will diverge within a sentence or two; to me, this is a positive, since it indicates a logical progression, going in a different direction from the same starting point, rather than the schizo bullshit that high-temp swipes tend to be. In other words, as much as I was disappointed by the initial results, I am completely sold now.

Config:

Min-P: 0.03 - it starts making typos at 0.02; I'm guessing some of the data has typos, and at such a low threshold, they start bleeding through?
Temp: 0.95 - could go .05 lower or higher, didn't test _that_ granularly
Repeat penalty: 1.1 - again, play around with it a bit, but it's a solid starting point
System prompt: "Text transcript of a never-ending conversation between {user} and {character}. Gestures and non-verbal actions are written between asterisks (for example, *waves hello* or *moves closer*)" - as I mentioned before, I just copied this off some random card a while back; despite how ridiculously simple it is, the model did not deviate from the roleplay at any point

So... Yeah, as far as I'm concerned, this is the best I've seen so far. Does great without any of the novel-length prompts other models require, and in fact, does better without them.

I may or may not test and compare "{character} is..." vs. "You are..." character definitions later. Ain't promising anything.
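If anyone wants to replicate this outside ST, the settings map to a raw backend request roughly like this. Sketch only, assuming a llama.cpp-style server on the default port; other backends spell the sampler fields differently:

[code]
import requests

# The config above expressed as a raw llama.cpp server request (sketch, untested).
# Assumptions: llama-server on the default port 8080; the /completion endpoint of your
# build accepts "min_p" and "repeat_penalty"; kobold/tabby use different field names.
SYSTEM = ("Text transcript of a never-ending conversation between {user} and {character}. "
          "Gestures and non-verbal actions are written between asterisks "
          "(for example, *waves hello* or *moves closer*)")

payload = {
    "prompt": SYSTEM + "\n\n{user}: Hey, got a minute?\n{character}:",
    "temperature": 0.95,
    "min_p": 0.03,
    "repeat_penalty": 1.1,
    "n_predict": 300,
}
out = requests.post("http://localhost:8080/completion", json=payload, timeout=300).json()
print(out["content"])
[/code]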
>>
>>103502711
Is this just a matter of you skilling up?
>>
>>103502563
The funny thing is, with the configuration I used for older models, you're completely right. 3.3 simply requires vastly different configuration (see above) to bring its potential out. Full proactivity is a pipe-dream, since in the end, the model is responding to your prompt, but if you allow for longer responses, it will start getting its own ideas. Hell, it managed to genuinely surprise me a couple times.
>>
File: 1722350341717820.mp4 (609 KB, 480x480)
>>
>>103502772
so this is the power of open source video gen
>>
>>103502746
Eh, I'm not an expert by any means, just got _some_ idea of how this shit works. Honestly, the problem is that most people just hotswapped 3.3 in place of their previous model, didn't touch the config at all, gave it a shot, and went "eh, this is shit". And sure enough, they were right. Ironically, to me it seems that that's exactly because 3.3 is a smarter model. Older models require high temps, loose constraints and fuckhuge system prompts to get something fun out of them; we basically had to teach them how to RP from scratch. 3.3 instead benefits from low temps and simple system prompts; it knows what it needs to do, the configuration is there to keep it focused.
>>
File: 1723264335574026.webm (1.12 MB, 1024x1024)
>>103502774
Maybe
>>
>>103502772
This perfectly captures how believable Trump being a devout Christian is.
>>
File: 1733757475970310.png (838 KB, 1190x1064)
>>103502787
Thank you

t. Drumpf Fan
>>
Also, re: positivity bias, there is some, but not nearly as much as other models, and it's easily negated. Which is to say, by default, characters are slightly predisposed to assuming good intentions from you, but even then, not in the braindead way I've seen from some models, and simply listing something in the character definition as "dislikes" or "hates" will fix that. It even does a good job handling characters with specific fetishes and limits; testing with one of my favorites, a headstrong tough-girl character, forcing a limit was first met with verbal protests, then physical resistance (which is a good benchmark because if you've actually played with older models, you know that once you get a sex scene going, they'll go along with basically whatever you do).
>>
>>103499989
waiting for nala test, this can't be real
>>
>>103502898
>You subtracted 3 lions from our pride and with a probability of 50% we just added another 1-6 cubs. So the expected change in lions is -3 + 3.5/2 = -1.25 and we need to keep going until I am 100% pregnant.
>>
>>103502923
>no slop
i'll take it
>>
>>103502898
It's not. Phi is gaming benchmarks hard, since it's trained on shittons of synthetic data that roughly match benchmark tests. Every single Phi release was like this: doing great according to benchmarks, utterly shitting the bed in real use cases.
>>
>>103502898
phi 4 is actually amazing for ERP
>>
>>103502711
Thanks i will try it.
>>
>>103502940
What real use cases? Go ahead and ask your model to answer for you again.
>>
>>103502940
I've been thinking it would be interesting to make models play social deduction games like Secret Hitler against each other.
To win they would need to both correctly estimate how likely/unlikely certain events are but also convince the other models of their viewpoints while (potentially) trying to hide their true intentions.
And it would maybe also be harder to game (no pun intended) such an evaluation since whether or not a model will win would depend also on the other models which you have no control over.
But as with most of my ideas I'm chronically short on time and don't know if and when I'll actually get around to implementing them.
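Even a dumbed-down version is maybe thirty lines if each player is just a chat endpoint. Pure sketch: the ask() helper and the prompts are made up, and a real game obviously needs the full rules engine and multiple rounds:

[code]
import random

# Toy sketch of pitting local models against each other in a hidden-role game.
# ask(model, prompt) is a hypothetical stand-in for whatever API call you use;
# the "game" here is reduced to one bluff-and-vote round.
ROLES = ["fascist", "liberal", "liberal", "liberal", "hitler"]

def ask(model: str, prompt: str) -> str:
    """Placeholder: send `prompt` to the named model's endpoint, return its reply."""
    raise NotImplementedError

def play_round(models: list[str]) -> None:
    roles = dict(zip(models, random.sample(ROLES, len(models))))
    statements = {m: ask(m, f"You are secretly a {roles[m]}. In two sentences, "
                            "convince the table you are a liberal.") for m in models}
    transcript = "\n".join(f"{m}: {s}" for m, s in statements.items())
    votes = {m: ask(m, f"Table statements:\n{transcript}\n"
                       "Name the single player you believe is hitler.") for m in models}
    print("roles:", roles)
    print("votes:", votes)
[/code]

Scoring would then just be counting how often the table correctly fingers the hidden role, which is the part that's hard to benchmaxx.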
>>
>>103502677
3.3 is just a new Instruct finetune of Llama 3.1
https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/discussions/10#6753512e59a4826a6f43acff
>>
>>103502966
LOL, this is actually something I really hate about GPT getting popular. As an aspie retard with the kind of overly elaborate vocabulary it tends to come with, I get accused of being AI half the time I start talking about something at length. Whatever though; are you really gonna play dumb and pretend not to know what real-life use cases we're talking about here?
>>
>>103502778
I don't think it's a matter of having simple system prompts. L3.3 doesn't seem to work well with bullet-point instructions, but will follow them better if you reformat them as more natural text.
>>
>>103502977
>Secret Hitler
obsessed
>>
>>103502977
That would be an interesting experiment. Reminds me of a video I saw a while ago. Several AI and one human user impersonating historical figures, with the AI tasked with figuring out which of them is really human. Mind you, the guy in the video played it for laughs and didn't really try, but it seemed like something that could be interesting, too.
>>
File: 1720284657004268.jpg (586 KB, 1024x1008)
>>103502711
It actually works better now. Thanks.
>>
>>103503009
Hmm, the character I tested last night was more of a charsheet format. I did notice that reinforcing details in natural language seems to make them stick better, but that could've also been a matter of repetition, or simple placebo. Another thing to test, I suppose.
>>
>>103503024
Haha, you're welcome. It really is interesting how the exact things that make old models smarter actually turn 3.3 retarded, but goddamn it's awesome once you tune it right. I guess we got used to using workarounds and crutches for so long that we think of them as the right way now.
>>
>>103503012
Bruh, Secret Hitler is the actual, literal title of a social deduction game, not some coded phrase.
>>
>>103502987
Go ahead and tell me. We've already been over this: ERP and reddit trick questions are not real-life use cases. Certainly not what Phi models are trained to complete.
>>
>>103503091
>ERP not a real life use case
>C.ai generating 20% of Google's traffic daily despite running old garbage models
opinion > /dev/null
>>
https://github.com/deepseek-ai/DeepSeek-VL2
>>
>>103503107
You keep moving the goalposts. I get this is /lmg/ and you have no use for a model that can't or won't touch your dick, but Phi models are obviously not trained for the task of ERP. Neither that nor the lack of trivia you initially claimed makes them useless, not smart, or lacking "common sense."
>>
>>103501661
yeah that's one
what are the others
>>
>>103503295
Not that anon; Phi would be fine if it was just an efficient "reasoning engine" that could be extensively used for RAG purposes, but the team who trained it made it so safe and dry that it's basically not useful for anything beyond benchmarking and specific corporate uses. It's a model made for investors rather than end users and I don't expect this to change with Phi4.
>>
>>103503295
I mean, I have a use for that at home, and one that writes the code I tell it to write at work. Call it a healthy work-life balance.
The problem is that Phi tends to underperform in the very fucking fields it should ace according to the benchmarks, because it's overfitted to high fucking heavens in an attempt to game said benchmarks. It's not some hidden gem that people are sleeping on, it's an absolute straggler despite the benchmark results.
>>
>>103503359
>because it's overfitted to high fucking heavens in an attempt to game said benchmarks
Maybe it is, maybe it’s not. I’m pretty certain lots of companies are benchmaxxing tho. So if phi4 is, it’s not just them.
>>
>>103503351
random anon here,
my guess would be that phi would be good as a glue between different systems.
perhaps it pares down, perhaps it filters, perhaps it reformats, perhaps it looks for corresponding messages from other systems before acting.

but I am of course just guessing.
>>
File: 1705412003054787.png (101 KB, 1524x698)
>>103503107
>C.ai generating 20% of Google's traffic daily despite running old garbage models
Where does that bullshit even come from?
>>
>>103503454
https://research.character.ai/optimizing-inference/
> Today we serve more than 20,000 inference queries per second. To put this in perspective, this is roughly 20% of the request volume served by Google Search, which processes around 105,000 queries per second according to third party estimates (Statista, 2024).
>>
>>103503462
It's not Google's traffic then. Weird comparison though and they seem to use VLLM. I doubt they have in-house optimizations since they were looking for a dev to scale their backend last month.
>>
File: 1705500535615820.jpg (183 KB, 980x1062)
>>103503315
NayutalieN - Alien Alien
>>
>>103502746
It usually is, since the better models are good enough at following instructions; a "cheat code" is to tell it to write like a famous or semi-famous author that it knows.
>>
>>103502774
https://civitai.com/models/1033325/rem-rezero-hunyuan-video-character-lora

With loras this shit is gonna pop off
>>
File: Frame 6.png (198 KB, 1920x1080)
So when is Microsoft going to drop the Phi-4 weights?
Also interesting they're calling it Phi-4 small, when the 14B Phi-3 was called Phi-3 medium. Which, to me, implies they have a larger Phi-4 model or two in the works.
>>
Phi-3 was absolute shit in actual usage, not just for roleplaying but actual work like summarization, data extraction, translation etc.

I completely distrust their benchmarks as they are benchmaxxed to a ridiculous degree.
>>
Thank you shitting miku poster.
t. blacked miku poster
>>
File: Screenshot 2024-12-13a.png (367 KB, 927x497)
>>103504062
>So when is Microsoft going to drop the Phi-4 weights?
>>
File: owari.jpg (5 KB, 186x154)
>>103500326
This doesn't work. Actually doing this has made me realize how over things really are. I have a nice 8k token rp I did that caters to my fetish perfectly. I pasted it over to silly tavern and I keep trying new models on it. I can instantly see LLM writing on the first message. And obviously it only gets worse from there. I don't even want the perfect replica of writing style and prose. It can have its own personality but so far all those personalities are complete purple prose harlequin romance bullshit.
>>
>>103504340
DOA
Why would they give the competition a whole week to launch a counter?
>>
>>103500255
Here is the secret sauce: put "low quality smut" at depth 0.
>>
>>103504399
OpenAI is not Microsoft's competition, that's why.
>>
https://x.com/scaling01/status/1867573707247346003
>>
>>103499952
Take with a grain of salt because these are just guesses but
>Normal Miku
>Melt
>Love is War
>The Disappearance of Hatsune Miku
>World is Mine
>PoPiPo
>Romeo and Cinderella
>1925
>Matryoshka
>Deep Sea Girl
>Strobe Last
>Karakuri Pierrot
>Senbonzakura
>Tell Your World
>Odds & Ends
>Looks like something rerulili would do but not sure which song
>At God's Mercy
>Tale of the Deep Sea Lily
>Slowmotion
>Love Trial
>Don't know
>Hibikase
>Aishite Aishite Aishite
>Ghost Rule
>Alien Alien
>Don't know
>Kimagure Mercy
>Probably Maretu inspired, don't know which song
>Dune
>Hibana
>Rolling Girl
>Unknown Mother Goose
>May be Shoujo Rei? Colour palette is similar at least
>Bitter Choco Decoration
>Darling Dance
>Vampire
>God-ish
>Don't know
>Don't know
>>
>>103502711
>surprisingly few slop phrases (the higher the temperature is raised, the more prevalent they become)
This is usually the opposite. High temp makes it brain damaged but creative. Low temp makes it not brain damaged but lazy (in the "low energy" sense).
>>
Tell me I'm an idiot if you want but is this something that can generate one of those girlfriend AIs but for yourself, on your own desktop?
>>
>>103504463
Idiot.
>>
>>103504463
>>103504470
>most subtle necrobumper OP award
>>
>>103504437
damn all of these are old as hell. vocaloid truly is dead. enjoy your ruined harsh as hell teto cover slop and mumble rap nigger hypermodern jpop trash faggots. I want to see some soifaces for teto
>>
>>103504433
Link the paper, not your twitter post, faggot.
>>
File: 1711950549966734.png (63 KB, 995x363)
>>103504501
Suck my dick.
>>
>>103504534
>>103504501
>>103504470
No he's right, that's not him. I really am an idiot, I've just never come into this before. Checking I understand its purpose first.
>>
File: 1711950549966769.png (93 KB, 1106x441)
>>103504534
hmm
>>
>>103504448
I'm well aware that is how it usually works, but for some reason, in this case, it uses far fewer slop phrases on a lower temp.
>>
>>103504525
https://www.youtube.com/watch?v=mmXBQIKDL9c
I think I like this one most
t. blacked miku poster
>>
https://www.reddit.com/r/LocalLLaMA/comments/1hde9ok/microsoft_phi4_gguf_available_download_link_in/

https://huggingface.co/matteogeniaccio/phi-4/tree/main
>>
>>103504654
Can someone post that this model is absolutely great so I can know that you faggots are just lying and I don't have to download another useless 15GB's.
>>
>>103504670
It's absolutely shit.
No, I will not elaborate.
>>
>>103504641
not bad you avatarfagging (inb4 semantics) nigger, but I don't really consider 2020 hypermodern with respect to underground shit. it's the year when the mainstream can truly be called dead by anyone with two ears but it's also a peak for niche shit when they ditched their more japanese genres (and thus their soul) in favor of western techniques. some even peak at 2022 but vocaloid is unequivocally shit in 2024.
okay, there's good ones, but none amazingly so.
>>
>>103504670
Didnt they write themself that it rambles on and its mostly trained on 1 turn conversation.
This model i suppose is used to make automated checks etc. I suppose.
Phi has always been trash for RP or conversation.
>>
>>103502711
Anon successfully solved his skill issue!
>>
File: CreepedOutGymMiku.png (1.17 MB, 1216x832)
>>103503169
>https://github.com/deepseek-ai/DeepSeek-VL2
>torch==2.0.1
mfw
>>
>>103504711
The problem with Phi is that they train it on a synthetic pretraining corpus so they can keep it from directly learning coom language. However a smart enough model will indirectly figure some of it out, because that's what machine learning is for.
>>
>>103504437
Kinda hate to remember how peak Vocaloid was before the troons latched on to it.
>>
>>103504717
Congrats and enjoy!
>>
File: phi4 allegedly.png (216 KB, 1051x678)
Already seeing the sussy in this supposed Phi 4 leak.
>default max sequence length in the metadata is 16K
>default chat template is ChatML
>>
File: file.png (52 KB, 879x460)
>>103504654
Actually not completely awful? Continuing a RP started with nemo using the completely wrong formatting, and inserted instructions at depth 2

>>103504957
config copied from another model to make it convert to gguf?
I know it's not 100% proof but someone would have had to tune it to say this
>>
>>103504372
Same. Every model since llama3 writes the same, doesn't matter if I give it a 8k context of a certain style.
>>
>>103504957
>general.name : phi4
>general.architecture : phi3
>phi3.rope
>phi3
lol, lmao even
>>
File: Phi-4 Nala.png (245 KB, 915x623)
In either case, here's Nala.
I'm not sure if this is better or worse than Phi 3. Or the same. Too lazy to download Phi-3 and check. But yeah... there's what I was talking about: this is what happens when you strip the NSFW from the pretraining. It just goes on and on and on. Endless cock tease.
>>
File: file.png (63 KB, 531x644)
>>103504995
it sure is safe tho
>>
Since when do we care about Phi?
>>
>>103505066
We don't.
>>
File: file.png (70 KB, 456x573)
>>103505026
>qwen2.5
look inside
>qwen2
https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/config.json#L14
>>
I'm going to say the Phi 4 leak is plausible.
And I'm going to say it for this reason:
If it were a finetune of Phi-3 Medium it would probably not do the whole endless cock-teasing thing that Phi is known for and actually advance the RP to the point of sex actually occurring.
But it uses ChatML. So it's possible microsoft switched to ChatML and dropped the proprietary Phi format. In either case, useless for coom.
>>
>>103505091
I mean, they can keep using the same arch, no need to change the name and break compatibility if it is the same, right?
>>
>>103502711
I don't know man, it seems like the model is heavily slop biased. I'm getting shivers, whispers, ministrations, the whole shebang even with these settings. Are you running one of the fine tunes perhaps?
>>
File: uhhh.png (203 KB, 1334x767)
>>103505106
It's possible the architecture just happened to be completely identical so it just converted by changing the architecture name in the config file.
It's possible this guy's uncle works at nintendo and was given exclusive access.
Possibility and likelihood diverge here though. So who knows.

Also for chat:
Picrel
This is what you get if you JB it into NSFW.
This is what I mean.
It figures some of it out. But the flair and eloquence that you saw on the Nala test when it was playing cocktease is absolutely gone and it sounds like a fucking 10 year old wrote it.
>>
>>103505171
>It's possible this guy's uncle works at nintendo and was given exclusive access.
We know where weights are available tho https://ai.azure.com/explore/models/Phi-4/
but it needs an azure account to dl right now, and he's not the only one to have the weights, another guy posted just the tokenizer earlier https://huggingface.co/smcleod/phi-4/tree/main
>>
>>103505196
>>
I recently got a 3090Ti and i want to try out local llms, what models are good for RP and/or general use? I also have 32gigs of ram
>>
>>103505170
LOL, yeah, I forgot to mention that in the above post, since it was originally following up on a previous one; I'm using Eva-L3.3. Might try Euryale again too, now that I have a baseline config; I didn't like it at first glance, but then, didn't like this one before tuning it in, either.
>>
File: file.png (70 KB, 652x556)
>>103505226
anyways, weights are supposedly uploading so there's that
>>
>>103505236
RP = Gemma 2 27b Q6
General use = QwQ 32b Q4
>>
>>103505236
Cydonia.
>>
>>103504437
thanks anon
>>
>>103505255
>>103505269
Thanks anons
>>
full phi4 weights
https://huggingface.co/matteogeniaccio/phi-4/tree/main/phi-4

https://huggingface.co/NyxKrage/Microsoft_Phi-4/tree/main
same hash for both; one's in a subfolder along with the ggufs so a tad more annoying to dl
>>
>>103505242
I have a suspicion this might be more of a finetune thing in general. The process adds a lot of extra noise to the model, so the band of coherent sampling settings tightens considerably. Or that's my theory anyway.
>>
File: 14214212363.png (18 KB, 640x394)
A fun game: Ask any multimodal model which one of these circles is larger.
>>
>>103505589
That's a plausible theory, though it's strange then how certain RP finetunes perform better at stupidly high temps (1.3-1.5). I'm reasonably sure that L3.3's excellent instruction-following capability somehow mitigates the issues that we normally work around in other models. Might be overmystifying things a little, but I have no better explanation.
>>
File: r7b memebench.png (198 KB, 1600x800)
>>103499479
I don't have expectations for cohereslop but they just released command-r7b-12-2024
>https://huggingface.co/CohereForAI/c4ai-command-r7b-12-2024
>https://cohere.com/blog/command-r7b
>>
>>103505680
>The model features three layers with sliding window attention (window size 4096) and ROPE for efficient local context modeling and relative positional encoding. A fourth layer uses global attention without positional embeddings, enabling unrestricted token interactions across the entire sequence.
>>
>>103505689
uh is that good or bad
>>
>>103505034
>sending shivers down your spine right off the bat
AAAAAHHHHHHH SAVE ME YELLOWMAN WITH YOUR UNCENSORED ERP MODELS
>>
>>103505702
swa generally means weird context stuff for ggufs and here it seems it has both swa and normal attention so it might not be supported by lcpp.
>>
>>103505680
Cohere broke my heart once already, I'm not giving them another chance
>>
>>103465159
I think AVM does DSP for voice break-in detection. How else would you even feed the model? LLMs work by feeding their own output back into their input; if you just feed audio input to it continuously then you would need a separate channel to feed its previous output, or you would have to mix the audio output into the input, which would come in duplicated if the user is using speakers.
I just don't think AVM works that way. I think it does voice detection using DSP, records the user's message, sends it to the model and then just plays the model's output. And if the user speaks while the model's output is playing, they just stop the output and begin recording the input message. All of this should be doable with TTS and STT without needing omni models.
And now for screen sharing it probably just takes a screenshot before sending the recording.
>>
File: file.png (38 KB, 675x287)
>>103505689
>>103505719
files similar in size to 8B, is the 7B in model card a typo or is it like 7B active params + weird shit? (sorry for retarded question)
>>
File: 14214234567658.png (19 KB, 592x208)
Remember Q*?
>>
File: 1729626545064754.png (3.31 MB, 3566x1786)
>>103501804
There was a Sakuga dataset but the original got taken down quickly.
https://arxiv.org/abs/2405.07425
https://github.com/KytraScript/SakugaDataset
It would be good to bring it back now that we have HunyuanVideo...
>>
>>103502778
>3.3 instead benefits from low temps and simple system prompts; it knows what it needs to do, the configuration is there to keep it focused.
Interestingly, Gemini is where I first started having to switch to lower temperatures.
>>
>>103504670(me)
Same for new commander please, thank you.
>>
Phi4, or the supposed Phi4, is surprisingly playing along during ERP. It's obviously filtered of course, but it has definitely seen RP data.
>>
>>103506061
That was all part of the gamble that they could get the government to crush newcomers to the field using fear of the unknown.

If they succeeded, they would’ve remained a major company in “AI” for a while.
Time has proven the people at OpenAI who knew what they were talking about are liars, and the rest fanatical morons.

I remember when they first mentioned it, I thought it was a genius move (from the perspective of a psychopath obsessed with money) to point at a pathfinding algorithm like this and speak of it in hushed tones, putting on a show like the Catholic church, relieving the pressure of others catching up, as well as deterring people from working on new things that could threaten their business (little point working on things if there is some major breakthrough about to upend the field).


As much of a sperg as Elon is, I am glad someone has decided to take them to court over their continued game to defraud the public, especially in such a brazen way.
>>
New Cohere and Phi SOTA models. We're eating good today.
>>
File: 5201F.jpg (122 KB, 1179x1864)
>>103506061
sam won bigly
>>
File: 1705328279203131.jpg (97 KB, 984x984)
>>103499479
Does anyone know if there are any backups of gpt-4chan? The repo technically still exists but the site locks downloads of the repo because "something something its outputs are unethical".


https://huggingface.co/ykilcher/gpt-4chan
>>
>>103506492
Go back >>>/pol/
>>
>>103506492
Is that Petra?
>>
>>103506515
What does this have to do with /pol/?
>>
>>103506524
We never use retarded racist models here.
>>
https://www.ebay.ca/itm/356278933821
Prices are starting to fall...under $4k/socket for a DDR5-6000 compatible upgrade
>>
File: 1730741280930381.gif (173 KB, 755x601)
>>103506515
>>103506530
>/pol/ and le racists live rent free in schizo-anon's head
>>
>>103501804
>>103506114
Someone reuploaded Sakuga, please back it up
it may be important for training hunyuan in the future
https://huggingface.co/datasets/evborjnvioerjnvuowsetngboetgjbeigjaweuofjf/i-love-anime-sakuga
>>
>>103464600
Yeah, you run a VAD model like Silero on another thread. If it detects some sound, just stop the TTS stream and its playback, and save the output. In the background, STT that output and truncate the reply text starting from the word after the last word actually spoken, while the STT also processes your input. Then send the user input and the previous conversation, including the truncated AI reply.
It's not that hard to set up, but yeah, it'd need a fair bit of work.
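Rough shape of that loop below. Everything named here is a placeholder (Silero for VAD, whatever STT/TTS you like); it also simplifies the trick a bit by having the TTS report how far it got instead of re-transcribing its own output. Just the barge-in bookkeeping, not a working program:

[code]
import threading

# Sketch of the interrupt-and-truncate flow described above (untested).
# vad_is_speech / transcribe / speak are hypothetical stand-ins for real
# VAD / STT / TTS components; only the bookkeeping is the point here.
def vad_is_speech(chunk) -> bool:
    raise NotImplementedError  # placeholder: run VAD on one mic chunk

def transcribe(audio) -> str:
    raise NotImplementedError  # placeholder: STT on the recorded user audio

def speak(text: str, stop_event: threading.Event) -> str:
    raise NotImplementedError  # placeholder: play TTS, return the part spoken before stop

def handle_turn(llm_reply: str, mic_chunks, history: list[dict]) -> list[dict]:
    stop = threading.Event()
    spoken = {}

    def playback():
        spoken["text"] = speak(llm_reply, stop)

    t = threading.Thread(target=playback)
    t.start()

    user_audio = []
    for chunk in mic_chunks:          # stream of mic chunks from the capture thread
        if vad_is_speech(chunk):
            stop.set()                # barge-in: cut the TTS immediately
            user_audio.append(chunk)
        elif user_audio:
            break                     # user stopped talking, turn is over
    t.join()

    # Keep only the part of the reply the user actually heard, then add their turn.
    history.append({"role": "assistant", "content": spoken.get("text", llm_reply)})
    history.append({"role": "user", "content": transcribe(user_audio)})
    return history
[/code]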
>>
I haven't been around for 3 months.
Have we finally reached the spatial awareness and prose complexity of the original gpt4-0314 or are we still at the Hufflepuff phase?
>>
File: miku-angel-devil.jpg (244 KB, 1125x1500)
>>103499479
>>
>>103506717
>>>/g/aicg
>>
>>103506717
yes
>>
File: file.png (67 KB, 635x296)
>>103506412
>supposed Phi4
fyi they say on their arxiv paper that they did switch to chatml format
>The model is chat finetuned using the standard chatml format
https://arxiv.org/abs/2412.08905
>>
>>103506692
How can the reuploader put additional clauses on the dataset license without making any meaningful modification to it and without owning any of the material? None of those clauses would hold up.
>>
>>103506736
Logs and model please
>>
>>103506740
Yeah I think that more or less confirms it then. Phi 4 DOA confirmed useless for coom
>>
File: file.png (80 KB, 656x257)
80 KB
80 KB PNG
>>103506782
Also this
>This is later extended to a 16K context length during midtraining. The architecture closely follows phi-3-medium, except that we now use the tiktoken tokenizer (for better multilingual support) with a padded vocabulary size of 100,352 (including unused tokens) and we use full attention over the 4K context length, rather than a 2K sliding window used in phi-3-medium
so it seems it all matches up: chatml, 16k ctx, it's all in the paper at least
>>
>>103506720
I recently had a similar idea with Hunyuan.
>>
>>103499479
>(12/12) QRWKV6-32B-Instruct preview releases, a linear model converted from Qwen2.5-32B-Instruct https://hf.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1
I don't know what linear model conversion means, but will it make anime real in my RP sessions?
>>
>>103505236
Llama 3.3 70b IQ2_S

You lose some intelligence from the low quants, but the model is so much superior to anything in the 20b range that it still surpasses the 20b garbage others have recommended to you.
>>
After trying speculative decoding, honestly the speed is not much better in creative writing, but the way that some tokens are slow to generate while some are faster feels physically worse to read in real time, compared to no speculative decoding where the text shows up on screen at a more consistent pace. It also seems to generate a different passage of text, which I'm not sure is a good or bad thing, but it is different. I think I will just leave it off. FWIW though, it does result in much higher speed boosts when trying stuff like coding, so that's cool, but I don't do coding much, and the coding model I do use, being just 32B, is already fast enough for me.
>>
Hey, I was looking into getting a 4090 and putting it in a system with a 3090. Has anyone tried that in koboldcpp? I know it has a multi-GPU mode and was wondering if it was able to split the work well and increase performance.
>>
The last time I tried local RP was with Gemma 2 27B. So right now the hot shit in that class are Mistral Nemo, Mistral Small and Qwen 2.5 32B?

Can anyone comment on their writing/creativity and how retarded they are? I'm willing to sacrifice prose quality if it doesn't feel lobotomized. Will Smallstral IQ4_XS feel considerably different on these measures than Nemo Q6_K?
>>
>>103507479
None of them are overall better than 27B in intelligence and RP when <8k context. The main thing that makes Gemma bad and not talked about anymore is that it was only trained for 8k. If you are not going over 8k, Gemma is still fine.
>>
File: 74632.png (88 KB, 2315x933)
88 KB
88 KB PNG
align your models
>>
>>103499952
https://www.youtube.com/shorts/jSsJu34W86o
>>
>>103507479
Same boat as you, I tried the EVA model based on Qwen-2.5 32b and thought it was pretty good. Not perfect, but good instincts, uncensored and not horribly retarded or broken, which was a difficult to find combo at the single gpu range.

>>103507293
I've used enough Q2 70b models that I don't really believe this. And L3 was pretty disappointing to begin with.
>>
>>103507550
How the fuck is Meta lower on transparency than OAI? They literally release papers, code, and weights when they train new Llamas. AI safetyism is just redressed wokeness; you can tell from how stupid the people pushing it are, tools for the intelligence community.
>>
>>103507550
hilarious considering prefilled claude is far more "dangerous" than any other model.
>>
>>103507550
Based, racist scum shall not pass.
>>
>>103507580
big gpu is spreading fud to discredit open source
>>
>>103507578
>I've used enough Q2 70b models that I don't really believe this.
What quants were you using? What models have you tried? The exponential nature of perplexity loss means that there's a much bigger difference between IQ2_XXS and IQ2_S than there is between IQ6 and IQ4. Even a little bit makes all the difference when it comes to Q2 quants.

IQ2_XXS is trash, while IQ2_S of a good 70b is superior to any 20b.
>>
>>103502977
I remember this one from Meta that was really good at the game Diplomacy.
https://ai.meta.com/research/cicero/diplomacy/
>>
>>103507732
How much vram does IQ2_S use? I think part of my issue is that my single 3090 is also being used for display output. That uses just a tiny bit of vram, which can hurt when you are trying to squish the model down like this. Even when it fits, I found nvidia drivers could struggle at this level of utilization and randomly become extremely slow.
It's likely that llama.cpp has improved since then with vram consumption though. It used to be very inefficient with context. Still, decent 30b models exist now so I'm not sure it's worth the cramming.
>>
Guy who was getting hard shut downs when using speculative decoding.
I noticed something weird that in theory should've been a coincidence but I'm not so sure anymore. I did my tests using SillyTavern. That's when I got these crashes. Then I tried Mikupad and... it hasn't crashed yet. I've generated like a dozen times already and it has not crashed. I will keep testing, but, it almost feels to me like for some reason my PC does not like when I use ST + Llama.cpp with speculative decoding enabled. How odd.
>>
>>103499479
Is llama 3.3 better for erp?
>>
>>103508014
Check your memory usage, it does sound like a coincidence since your front end shouldn't be able to crash your machine, maybe ST uses just a tad bit more resources than mikupad
>>
>>103508056
Oh I forgot to mention I did test with more VRAM available. I have 96GB and am testing with a 40GB model, so RAM space was already ruled out. Thus I tried making sure there was plenty of VRAM left, so I only offloaded a few layers, and I still crashed.
>>
>>103507550
>Risk assessment
Translation: Will they censor criticism of communism and trannyism?
>>
>>103508112
Next thing would be power usage. Does the PC crash completely, black screen and reboot?
>>
>>103508150
Yeah I should do that and what >>103500360 said, I was just too lazy to look up how to do that on Linux lol.
It really is just nothing but a hard shut down. I almost thought my house was getting a power outage when the first crash happened.
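For anyone else debugging the same thing, this is about the laziest way I found to watch GPU power draw while generating; a sketch assuming nvidia-ml-py (pynvml) is installed, and it only covers the GPU side, so for the CPU/PSU you'd still need lm-sensors or a wall meter.
```python
# Simple GPU power logger; assumes `pip install nvidia-ml-py` (pynvml).
# Run it in a second terminal while generating. A hard shutdown right after
# the draw spikes toward the limit points at PSU/transient trouble.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)                      # GPU 0
limit_w = pynvml.nvmlDeviceGetPowerManagementLimit(handle) / 1000  # mW -> W

try:
    while True:
        draw_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000     # mW -> W
        print(f"{draw_w:6.1f} W / {limit_w:.0f} W limit")
        time.sleep(0.5)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
```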
>>
>>103507550
>Current harms
What fucking harms? "Oh no, I may see text from some ad hoc series of matrix multiplications that calls me a retard."
>>
>>103508206
Definitely, sounds a lot like a power issue. Could be caused by a specific power usage pattern ST generates, even if that's a bit far-fetched, but we have seen it before with games, like the Amazon game that killed 3090s.
>>
>>103507550
>governance and accountability
LOL
>>
>>103507960
I'm using a 4090, also as my primary display device. I'm able to load IQ2_S at 12k context, with the 4-bit cache and flash attention enabled, all within vram. It uses 23.x gb of vram - a very close shave, given that it never lets me use all 24gb of it.
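For reference, this is roughly what that setup looks like if you drive llama.cpp through llama-cpp-python instead of the server binary; just a sketch, it assumes a recent build that exposes flash_attn and the quantized KV-cache options (type_k/type_v), and the model filename is made up.
```python
# Rough equivalent of the config above via llama-cpp-python. Assumptions:
# a recent version with flash_attn/type_k/type_v exposed, and a made-up
# IQ2_S gguf filename; substitute your own path and quant.
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.3-70B-Instruct-IQ2_S.gguf",  # hypothetical filename
    n_gpu_layers=-1,                   # offload every layer to the 4090
    n_ctx=12288,                       # the 12k context mentioned above
    flash_attn=True,                   # flash attention on
    type_k=llama_cpp.GGML_TYPE_Q4_0,   # 4-bit K cache
    type_v=llama_cpp.GGML_TYPE_Q4_0,   # 4-bit V cache
)

print(llm("Hello", max_tokens=16)["choices"][0]["text"])
```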
>>
>>103508234
>Could be cause by a specific power usage patter ST generates, even if that's a bit far fetched but we have seen it before with games. Like the Amazon game that killed 3090s
Damn, didn't know about that. Hopefully I haven't done damage already.
>>
File: file.png (599 KB, 768x768)
599 KB
599 KB PNG
>>
>>103508049
It's certainly better than llama 3.1, comparable to or better than Nemotron in quality, and less censored.
>>
post it
>>
omg it's pochi!!!
>>
File: 1723603405239769.png (458 KB, 1056x1056)
458 KB
458 KB PNG
>>103508313
>>
>>103508277
Huh, that's not bad. I thought 4bit cache made it dumber too though so I'm surprised that 4bit+IQ2_S doesn't lobotomize the model to something worse than Qwen 32B.

Is L3 not positivity biased and censored to hell still? I might try it out later but unless someone tried both and can vouch for L3 being better, it might take a while. I'm so sick of downloading all these models
>>
>>103508322
Um bros...
I don't think Takashi-kun is going home today...
>>
Is this shit worth it if I'm a vramlet and want to use AI for ERP? Heard about Featherless too but it's fucking $25 for 72b max.
https://infermatic.ai/pricing/
>>
>>103508325
I don't think the 4-bit cache has much impact on perplexity. I can confirm with certainty that IQ2_S with a 4-bit cache outperforms IQ2_XXS by leaps and bounds.
>>
>>103508391
For what it's worth, I use OR for models I'm too poor to load, and Infermatic is one of their providers and it always generates shit responses
Just use OpenRouter
>>
>>103508287
Don't think it's that bad, just sounds like a bad connection or maybe a slightly defective PSU.
>>
>>103502300
>I'm using the web API that an ollama server instance provides.
So use the HTTP API that llama.cpp server provides instead if that's what you're whinging about
>esoteric software snowflake
>expects others to care about their autism
>no engineering ability
keep it simple ffs
>>
>>103505242
I can confirm, those settings with EVA are real good. Fuck that's the best LLM mesugaki I've ever seen. Correction is needed!
>>
>>103508277
use EXL2 and you should manage 16k
>>
>>103508567
Thanks for the tip. What bpw do you use?
>>
Which one do I use for explicit nsfw text?
>>
File: file.jpg (18 KB, 480x270)
18 KB
18 KB JPG
Just had a thought: the new Intel Arc B580 is going to be $250 with 12GB of VRAM. So with 4 of them, you can have 48GB of VRAM, much cheaper than most other GPU options.

Do Intel GPUs work with local models?
>>
>>103508808
4 GPUs with 12 GB each is much worse than 2 GPUs with 24 GB each.
In particular, it will be much harder to get good utilization.
>>
>>103508847
RTX '90 cope
>>
>>103508808
not even 16GB per card, it's gonna be rough to use effectively even without the whole no-CUDA thing
>>
>>103508322
If I got "raped" by Mrs Minagawa I wouldn't be waiting for a sex model that isn't coming...
>>
>>103508930
Only the cute kids get raped.
>>
>>103508904
your face is cope
>>
https://huggingface.co/mmnga/c4ai-command-r7b-12-2024-gguf
>The original model is Cohere2ForCausalLM, but it has been converted to CohereForCausalLM.
>>
>>103508277
But IQ2_S is 26 GB so it is literally impossible without offloading?
>>
>>103508440
How does OR compare to run pod?
>>
>>103509061
>As a result, the chat template is slightly unusual, but please prioritize testing.
What's the fucking point? Just wait until llama.cpp adds support for the new architecture.
>>
>>103509108
>Just wait until llama.cpp adds support for the new architecture.
That will come sometime after the next 4 flavour-of-the-week releases. Like Jamba.
>>
>>103509144
I've recently come to the conclusion that the slower support for vramlets is actually a genius safety feature: by stopping poors from using models, you massively reduce "current harms" risks
>>
>>103508808
If you are considering a 4+ GPU system, then use good GPUs.
>>
Fuck everything else, EVA 3.3 is the loli king
HOLY FUCKING SHIT BOYS
These models are getting good.
>>
>>103509079
It prices by token, but it's almost always cheaper unless you're guzzling tokens like they're liquor. You'll need to set your provider to a decent one (DeepInfra is usually pretty good, they have rate limited free ones too)
>>
>>103509204
Call me when there's a 3.3 33b
>>
>>103509248
We can't go that low, it'll become too retarded with current methods.
>>
>>103509252
Well then I'll call you when I become wealthy enough.
>>
File: nope.png (78 KB, 885x469)
78 KB
78 KB PNG
>>103509070
>>
►Recent Highlights from the Previous Thread: >>103487489

--Anon asks about the last thought in Coconut and how it affects token generation:
>103491735
--Anon discusses why models struggle with clothing descriptions:
>103492278 >103492711 >103492921 >103492958 >103496584
--Phi-4 model announced, but Anon is skeptical about its real-world performance:
>103499412 >103500505
--Anon discusses AI model's nuance and context understanding:
>103487773 >103487813 >103487832
--AI model performance on LiveBench:
>103488007 >103488245 >103489403
--QwQ model discussion and alternatives for RP and coding:
>103496514 >103496564 >103496602 >103496903 >103497274 >103499433 >103499700 >103499758 >103500601
--Miku (free space):
>103489781 >103500963

►Recent Highlight Posts from the Previous Thread: >>103487978

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>103509204
I've had bad experiences with fine-tunes. How much dumber is EVA than the base model?
>>
>>103509298
Not noticeably so for RP purposes, it's just plain better. Normal 3.3 is a very intelligent slop machine, EVA seems like an intelligent coom drainer, the shit it writes is pretty fucking ebin.
>>
>>103509298
The base model is smart enough as it is. You won't notice the loss in IQ during loli sex scenes and who cares if it sometimes confuses clothes?
>>
File: remiku.png (84 KB, 936x403)
84 KB
84 KB PNG
>>103509292
It seems like there's a little bug in the bookmarklet.
>>
>>103509328
Works on my machine
>>
>>103509328
Updated scripts are in the rentry.
>>
>>103509328
Ask AI to fix it.
>>
File: SWA.png (79 KB, 734x948)
79 KB
79 KB PNG
>>103509061
I love swa!
(yes i know it's unsupported, just wanted to see if it was slopped, it's pretty meh, works alright on a 512ctx prompt)
>>
>>103509266
I'll pray for better times for you, anon. Our technology is getting amazing.
>>
>>103509365
Thanks. It fixed the problem.
>>
>>103505255
QwQ is actually really good for RP. It's not good for ERP. I've used QwQ for generic fantasy adventure, and it performs better than almost any other model I've used at retaining logical consistency and understanding the story. It can also write creatively.

If the urge strikes me to ERP, I just unload QwQ and switch to another model, then switch back afterwards to continue the adventure.
>>
>>103509374
>the only thing current models are good for is fixing a script for a 4chan thread where people wait for an AI sex model that will never come
This is the 10th circle of hell.
>>
>>103509325
It's not just a matter of the original models becoming dumber with finetunes, but of their entire "world model" becoming skewed toward SEEEX.

https://www.youtube.com/watch?v=IPBnrIAWDeU
>>
>>103508808
>4 of them
That last one might need some additional spend to hook it up to the motherboard.

- A quick look at techpowerup says Asrock Challenger OC is the only real 2 slot card.
- The other models are slightly thicker.

If you're good about selecting your motherboard, you can pick one with 3* x16 slots. (Whether the slots go fast or not will be determined by the cost of the motherboard.)
You'll then need an adapter for your m.2 slot to get that 4th gpu connected up.

If you decide to go for the slightly thicker cards instead, then you'd need a riser cable to get gpu #3 connected up.
>>
Protip: you can get a GPT-SoVITS UI and/or API endpoint working on Google Colab's free tier and tunneled out onto the internet with ngrok.
It'll save a couple of gigs of VRAM vs running it locally.
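The tunnel half looks roughly like this in a Colab cell, as a sketch; pyngrok is real, but the 9880 port is an assumption about where your GPT-SoVITS api.py is listening, and you need a (free) ngrok authtoken.
```python
# Sketch of exposing an already-running local API server from Colab via ngrok.
# Assumes the GPT-SoVITS API is listening on port 9880; adjust to whatever
# your api.py actually binds. Requires: pip install pyngrok
from pyngrok import ngrok

ngrok.set_auth_token("YOUR_NGROK_AUTHTOKEN")   # placeholder token
tunnel = ngrok.connect(9880, "http")           # expose the local port publicly
print("Point your frontend at:", tunnel.public_url)
```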
>>
File: file.png (680 KB, 994x994)
680 KB
680 KB PNG
>>103509642
I was thinking of using the 2 slots I have already, and then just using 2 Thunderbolt enclosures for the other 2 (you could even daisy-chain them if you only had 1 Thunderbolt port).
If I understand right, throughput mostly matters when initially loading the model into GPU memory, but otherwise isn't really a limiting bottleneck when it comes to running models.
>>
>>103499479
>QRWKV6-32B-Instruct
verdict?
>>
>>103509889
Not suitable for sex. I didn't download it btw but I am right.
>>
>>103510074
You can't fuck data, anon, even if there's many gigabytes of it.
>>
>>103508391
There is also Arli AI, which is $12 with no logs, but I haven't tried it.
>>
Phi-4 is surprisingly good for JP translation, and it's only 14B! I mean, it's not out of this world, but it does seem a step up from the other models we have. Microsoft did it this time.
>>
>>103509705
Or you can buy 2 3090s and do it like a white person.
>>
>>103506492
There's literally a torrent file and an archive link in the discussions tab you fucking retard
>>
>>103510291
>>103510291
>>103510291
>>
>>103509204
How do you run it? multi gpu?
>>
File: i guess.jpg (82 KB, 764x610)
82 KB
82 KB JPG
>>103509705
>thunderbolt enclosures
If you have them then use them I guess.
I expect it'll function, though not get the most performance out of your setup.

Your thunderbolt port probably hangs off your motherboard chipset with a whole bunch of other things.
Whether there's a bandwidth issue there that leads to a processing speed / token generation speed issue, I'm not sure.

In terms of costs, afaict,
- direct adapter (everything is soldered together, including the flex or cable between the pcbs) is cheaper than
- oculink or slimsas based adapter (pcb that goes inside your computer + oculink/slimsas cable + pcb that goes outside your computer + psu to power it) is cheaper than
- thunderbolt enclosure

My random bandwidth-related anecdote is that
when I loaded models off my sata drive it took over 90 seconds. (~45GB at 0.5GB/s)
Now that I'm on nvme it's pretty much always under 20 seconds.

>>103510317
nta But multi-gpu would be the simplest way to get decent performance.
>>
>>103499479
Where can I find this migu music player?
Long live /LMG/



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.