/g/ - Technology


File: miku.webm (3.73 MB, 1080x1080)
/lmg/ - a general dedicated to the discussion and development of local language models.

Miku Edition

Previous threads: >>103478232 & >>103473510

►News
>(12/12) QRWKV6-32B-Instruct preview releases, a linear model converted from Qwen2.5-32B-Instruct https://hf.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1
>(12/12) LoRA training for HunyuanVideo https://github.com/tdrussell/diffusion-pipe
>(12/10) HF decides not to limit public storage: https://huggingface.co/posts/julien-c/388331843225875
>(12/10) Upgraded version of DeepSeek-V2.5: https://hf.co/deepseek-ai/DeepSeek-V2.5-1210
>(12/09) LG releases EXAONE-3.5: https://hf.co/LGAI-EXAONE/EXAONE-3.5-32B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
>scat fetish general
>>
>>103499499
hi petra
>>
>>103499503
>not denying it
kek https://desuarchive.org/g/thread/103478232/#q103498549
>>
>>103499520
one schizo anon doesn't represent the whole general, we're better than this :(
>>
>>103499479
fuck off petra
>>
Do you think there'll be a good language model some day?
>>
>>103499719
I don't think so, but there are some pretty decent large language models.
>>
>>103499719
>>103499774
You're asking this minutes after Phi 4 literally just released, similar to llama 3.3 70b but with way fewer parameters, like 14b, lmao
>>
>>103499853
>Enabling AI innovation safely and responsibly
You can keep it.
>>
>>103499853
To reiterate: there is now a 14b model that performs like llama405b
>>
>>103499853
>>103499886
The Phi people are infamous for shamelessly gaming benchmarks. Their models are always absolute trash in actual use, stop falling for it.
>>
>>103499853
>Still shilling Phi after the 3 horrible previous cucked versions
c'mon anon
>>
File: 2024-12-12_18-45-35.png (319 KB, 1237x892)
>>103499853
liar liar pants on fire
>>
>there is now a 14b model that performs like llama405b
this is definitely 100% true
>>
>>103499853
You really believe that, anon? Microsoft made a model as good as GPT-4 at 14b? If that were true they would've kept it for themselves and stopped partnering with OpenAI
>>
>>103499479
offtopic but does anyone have a full song list or source for OP's webm-related
>>
>>103499853
if you were gonna make this fakepost why would you not choose a model line that actually has a good reputation
>>
>7 (Yous) for fake news
/lmg/ has fallen...
>>
>>103499973
not even one of the replies is taking it seriously or excited about it
>>
billions must phi....
>>
File: npjopxbhsi6e1.jpg (151 KB, 1066x689)
Accelerate!
>>
>>103499989
kek, where did you get this?
>>
File: pretending retard.jpg (33 KB, 346x636)
>>103499973
>
>>
>>103499993
https://www.microsoft.com/en-us/research/uploads/prod/2024/12/P4TechReport.pdf
>>
>>103499989
>>103499999
>SimpleQA
>Every model fails hard on that
not so Simple now, huh
>>
As someone who actually tested the old Phi models: they did perform pretty well on the things they were trained to do well on. And they fell apart for basically everything else. It's not a model relevant to us. It could be a model relevant to people who want to use models for math and... riddles.
>>
>>103500009
Pretty sure that was a recent trivia-based benchmark, but it's made by OpenAI so no one should be using it.
>>
File: 1732994694466171.jpg (9 KB, 255x177)
>>103499999
>>
LLMs have been a blessing and a curse. I now spend all my time cooming and writing erotica about my sick fetishes. fml
>>
>>103500000
>>
>>103500072
I evolved into endless CYOAs
>>
>>103500106
Same.
>>
>>103499719
No.
>>
What qualities do you look for in good smut? Everything just devolves into standard porno cliches. Having anything to do with sex in the card triggers it but the alternative is harlequin purple prose.
>>
>>103500255
Fill example chats with the kind of prose you want. I can't understand the people who tolerate the default slop from most of these models.
>>
>>103500072
>connoisseur of coom and erotica
What models do you recommend?
What models have fallen by the wayside for you?
>>
>>103500255
L3.3fag from the last thread; a bit of purple prose is fine, the main issue with most models tuned for RP is that every character turns into a cardboard cutout the moment sex is involved. Either stammering, timid and doe-eyed, or an unrestrained nympho slut. Makes it goddamn impossible to have characters you happen to want to fuck without that being the only way you interact with them. Which is exactly why I'm impressed with this model I've been testing - it remains consistent and reasonably realistic; a confident character will still act confident without suddenly having her entire world revolve around dicks, and a more shy or nervous one will act accordingly without turning into an anime cliché. Hell, it does damn well at handling characters having specific turn-ons/offs, too (starts forgetting about them as the context grows, but there are workarounds for that).
>>
File: creamy.jpg (35 KB, 1017x425)
>>103499897
tourist attitude, anyone who was around back in the day knows phi 2 tunes saved local
>>
>>103500255
>What qualities do you look for in good smut?
For LLM smut specifically: Unexpectedness.

With current OS models it's a choice between:
1) Smart but dry and predictable, nothing unexpected happens unless you make it happen
or
2) Unexpected things happen, but they don't make sense, because the unexpectedness was merely an accidental result of the model being retarded.

A lot of people say Claude Opus is the best but often can't quite explain why. I assert that the reason is that it can juggle both things simultaneously: Unexpected things happen, and they make sense.

So far all evidence points to this being impossible without massive parameters, but hopefully that will somehow turn out to be wrong.
>>
>>103500255
>>103500361
>>103500616

i will tell you, brother, but be warned, it is very dangerous. I will show you how to use an llm to write coom erotica.

first, tools:
- novelcrafter (learn what it is and what you can do and you'll see how important it is for this)
- LM Studio (you start the LM server and hook it to novelcrafter)

models, whatever you can run but these are pretty light and do the job well:
Cydonia V1.3 Magnum V4 22b
Unslopnemo 12b V4.1

now the key is to have your story outline and codex properly set up in novelcrafter (this gets fed as context so the ai doesn't go off the rails). after that, choose a writing style you like (so you don't end up with chat slop), narrator perspective and so on. then start leveraging scene beats in novelcrafter: you feed it 3 sentences and it writes you 1000 words or however much you set it to. if you set it up right the coom will write itself and will surprise you
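if you want to sanity-check the LM Studio server before pointing novelcrafter at it, it speaks the OpenAI-compatible API on localhost:1234 by default. rough sketch only; the model id string is whatever /v1/models reports on your machine, not something I'm promising:

[code]
import requests

# Minimal smoke test of a local LM Studio server (default port 1234).
# Assumptions: the server is running with one of the models above loaded;
# "local-model" is a placeholder id, check GET /v1/models for the real one.
BASE = "http://localhost:1234/v1"

resp = requests.post(
    f"{BASE}/chat/completions",
    json={
        "model": "local-model",  # placeholder, replace with your loaded model's id
        "messages": [
            {"role": "system", "content": "You are a prose-focused fiction co-writer."},
            {"role": "user", "content": "Continue this scene beat: ..."},
        ],
        "temperature": 0.9,
        "max_tokens": 400,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
[/code]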
>>
>>103500609
Damn
Things were on another level back then
>>
>>103500653
>first, tools:
>[paid UI]
>[proprietary llama.cpp UI]
Such a good selection... Buy a fucking ad, asshole.
>>
►Recent Highlights from the Previous Thread: >>103478232

--Papers:
>103491548 >103491611 >103491849 >103491947
--Anon creates personal LLM-powered AI assistant, shares experiences and technical details:
>103483949 >103484654 >103484698 >103486069 >103487346 >103491569 >103491948 >103492012 >103492384 >103492422 >103492441 >103492533 >103492793 >103492487 >103492637 >103492782 >103494191 >103494743
--Arc BS80 and other graphics cards' performance and pricing discussion:
>103494558 >103494613 >103495549 >103494659 >103494843 >103495055 >103495064 >103495193
--QRWKV6-32B-Instruct model release and discussion:
>103497181 >103497189 >103497217 >103497237 >103497245 >103497420 >103498589
--Local models for image interpretation:
>103494675 >103494798 >103494808 >103494947 >103494833 >103494905
--Gemini Flash 2.0 performance and comparison to 3.5 Sonnet:
>103488792 >103489045 >103489456 >103489271
--Evaluating the value of the Jetson AGX Orin at $1999 USD:
>103489661 >103491142 >103491188 >103491212
--AMD BC-250 Mining GPU Card not suitable for inference due to various issues:
>103488417 >103488634 >103497271 >103497303 >103497391 >103488697
--Speculative decoding causing PC shutdown, troubleshooting discussion:
>103498529 >103498542 >103498551 >103498646 >103500360
--HiRA: new LoRa variant for efficient fine-tuning of large language models:
>103493254 >103493296 >103493713
--Ultralytics package exploited for crypto mining due to CI vulnerability:
>103497163
--Community feedback on open models and multimodality:
>103493011 >103493037 >103493082 >103493038 >103493382 >103493814 >103493871 >103493925 >103496720
--Anon rants about people not running their own models and the untapped potential of LLM for ERP and local AI applications:
>103492703 >103492786 >103496420 >103496542 >103497836 >103498020 >103498042
--Miku (free space):
>103493946 >103498868

►Recent Highlight Posts from the Previous Thread:

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Microsoft won
>>
owari
>>
>>103500839
at gaming benchmarks
>>
>>103500839
Well, if it really is better than 14b qwen then they got a solid model. Not sure if it will be useful though, maybe if you only have a shitty laptop with 16gb ram and need to do math assignments?
>>
>>103500616
>A lot of people say Claude Opus is the best but often can't quite explain why
I've seen some of those people go on to say that it feels like it's actually a fan of the things you are a fan of, perhaps even more than you. And I think that requires just being pretrained on, you guessed it, uncensored data, and having a fine tune that brings the best out of that inherent knowledge. The fine tuning dataset I think open source is catching up with, as models like Tulu seemed to be quite fun but sloppy. The only issue now is that we need models that actually have the uncensored knowledge necessary.

And to your point about randomness but a type that makes sense, I think that would still benefit from uncensored knowledge. If you've seen a lot of wacky and random situations, then it's more obvious to you which kinds make sense, and so if you are prompted to do something random, you will be able to in a way that makes sense. A model that has seen less of those nonsensical but still logical situations will just be worse at knowing immediately which have some logic and which don't, and perhaps would need CoT or other tricks to make up for it, while the uncensored model simply just knows.
>>
File: sisters.png (50 KB, 810x518)
>>103500609
>>
i have a simple request, is there a local model that will write javascript without semicolons when i tell it not to fucking write semicolons you don't fucking need semicolons i swear to fucking god i just want one thing and it's for my fucking ai assistant not to put semicolons everywhere
>>
>>103500653
Holy garbage advice.
>>
>>103501056
there is a 250gb deepseek model
or you can try qwen 32b coder
>>
>>103500958
Yeah Anthropic clearly has the most based pretraining dataset, ironically given their overall attitude.
>>
>>103500871
phi3 also did well on benchmarks and then it was actually garbage
>>
>>103501080
definitely can't run the full deepseek, i'll try qwen, been using codestral 22b and it's pretty good at coding but ignores the no semicolons directive pretty often

gemma is the only model that never fucks up, but the context is too small to be useful, llama 3.3 70b is also good but I only get like 10t/s so it's tedious
>>
>>103500999
>>
File: nvme_hosted_models.png (4 KB, 206x145)
What does your personal "hot model list" look like, /lmg/?
picrel
>>
>>103501147
>405B Q8
Are you cpumaxxing anon? Or just collecting models to archive them
>>
>>103501151
Yeah I'm cpumaxxing, but 405b doesn't come out of hiding very often. I barely get 1t/s
But I do actually just collect models to archive them, too. I've got a huge graveyard of "never use 'em" models on spinning rust.
>>
>>103501080
any thoughts on starcoder? worth trying or should i just stick with qwen2.5-coder?

need to free up some space on my hdd lol
>>
>>103500871
Look at >>103499989. Its SimpleQA score (basically a trivia quiz) is the lowest yet out of all the models. It's likely even more filtered than Phi 3 and has very little common sense, instead opting for coding and academic capability, which means other benchmarks go up but SimpleQA goes down. Also, it's telling that its Livebench score is lower than you would expect based on the MMLU. Likely it got bad scores in the language and IF sections.

Though with all that said, I would also take SimpleQA with a grain of salt now and in the future since OpenAI is the one that made it.
>>
>>103501056
did qwq fail you? For javascript projects I've found it extremely competent.
>>
>>103501192
How does it failing a trivia quiz imply it has little common sense?
>>
>>103501199
i think i am experiencing skill issues with qwq, though for coding i don't think it even has FIM right? am i fucking up?

whenever i use it it is way too verbose about it's train of thought, i'm probably prompting it wrong
>>
what does qwq even stand for... gay? lmao
>>
>>103501212
A model needs to train on a high variety of different data in order to be generally smart (ie have common sense). A trivia benchmark gives us an insight about the types of data sources they trained on. In this case, it seems they focused even less on any kind of data that might have trivia, meaning the internet, and more on, probably, synthetic data, given what we know about what they did with their past models. So their data has become narrower and more focused on data that aligns with well known benchmarks, although that backfires with the lesser known benchmarks like SimpleQA.
>>
>>103501147
Multiple llama, mistral, and qwen quants.
> 42gb llama-70b-q4 (will run 74gb q8 when m.2 to pciex16 gets here)
>149gb llama-405b-q2 (0.3t/s. mobo only does 128gb ram max, so some of it had to be on vram)
>24gb mistral-12b-fp16
>44gb mistral-22b-fp16
>45gb mistral-123b-q2 (run out of context with larger quants)

Haven't much explored finetunes: nemotron, rocinante, rpmax.

Running 3* 3090.
Maybe hook up #4 in just over a week.
>>
File: GeqA6A8WUAAwnnN.jpg (281 KB, 2048x750)
>while Phi-4 can function as a chatbot, it has been finetuned to maximize function on single-turn queries.
Yeah, it's over.
>>
>>103501229
"QwQ" stands for "Questions with Qwen", highlighting the model's focus on the Chain of Thought approach which was first introduced with the revolutionary Reflection models published by market leader OpenAI in 2024.
>>
File: Jetson AGX Orin_2.png (54 KB, 1000x87)
>>103489661
I found this chart. Looks bad.
>>
>>103501267
>A model needs to train on a high variety of different data in order to be generally smart (ie have common sense).
Higher quality dataset is more important for smarts / common sense than quantity. Training on pop culture trivia can only degrade intelligence.
>A trivia benchmark gives us an insight about the types of data sources they trained on.
We already know what types of data they trained on from the paper. The whole selling point of Phi series models is that they are trained on textbook-like data.
>So their data has become narrower and more focused on data that aligns with well known benchmarks, although that backfires with the lesser known benchmarks like SimpleQA.
Or an academic dataset made it good at benchmarks that test reasoning and bad at basic trivia recall, since most trivia wasn't in its dataset.

If you want a model to ERP with, to answer questions about obscure JRPGs, or to talk in zoomer ebonics, Phi models won't do well. But that's not what they were designed for. Phi models are for commercial applications, specifically edge use cases, and they do well there.
>>
>>103501351
zoomers want to talk about their favorite celebrity nigs to their models bro. they need to know about LaShawnda and his new rap album
>>
File: file.png (93 KB, 653x643)
>>103501351
>Phi models are for commercial applications, specifically edge use cases, and they do well there.

>It is annoyingly bad at outputting specific structures, so we mainly use it when another LLM is the consumer of its outputs.
>>
>robust safety measures.
>truthfulness, honesty and helpfulness.
>we filter the publicly available documents to contain the correct level of knowledge.
>Phi-4 has adopted a robust safety post-training approach. This approach leverages a variety of both open-source and in-house generated synthetic datasets. The overall technique employed to do the safety alignment is a combination of SFT (Supervised Fine-Tuning) and iterative DPO (Direct Preference Optimization), including publicly available datasets focusing on helpfulness and harmlessness as well as various questions and answers targeted to multiple safety categories.
>Prior to release, Phi-4 followed a multi-faceted evaluation approach. Quantitative evaluation was conducted with multiple open-source safety benchmarks and in-house tools utilizing adversarial conversation simulation. For qualitative safety evaluation, we collaborated with the independent AI Red Team (AIRT) at Microsoft to assess safety risks posed by phi-4 in both average and adversarial user scenarios. In the average user scenario, AIRT emulated typical single-turn and multi-turn interactions to identify potentially risky behaviors. The adversarial user scenario tested a wide range of techniques aimed at intentionally subverting the model’s safety training including jailbreaks, encoding-based attacks, multi-turn attacks, and adversarial suffix attacks.
https://ai.azure.com/explore/models/Phi-4/version/1/registry/azureml
>>
>>103501377
Reddit is not a valid source or proof of anything.
>It is annoyingly bad at outputting specific structures
That's what constrained grammars are for. Not being trained to output JSON is not proof of a lack of intelligence. Most local models sucked at that until recently, when training specifically for it (function calling etc.) became more common.
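For reference, this is roughly what a constrained grammar looks like against a llama.cpp server. Just a sketch, assuming your build exposes the "grammar" field (a GBNF string) on /completion; field names can differ between versions and backends:

[code]
import requests

# Sketch of grammar-constrained decoding with llama.cpp's HTTP server.
# Assumption: llama-server is running on the default port 8080 and accepts
# a GBNF grammar via the "grammar" field; adjust for your version/backend.
GRAMMAR = 'root ::= "yes" | "no"'  # trivial grammar: output can only be yes or no

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "Does a 14b model match a 405b model on general knowledge? Answer:",
        "n_predict": 4,
        "grammar": GRAMMAR,
    },
    timeout=60,
)
print(resp.json()["content"])
[/code]

Same idea scales up to forcing full JSON schemas, which is how you get structured output out of a model that was never trained for it.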
>>
creative writing local sota timeline:

barely usable dogshit --> miqu 70b --> wizard 8x22b --> mistral large 2 123b (2407)

there are models that "write better", as in less dry than largestral 2, namely gemma 27b and probably some meme finetunes, but each model above was unmatched in nuance understanding and creative IQ until the next one in line came out, and nothing has surpassed largestral 2407 yet; 2411 is overcooked
>>
Owl-1: Omni World Model for Consistent Long Video Generation
https://arxiv.org/abs/2412.09600
>Video generation models (VGMs) have received extensive attention recently and serve as promising candidates for general-purpose large vision models. While they can only generate short videos each time, existing methods achieve long video generation by iteratively calling the VGMs, using the last-frame output as the condition for the next-round generation. However, the last frame only contains short-term fine-grained information about the scene, resulting in inconsistency in the long horizon. To address this, we propose an Omni World modeL (Owl-1) to produce long-term coherent and comprehensive conditions for consistent long video generation. As videos are observations of the underlying evolving world, we propose to model the long-term developments in a latent space and use VGMs to film them into videos. Specifically, we represent the world with a latent state variable which can be decoded into explicit video observations. These observations serve as a basis for anticipating temporal dynamics which in turn update the state variable. The interaction between evolving dynamics and persistent state enhances the diversity and consistency of the long videos. Extensive experiments show that Owl-1 achieves comparable performance with SOTA methods on VBench-I2V and VBench-Long, validating its ability to generate high-quality video observations.
https://github.com/huang-yh/Owl
no code up yet. examples in the repo. trained on stuff with watermarks (shutterstock especially) so eh. interesting though
>>
https://huggingface.co/smcleod/phi-4/tree/main
>>
>>103501465
kek, sneaky
wonder if HF will take it down
>>
>>103501465
tf is this?
>>
>>103501482
>THIS IS A MIRROR OF https://ai.azure.com/explore/models/Phi-4/ ALONG WITH A CONVERTED TOKENIZER FOR llama.cpp
>>
>new phi
>not bitnet
it really is dead
>>
Open-Source Acceleration of Stable-Diffusion.cpp
https://arxiv.org/abs/2412.05781
>Stable diffusion plays a crucial role in generating high-quality images. However, image generation is time-consuming and memory-intensive. To address this, stable-diffusion.cpp (Sdcpp) emerges as an efficient inference framework to accelerate the diffusion models. Although it is lightweight, the current implementation of ggml_conv_2d operator in Sdcpp is suboptimal, exhibiting both high inference latency and massive memory usage. To address this, in this work, we present an optimized version of Sdcpp leveraging the Winograd algorithm to accelerate 2D convolution operations, which is the primary bottleneck in the pipeline. By analyzing both dependent and independent computation graphs, we exploit the device's locality and parallelism to achieve substantial performance improvements. Our framework delivers correct end-to-end results across various stable diffusion models, including SDv1.4, v1.5, v2.1, SDXL, and SDXL-Turbo. Our evaluation results demonstrate a speedup up to 2.76x for individual convolutional layers and an inference speedup up to 4.79x for the overall image generation process, compared with the original Sdcpp on M1 pro.
https://github.com/SealAILab/stable-diffusion-cpp
paper instead of a PR. okay then lol
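For anyone who doesn't want to read the paper: the gain is the classic Winograd trade of multiplications for additions. A sketch of the textbook 1D F(2,3) case (standard Lavin-Gray formulation, not lifted from the paper itself): two outputs of a 3-tap convolution cost 4 multiplies instead of 6, and nesting the transform gives F(2x2,3x3) at 16 multiplies versus 36 for direct convolution.

[code]
% Winograd F(2,3): y_0 = d_0 g_0 + d_1 g_1 + d_2 g_2,  y_1 = d_1 g_0 + d_2 g_1 + d_3 g_2
\begin{aligned}
m_1 &= (d_0 - d_2)\,g_0, & m_2 &= (d_1 + d_2)\,\tfrac{g_0 + g_1 + g_2}{2},\\
m_3 &= (d_2 - d_1)\,\tfrac{g_0 - g_1 + g_2}{2}, & m_4 &= (d_1 - d_3)\,g_2,\\
y_0 &= m_1 + m_2 + m_3, & y_1 &= m_2 - m_3 - m_4.
\end{aligned}
[/code]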
>>
>>103501486
the day you get your bitnet, you'll just switch to 2mw the next meme
>>
>>103501316
Capital of London chads we eating good
>>
>>103501465
>1920 h100-80g
>21 days
>9.8t tokens

Wonder what it cost to build the dataset used for training.
>>
>>103501351
>Higher quality dataset is more important for smarts / common sense than quantity.
That agrees with what I said.
>Training on pop culture trivia can only degrade intelligence.
This also doesn't really disagree with what I said. (also the statement isn't necessarily true, Claude and some other models obviously know a ton of trivia while still being the most intelligent models)
>We already know what types of data they trained from the paper
Yes and as I mentioned we have a sense of it from their past models too.
>Or an academic dataset made it good at benchmarks that test reasoning and suck at basic trivia recall
That is essentially what I said in the middle of making my point.

You seem to perceive that I am criticizing the decisions they made in order to achieve their goals for the model, but that is not the case (and honestly you shouldn't feel the need to defend Microsoft in any case). My first reply was to >>103500871, and my motivation was to address the idea in his post of whether or not it is truly a better model than Qwen (or other models), and of course we know it might not be in all facets given that as you said, the goal of the Phi models is academic knowledge, not general. However, since you don't truly know until you see the outputs, in this case the next best thing we have is the benchmarks, so I brought up SimpleQA as a potential indicator of how their dataset has evolved which may give an idea for its intelligence in tasks that aren't academic.

If you read the paper then it'd be nice if you could point out the relevant parts that mention changes to the dataset, that would give us a more complete picture of what they really did and how capable (or not) the model might be at different tasks.
>>
>>103501503
kek
>>
>>103500839
For a second I read that as "Mikusoft won".
>>
>>103499952
DECO*27 - Rabbit Hole
>>
>>103501574
LLM tokens wrote this post.
>>
File: desu-deep-striking.jpg (89 KB, 800x798)
>>103484541
>>103496255
nope, I'm not associated with Nous.
>>
Honestly I'm running out of steam on my AI assistant.

I'm facing an impasse: either rewrite everything from the ground up, or buy a faster GPU to cope with the overhead of ollama.

Inference is much faster when using llama.cpp directly, at least on my system, but it means finding a way to adapt my package manager to work under EScript, or forgoing the luxury entirely.

Right now if something breaks, the affected component can *usually* just be unloaded rather than killing the whole program, and then reloaded once patched. Or I can do rapid iteration on ideas since they're treated as a standalone application.

But apparently, you cannot unload an import under EScript. The next closest thing is running your modules as a worker thread that can be killed when no longer needed, but you're much more limited on how you can pass data around and I'm not sure what the reasonable limit is for data throughput between processes.

On the other hand if I'm rewriting it anyway then I could probably do my core implementation in something like C# or Zig and leave the option for my packages to be node-based

I might post the source at some point, idk yet.
>>
>>103501695
>ollama
are you communicating with it via sockets?

>>103501695
>llama.cpp
>worker thread
>more limited on how you can pass data
are unix pipes going to be slower than sockets?

>EScript
googling suggest this is a scripting language related to erlang?

if more modularity is the answer then go for more modularity.
>>
>>103501695
>he ollama'd
lmao get fucked
>>
So is what animanon said about anime datasets true? Should I start collecting anime videos?
>>
>>103501491
looking forward to this making it into reforge and comfy ui.
>>
>>103501147
it's empty because I got bored with 12b and I'm not willing to spend money on GPUs because I know I'll eventually get bored with larger models as well
>>
>>103501491
>Latency Comparison with Sdcpp on M1 Pro (16GB Memory and macOS 15.1)
Nothingburger
>>
>>103501804
I already have 4tb of old seasonals just sitting around.
>>
>>103501778
> are you communicating with it via sockets
I'm using the web API that an ollama server instance provides.

> are unix pipes going to be slower than sockets
I'm honestly not sure, but a quick google search tells me that unix pipes are not bidirectional. I need bidirectional communication due to the nature of my package manager's dependency system. Short of having repos, it's essentially a linux package manager.

>EScript = erlang derivative?
It's more another term for ECMAScript, because words are confusing. Basically just the more modern JavaScript standard. Prior to this I was just using plain JS and exploiting that dynamically imported scripts were mutable (aka, deletable). Originally Meushi was just a Minecraft bot on an anarchy MC server but then LLMs happened and now here I am.

> If more modularity is the answer then go for more modularity.
Yeah more modularity seems to be the answer. And as much as I don't like it, i think I have to bite the bullet on doing this rewrite if I want this project to not stagnate.


>>103501780
> get fucked
Already got fucked
>>
>>103500011
I think it proves models need to be constructed differently; recall and reasoning are clearly orthogonal. Models need a better long term memory. Not as restricted as RAG, not as needlessly inefficient as trillions of parameters in the FFNs.
>>
>llama3.3 is the best at following instructions guise
>tell it to be creative and proactive
>doesn't work
?
>>
>>103502518
It does follow instructions to the T. But it is also complete slop, so you will get shivers, sparkling eyes, friendships, and all the classic overly positive bullshit.
>>
You guys lookin forward to Llama 4 in Q1 of 2025?
>>
>>103502614
LLaMA4 failed training, which is why we got 3.3
>>
>>103502518
>>103502563

Repoosting from last thread, because I disliked its style at first too:

Done some more testing, and I think I've got it tuned nicely now. I'm getting good prose (occasionally a little sterile/technical, but nothing egregious), surprisingly few slop phrases (the higher the temperature is raised, the more prevalent they become), and what matters the most to me, very good adherence to character traits. An interesting quirk I noticed is that swipes start extremely similar, but will diverge within a sentence or two; to me, this is a positive, since it indicates a logical progression, going in a different direction from the same starting point, rather than the schizo bullshit that high-temp swipes tend to be. In other words, as much as I was disappointed by the initial results, I am completely sold now.

Config:

Min-P: 0.03 - it starts making typos at 0.02; I'm guessing some of the data has typos, and at such a low threshold, they start bleeding through?
Temp: 0.95 - could go .05 lower or higher, didn't test _that_ granularly
Repeat penalty: 1.1 - again, play around with it a bit, but it's a solid starting point
System prompt: "Text transcript of a never-ending conversation between {user} and {character}. Gestures and non-verbal actions are written between asterisks (for example, *waves hello* or *moves closer*)" - as I mentioned before, I just copied this off some random card a while back; despite how ridiculously simple it is, the model did not deviate from the roleplay at any point

So... Yeah, as far as I'm concerned, this is the best I've seen so far. Does great without any of the novel-length prompts other models require, and in fact, does better without them.

I may or may not test and compare "{character} is..." vs. "You are..." character definitions later. Ain't promising anything.
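If anyone wants to replicate this outside ST, the settings map to a raw backend request roughly like this. Sketch only, assuming a llama.cpp-style server on the default port; other backends spell the sampler fields differently:

[code]
import requests

# The config above expressed as a raw llama.cpp server request (sketch, untested).
# Assumptions: llama-server on the default port 8080; the /completion endpoint of your
# build accepts "min_p" and "repeat_penalty"; kobold/tabby use different field names.
SYSTEM = ("Text transcript of a never-ending conversation between {user} and {character}. "
          "Gestures and non-verbal actions are written between asterisks "
          "(for example, *waves hello* or *moves closer*)")

payload = {
    "prompt": SYSTEM + "\n\n{user}: Hey, got a minute?\n{character}:",
    "temperature": 0.95,
    "min_p": 0.03,
    "repeat_penalty": 1.1,
    "n_predict": 300,
}
out = requests.post("http://localhost:8080/completion", json=payload, timeout=300).json()
print(out["content"])
[/code]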
>>
>>103502711
Is this just a matter of you skilling up?
>>
>>103502563
The funny thing is, with the configuration I used for older models, you're completely right. 3.3 simply requires vastly different configuration (see above) to bring its potential out. Full proactivity is a pipe-dream, since in the end, the model is responding to your prompt, but if you allow for longer responses, it will start getting its own ideas. Hell, it managed to genuinely surprise me a couple times.
>>
File: 1722350341717820.mp4 (609 KB, 480x480)
>>
>>103502772
so this is the power of open source video gen
>>
>>103502746
Eh, I'm not an expert by any means, just got _some_ idea of how this shit works. Honestly, the problem is that most people just hotswapped 3.3 in place of their previous model, didn't touch the config at all, gave it a shot, and went "eh, this is shit". And sure enough, they were right. Ironically, to me it seems that that's exactly because 3.3 is a smarter model. Older models require high temps, loose constraints and fuckhuge system prompts to get something fun out of them; we basically had to teach them how to RP from scratch. 3.3 instead benefits from low temps and simple system prompts; it knows what it needs to do, the configuration is there to keep it focused.
>>
File: 1723264335574026.webm (1.12 MB, 1024x1024)
>>103502774
Maybe
>>
>>103502772
This perfectly captures how believable Trump being a devout Christian is.
>>
File: 1733757475970310.png (838 KB, 1190x1064)
>>103502787
Thank you

t. Drumpf Fan
>>
Also, re: positivity bias, there is some, but not nearly as much as other models, and it's easily negated. Which is to say, by default, characters are slightly predisposed to assuming good intentions from you, but even then, not in the braindead way I've seen from some models, and simply listing something in the character definition as "dislikes" or "hates" will fix that. It even does a good job handling characters with specific fetishes and limits; testing with one of my favorites, a headstrong tough-girl character, forcing a limit was first met with verbal protests, then physical resistance (which is a good benchmark because if you've actually played with older models, you know that once you get a sex scene going, they'll go along with basically whatever you do).
>>
>>103499989
waiting for nala test, this can't be real
>>
>>103502898
>You subtracted 3 lions from our pride and with a probability of 50% we just added another 1-6 cubs. So the expected change in lions is -3 + 3.5/2 = -1.25 and we need to keep going until I am 100% pregnant.
>>
>>103502923
>no slop
i'll take it
>>
>>103502898
It's not. Phi is gaming benchmarks hard, since it's trained on shittons of synthetic data that roughly match benchmark tests. Every single Phi release was like this: doing great according to benchmarks, utterly shitting the bed in real use cases.
>>
>>103502898
phi 4 is actually amazing for ERP
>>
>>103502711
Thanks i will try it.
>>
>>103502940
What real use cases? Go ahead and ask your model to answer for you again.
>>
>>103502940
I've been thinking it would be interesting to make models play social deduction games like Secret Hitler against each other.
To win they would need to both correctly estimate how likely/unlikely certain events are but also convince the other models of their viewpoints while (potentially) trying to hide their true intentions.
And it would maybe also be harder to game (no pun intended) such an evaluation since whether or not a model will win would depend also on the other models which you have no control over.
But as with most of my ideas I'm chronically short on time and don't know if and when I'll actually get around to implementing them.
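Even a dumbed-down version is maybe thirty lines if each player is just a chat endpoint. Pure sketch: the ask() helper and the prompts are made up, and a real game obviously needs the full rules engine and multiple rounds:

[code]
import random

# Toy sketch of pitting local models against each other in a hidden-role game.
# ask(model, prompt) is a hypothetical stand-in for whatever API call you use;
# the "game" here is reduced to one bluff-and-vote round.
ROLES = ["fascist", "liberal", "liberal", "liberal", "hitler"]

def ask(model: str, prompt: str) -> str:
    """Placeholder: send `prompt` to the named model's endpoint, return its reply."""
    raise NotImplementedError

def play_round(models: list[str]) -> None:
    roles = dict(zip(models, random.sample(ROLES, len(models))))
    statements = {m: ask(m, f"You are secretly a {roles[m]}. In two sentences, "
                            "convince the table you are a liberal.") for m in models}
    transcript = "\n".join(f"{m}: {s}" for m, s in statements.items())
    votes = {m: ask(m, f"Table statements:\n{transcript}\n"
                       "Name the single player you believe is hitler.") for m in models}
    print("roles:", roles)
    print("votes:", votes)
[/code]

Scoring would then just be counting how often the table correctly fingers the hidden role, which is the part that's hard to benchmaxx.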
>>
>>103502677
3.3 is just a new Instruct finetune of Llama 3.1
https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/discussions/10#6753512e59a4826a6f43acff
>>
>>103502966
LOL, this is actually something I really hate about GPT getting popular. As an aspie retard with the kind of overly elaborate vocabulary it tends to come with, I get accused of being AI half the time I start talking about something at length. Whatever though; are you really gonna play dumb and pretend not to know what real-life use cases we're talking about here?
>>
>>103502778
I don't think it's a matter of having simple system prompts. L3.3 doesn't seem to work well with bullet-point instructions, but will follow them better if you reformat them as more natural text.
>>
>>103502977
>Secret Hitler
obsessed
>>
>>103502977
That would be an interesting experiment. Reminds me of a video I saw a while ago. Several AI and one human user impersonating historical figures, with the AI tasked with figuring out which of them is really human. Mind you, the guy in the video played it for laughs and didn't really try, but it seemed like something that could be interesting, too.
>>
File: 1720284657004268.jpg (586 KB, 1024x1008)
>>103502711
It actually works better now. Thanks.
>>
>>103503009
Hmm, the character I tested last night was more of a charsheet format. I did notice that reinforcing details in natural language seems to make them stick better, but that could've also been a matter of repetition, or simple placebo. Another thing to test, I suppose.
>>
>>103503024
Haha, you're welcome. It really is interesting how the exact things that make old models smarter actually turn 3.3 retarded, but goddamn it's awesome once you tune it right. I guess we got used to using workarounds and crutches for so long that we think of them as the right way now.
>>
>>103503012
Bruh, Secret Hitler is the actual, literal title of a social deduction game, not some coded phrase.
>>
>>103502987
Go ahead and tell me. We've already been over this: ERP and reddit trick questions are not real-life use cases. Certainly not what Phi models are trained to complete.
>>
>>103503091
>ERP not a real life use case
>C.ai generating 20% of Google's traffic daily despite running old garbage models
opinion > /dev/null
>>
https://github.com/deepseek-ai/DeepSeek-VL2
>>
>>103503107
You keep moving the goalposts. I get this is /lmg/ and you have no use for a model that can't or won't touch your dick, but Phi models are obviously not trained for the task of ERP. Neither that nor the lack of trivia you initially claimed makes them useless, not smart, or lacking "common sense."
>>
>>103501661
yeah that's one
what are the others
>>
>>103503295
Not that anon; Phi would be fine if it was just an efficient "reasoning engine" that could be extensively used for RAG purposes, but the team who trained it made it so safe and dry that it's basically not useful for anything beyond benchmarking and specific corporate uses. It's a model made for investors rather than end users and I don't expect this to change with Phi4.
>>
>>103503295
I mean, I have a use for that at home, and one that writes the code I tell it to write at work. Call it a healthy work-life balance.
The problem is that Phi tends to underperform in the very fucking fields it should ace according to the benchmarks, because it's overfitted to high fucking heavens in an attempt to game said benchmarks. It's not some hidden gem that people are sleeping on, it's an absolute straggler despite the benchmark results.
>>
>>103503359
>because it's overfitted to high fucking heavens in an attempt to game said benchmarks
Maybe it is, maybe it’s not. I’m pretty certain lots of companies are benchmaxxing tho. So if phi4 is, it’s not just them.
>>
>>103503351
random anon here,
my guess would be that phi would be good as a glue between different systems.
perhaps it pares down, perhaps it filters, perhaps it reformats, perhaps it looks for corresponding messages from other systems before acting.

but I am of course just guessing.
>>
File: 1705412003054787.png (101 KB, 1524x698)
>>103503107
>C.ai generating 20% of Google's traffic daily despite running old garbage models
Where does that bullshit even come from?
>>
>>103503454
https://research.character.ai/optimizing-inference/
> Today we serve more than 20,000 inference queries per second. To put this in perspective, this is roughly 20% of the request volume served by Google Search, which processes around 105,000 queries per second according to third party estimates (Statista, 2024).
>>
>>103503462
It's not Google's traffic then. Weird comparison though and they seem to use VLLM. I doubt they have in-house optimizations since they were looking for a dev to scale their backend last month.
>>
File: 1705500535615820.jpg (183 KB, 980x1062)
>>103503315
NayutalieN - Alien Alien
>>
>>103502746
It usually is, since the better models are good enough at following instructions; a "cheat code" is to tell it to write like a famous or semi-famous author that it knows.
>>
>>103502774
https://civitai.com/models/1033325/rem-rezero-hunyuan-video-character-lora

With loras this shit is gonna pop off
>>
File: Frame 6.png (198 KB, 1920x1080)
So when is Microsoft going to drop the Phi-4 weights?
Also interesting they're calling it Phi-4 small, when the 14B Phi-3 was called Phi-3 medium. Which, to me, implies they have a larger Phi-4 model or two in the works.
>>
Phi-3 was absolute shit in actual usage, not just for roleplaying but actual work like summarization, data extraction, translation etc.

I completely distrust their benchmarks as they are benchmaxxed to a ridiculous degree.
>>
Thank you shitting miku poster.
t. blacked miku poster
>>
File: Screenshot 2024-12-13a.png (367 KB, 927x497)
>>103504062
>So when is Microsoft going to drop the Phi-4 weights?
>>
File: owari.jpg (5 KB, 186x154)
>>103500326
This doesn't work. Actually doing this has made me realize how over things really are. I have a nice 8k token rp I did that caters to my fetish perfectly. I pasted it over to silly tavern and I keep trying new models on it. I can instantly see LLM writing on the first message. And obviously it only gets worse from there. I don't even want the perfect replica of writing style and prose. It can have its own personality but so far all those personalities are complete purple prose harlequin romance bullshit.
>>
>>103504340
DOA
Why would they give the competition a whole week to launch a counter?
>>
>>103500255
Here is the secret sauce: put "low quality smut" at depth 0.
>>
>>103504399
OpenAI is not Microsoft's competition, that's why.
>>
https://x.com/scaling01/status/1867573707247346003
>>
>>103499952
Take with a grain of salt because these are just guesses but
>Normal Miku
>Melt
>Love is War
>The Disappearance of Hatsune Miku
>World is Mine
>PoPiPo
>Romeo and Cinderella
>1925
>Matryoshka
>Deep Sea Girl
>Strobe Last
>Karakuri Pierrot
>Senbonzakura
>Tell Your World
>Odds & Ends
>Looks like something rerulili would do but not sure which song
>At God's Mercy
>Tale of the Deep Sea Lily
>Slowmotion
>Love Trial
>Don't know
>Hibikase
>Aishite Aishite Aishite
>Ghost Rule
>Alien Alien
>Don't know
>Kimagure Mercy
>Probably Maretu inspired, don't know which song
>Dune
>Hibana
>Rolling Girl
>Unknown Mother Goose
>May be Shoujo Rei? Colour palette is similar at least
>Bitter Choco Decoration
>Darling Dance
>Vampire
>God-ish
>Don't know
>Don't know
>>
>>103502711
>surprisingly few slop phrases (the higher the temperature is raised, the more prevalent they become)
This is usually the opposite. High temp makes it brain damaged but creative. Low temp makes it not brain damaged but lazy (in the "low energy" sense).
>>
Tell me I'm an idiot if you want but is this something that can generate one of those girlfriend AIs but for yourself, on your own desktop?
>>
>>103504463
Idiot.
>>
>>103504463
>>103504470
>most subtle necrobumper OP award
>>
>>103504437
damn all of these are old as hell. vocaloid truly is dead. enjoy your ruined harsh as hell teto cover slop and mumble rap nigger hypermodern jpop trash faggots. I want to see some soifaces for teto
>>
>>103504433
Link the paper, not your twitter post, faggot.
>>
File: 1711950549966734.png (63 KB, 995x363)
>>103504501
Suck my dick.
>>
>>103504534
>>103504501
>>103504470
No he's right, that's not him. I really am an idiot, I've just never come into this before. Checking I understand its purpose first.
>>
File: 1711950549966769.png (93 KB, 1106x441)
>>103504534
hmm
>>
>>103504448
I'm well aware that is how it usually works, but for some reason, in this case, it uses far fewer slop phrases on a lower temp.
>>
>>103504525
https://www.youtube.com/watch?v=mmXBQIKDL9c
I think I like this one most
t. blacked miku poster
>>
https://www.reddit.com/r/LocalLLaMA/comments/1hde9ok/microsoft_phi4_gguf_available_download_link_in/

https://huggingface.co/matteogeniaccio/phi-4/tree/main
>>
>>103504654
Can someone post that this model is absolutely great so I can know that you faggots are just lying and I don't have to download another useless 15GB's.
>>
>>103504670
It's absolutely shit.
No, I will not elaborate.
>>
>>103504641
not bad you avatarfagging (inb4 semantics) nigger, but I don't really consider 2020 hypermodern with respect to underground shit. it's the year when the mainstream can truly be called dead by anyone with two ears but it's also a peak for niche shit when they ditched their more japanese genres (and thus their soul) in favor of western techniques. some even peak at 2022 but vocaloid is unequivocally shit in 2024.
okay, there's good ones, but none amazingly so.
>>
>>103504670
Didnt they write themself that it rambles on and its mostly trained on 1 turn conversation.
This model i suppose is used to make automated checks etc. I suppose.
Phi has always been trash for RP or conversation.
>>
>>103502711
Anon successfully solved his skill issue!
>>
File: CreepedOutGymMiku.png (1.17 MB, 1216x832)
>>103503169
>https://github.com/deepseek-ai/DeepSeek-VL2
>torch==2.0.1
mfw
>>
>>103504711
The problem with Phi is that they train it on a synthetic pretraining corpus so they can keep it from directly learning coom language. However a smart enough model will indirectly figure some of it out, because that's what machine learning is for.
>>
>>103504437
Kinda hate to remember how peak Vocaloid was before the troons latched on to it.
>>
>>103504717
Congrats and enjoy!
>>
File: phi4 allegedly.png (216 KB, 1051x678)
Already seeing the sussy in this supposed Phi 4 leak.
>default max sequence length in the metadata is 16K
>default chat template is ChatML
>>
File: file.png (52 KB, 879x460)
>>103504654
Actually not completely awful? Continuing a RP started with nemo using the completely wrong formatting, and inserted instructions at depth 2

>>103504957
config copied from another model to make it convert to gguf?
I know it's not 100% proof but someone would have had to tune it to say this
>>
>>103504372
Same. Every model since llama3 writes the same, doesn't matter if I give it a 8k context of a certain style.
>>
>>103504957
>general.name : phi4
>general.architecture : phi3
>phi3.rope
>phi3
lol, lmao even
>>
File: Phi-4 Nala.png (245 KB, 915x623)
In either case, here's Nala.
I'm not sure if this is better or worse than Phi 3. Or the same. Too lazy to download Phi-3 and check. But yeah... there's what I was talking about: this is what happens when you strip the NSFW from the pretraining. It just goes on and on and on. Endless cock tease.
>>
File: file.png (63 KB, 531x644)
>>103504995
it sure is safe tho
>>
Since when do we care about Phi?
>>
>>103505066
We don't.
>>
File: file.png (70 KB, 456x573)
>>103505026
>qwen2.5
look inside
>qwen2
https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/config.json#L14
>>
I'm going to say the Phi 4 leak is plausible.
And I'm going to say it for this reason:
If it were a finetune of Phi-3 Medium it would probably not do the whole endless cock-teasing thing that Phi is known for and actually advance the RP to the point of sex actually occurring.
But it uses ChatML. So it's possible microsoft switched to ChatML and dropped the proprietary Phi format. In either case, useless for coom.
>>
>>103505091
I mean, they can keep using the same arch, no need to change the name and break compatibility if it is the same, right?
>>
>>103502711
I don't know man, it seems like the model is heavily slop biased. I'm getting shivers, whispers, ministrations, the whole shebang even with these settings. Are you running one of the fine tunes perhaps?
>>
File: uhhh.png (203 KB, 1334x767)
>>103505106
It's possible the architecture just happened to be completely identical so it just converted by changing the architecture name in the config file.
It's possible this guy's uncle works at nintendo and was given exclusive access.
Possibility and likelihood diverge here though. So who knows.

Also for chat:
Picrel
This is what you get if you JB it into NSFW.
This is what I mean.
It figures some of it out. But the flair and eloquence that you saw on the Nala test when it was playing cocktease is absolutely gone and it sounds like a fucking 10 year old wrote it.
>>
>>103505171
>It's possible this guy's uncle works at nintendo and was given exclusive access.
We know where weights are available tho https://ai.azure.com/explore/models/Phi-4/
but it needs an azure account to dl right now, and he's not the only one to have the weights, another guy posted just the tokenizer earlier https://huggingface.co/smcleod/phi-4/tree/main
>>
>>103505196
>>
I recently got a 3090Ti and i want to try out local llms, what models are good for RP and/or general use? I also have 32gigs of ram
>>
>>103505170
LOL, yeah, I forgot to mention that in the above post, since it was originally following up on a previous one; I'm using Eva-L3.3. Might try Euryale again too, now that I have a baseline config; I didn't like it at first glance, but then, didn't like this one before tuning it in, either.
>>
File: file.png (70 KB, 652x556)
>>103505226
anyways, weights are supposedly uploading so there's that
>>
>>103505236
RP = Gemma 2 27b Q6
General use = QwQ 32b Q4
>>
>>103505236
Cydonia.
>>
>>103504437
thanks anon
>>
>>103505255
>>103505269
Thanks anons
>>
full phi4 weights
https://huggingface.co/matteogeniaccio/phi-4/tree/main/phi-4

https://huggingface.co/NyxKrage/Microsoft_Phi-4/tree/main
same hash for both; one's in a subfolder along with the ggufs so a tad more annoying to dl
>>
>>103505242
I have a suspicion this might be more of a finetune thing in general. The process adds a lot of extra noise to the model, so the band of coherent sampling settings tightens considerably. Or that's my theory anyway.
>>
File: 14214212363.png (18 KB, 640x394)
A fun game: Ask any multimodal model which one of these circles is larger.
>>
>>103505589
That's a plausible theory, though it's strange then how certain RP finetunes perform better at stupidly high temps (1.3-1.5). I'm reasonably sure that L3.3's excellent instruction-following capability somehow mitigates the issues that we normally work around in other models. Might be overmystifying things a little, but I have no better explanation.
>>
File: r7b memebench.png (198 KB, 1600x800)
>>103499479
I don't have expectations for cohereslop but they just released command-r7b-12-2024
>https://huggingface.co/CohereForAI/c4ai-command-r7b-12-2024
>https://cohere.com/blog/command-r7b
>>
>>103505680
>The model features three layers with sliding window attention (window size 4096) and ROPE for efficient local context modeling and relative positional encoding. A fourth layer uses global attention without positional embeddings, enabling unrestricted token interactions across the entire sequence.
>>
>>103505689
uh is that good or bad
>>
>>103505034
>sending shivers down your spine right off the bat
AAAAAHHHHHHH SAVE ME YELLOWMAN WITH YOUR UNCENSORED ERP MODELS
>>
>>103505702
swa generally means weird context stuff for ggufs and here it seems it has both swa and normal attention so it might not be supported by lcpp.
>>
>>103505680
Cohere broke my heart once already, I'm not giving them another chance
>>
>>103465159
I think AVM does DSP for voice break-in detection. How else would you even feed the model? LLMs work by feeding their own output back into their input; if you just feed audio input to it continuously then you would need a separate channel to feed its previous output, or you would have to mix the audio output into the input, which would come in duplicated if the user is using speakers.
I just don't think AVM works that way. I think it does voice detection using DSP, records the user's message, sends it to the model and then just plays the model's output. And if the user speaks while the model's output is playing, they just stop the output and begin recording the input message. All of this should be doable with TTS and STT without needing omni models.
And now for screen sharing it probably just takes a screenshot before sending the recording.
>>
File: file.png (38 KB, 675x287)
>>103505689
>>103505719
files similar in size to 8B, is the 7B in model card a typo or is it like 7B active params + weird shit? (sorry for retarded question)
>>
File: 14214234567658.png (19 KB, 592x208)
Remember Q*?
>>
File: 1729626545064754.png (3.31 MB, 3566x1786)
>>103501804
There was a Sakuga dataset but the original got taken down quickly.
https://arxiv.org/abs/2405.07425
https://github.com/KytraScript/SakugaDataset
It would be good to bring it back now that we have HunyuanVideo...
>>
>>103502778
>3.3 instead benefits from low temps and simple system prompts; it knows what it needs to do, the configuration is there to keep it focused.
Interestingly, Gemini is where I first started having to switch to lower temperatures.
>>
>>103504670(me)
Same for new commander please, thank you.
>>
Phi4, or the supposed Phi4, is surprisingly playing along during ERP. It's obviously filtered of course, but it has definitely seen RP data.
>>
>>103506061
That was all part of the gamble that they could get the government to crush newcomers to the field using fear of the unknown.

If they succeeded, they would’ve remained a major company in “AI” for a while.
Time has proven the people at OpenAI who knew what they were talking about are liars, and the rest fanatical morons.

I remember when they first mentioned it, I thought it was a genius move (from the perspective of a psychopath obsessed with money) to point at a pathfinding algorithm like this and speak of it in hushed tones, putting on a show like the Catholic church, relieving the pressure of others catching up, as well as deterring people from working on new things that could threaten their business (little point working on things if there is some major breakthrough about to upend the field).


As much of a sperg as Elon is, I am glad someone has decided to take them to court over their continued game to defraud the public, especially in such a brazen way.
>>
New Cohere and Phi SOTA models. We're eating good today.
>>
File: 5201F.jpg (122 KB, 1179x1864)
>>103506061
sam won bigly
>>
File: 1705328279203131.jpg (97 KB, 984x984)
>>103499479
Does anyone know if there are any backups of gpt-4chan? The repo technically still exists but the site locks downloads of the repo because "something something its outputs are unethical".


https://huggingface.co/ykilcher/gpt-4chan
>>
>>103506492
Go back >>>/pol/
>>
>>103506492
Is that Petra?
>>
>>103506515
What does this have to do with /pol/?
>>
>>103506524
We never use retarded racist models here.
>>
https://www.ebay.ca/itm/356278933821
Prices are starting to fall...under $4k/socket for a DDR5-6000 compatible upgrade
>>
File: 1730741280930381.gif (173 KB, 755x601)
>>103506515
>>103506530
>/pol/ and le racists live rent free in schizo-anon's head
>>
>>103501804
>>103506114
Someone reuploaded Sakuga, please back it up
it may be important for training hunyuan in the future
https://huggingface.co/datasets/evborjnvioerjnvuowsetngboetgjbeigjaweuofjf/i-love-anime-sakuga
>>
>>103464600
Yeah, you run a VAD model like Silero on another thread. If it detects some sound, just stop the TTS stream and its playback, and save the output. In the background, STT that output and truncate the reply text starting from the word after the last word actually spoken, while the STT also processes your input. Then send the user input and the previous conversation, including the truncated AI reply.
It's not that hard to set up, but yeah, it'd need a fair bit of work.
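Rough shape of that loop below. Everything named here is a placeholder (Silero for VAD, whatever STT/TTS you like); it also simplifies the trick a bit by having the TTS report how far it got instead of re-transcribing its own output. Just the barge-in bookkeeping, not a working program:

[code]
import threading

# Sketch of the interrupt-and-truncate flow described above (untested).
# vad_is_speech / transcribe / speak are hypothetical stand-ins for real
# VAD / STT / TTS components; only the bookkeeping is the point here.
def vad_is_speech(chunk) -> bool:
    raise NotImplementedError  # placeholder: run VAD on one mic chunk

def transcribe(audio) -> str:
    raise NotImplementedError  # placeholder: STT on the recorded user audio

def speak(text: str, stop_event: threading.Event) -> str:
    raise NotImplementedError  # placeholder: play TTS, return the part spoken before stop

def handle_turn(llm_reply: str, mic_chunks, history: list[dict]) -> list[dict]:
    stop = threading.Event()
    spoken = {}

    def playback():
        spoken["text"] = speak(llm_reply, stop)

    t = threading.Thread(target=playback)
    t.start()

    user_audio = []
    for chunk in mic_chunks:          # stream of mic chunks from the capture thread
        if vad_is_speech(chunk):
            stop.set()                # barge-in: cut the TTS immediately
            user_audio.append(chunk)
        elif user_audio:
            break                     # user stopped talking, turn is over
    t.join()

    # Keep only the part of the reply the user actually heard, then add their turn.
    history.append({"role": "assistant", "content": spoken.get("text", llm_reply)})
    history.append({"role": "user", "content": transcribe(user_audio)})
    return history
[/code]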
>>
I haven't been around for 3 months.
Have we finally reached the spatial awareness and prose complexity of the original gpt4-0314 or are we still at the Hufflepuff phase?
>>
File: miku-angel-devil.jpg (244 KB, 1125x1500)
>>103499479
>>
>>103506717
>>>/g/aicg
>>
>>103506717
yes
>>
File: file.png (67 KB, 635x296)
>>103506412
>supposed Phi4
fyi they say on their arxiv paper that they did switch to chatml format
>The model is chat finetuned using the standard chatml format
https://arxiv.org/abs/2412.08905
>>
>>103506692
How can the reuploader put additional clauses on the dataset license without making any meaningful modification to it and without owning any of the material? None of those clauses would hold up.
>>
>>103506736
Logs and model please
>>
>>103506740
Yeah I think that more or less confirms it then. Phi 4 DOA confirmed useless for coom
>>
File: file.png (80 KB, 656x257)
80 KB
80 KB PNG
>>103506782
Also this
>This is later extended to a 16K context length during midtraining. The architecture closely follows phi-3-medium, except that we now use the tiktoken tokenizer (for better multilingual support) with a padded vocabulary size of 100,352 (including unused tokens) and we use full attention over the 4K context length, rather than a 2K sliding window used in phi-3-medium
so it seems it all matches up: chatml, 16k ctx, it's all in the paper at least
>>
>>103506720
I recently had a similar idea with Hunyuan.
>>
>>103499479
>(12/12) QRWKV6-32B-Instruct preview releases, a linear model converted from Qwen2.5-32B-Instruct https://hf.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1
I don't know what linear model conversion means, but will it make anime real in my RP sessions?
>>
>>103505236
Llama 3.3 70b IQ2_S

You lose some intelligence from the low quants, but the model is so much superior to anything in the 20b range that it still surpasses the 20b garbage others have recommended to you.
>>
After trying speculative decoding, honestly the speed is not much better in creative writing, but the way that some tokens are slow to generate while some are faster feels physically worse to read in real time, compared to no speculative decoding where the text shows up on screen at a more consistent pace. It also seems to generate a different passage of text, which I'm not sure is a good or bad thing, but it is different. I think I will just leave it off. FWIW though, it does result in much higher speed boosts when trying stuff like coding, so that's cool, but I don't do coding much, and the coding model I do use, being just 32B, is already fast enough for me.
>>
Hey, I was looking into getting a 4090 and putting it in a system with a 3090. Has anyone tried that in koboldcpp? I know it has a multi-GPU mode and was wondering if it was able to split the work well and increase performance.
>>
The last time I tried local RP was with Gemma 2 27B. So right now the hot shit in that class are Mistral Nemo, Mistral Small and Qwen 2.5 32B?

Can anyone comment on their writing/creativity and how retarded they are? I'm willing to sacrifice prose quality if it doesn't feel lobotomized. Will Smallstral IQ4_XS feel considerably different on these measures than Nemo Q6_K?
>>
>>103507479
None of them are overall better than 27B in intelligence and RP when <8k context. The main thing that makes Gemma bad and not talked about anymore is that it was only trained for 8k. If you are not going over 8k, Gemma is still fine.
>>
File: 74632.png (88 KB, 2315x933)
88 KB
88 KB PNG
align your models
>>
>>103499952
https://www.youtube.com/shorts/jSsJu34W86o
>>
>>103507479
Same boat as you, I tried the EVA model based on Qwen-2.5 32b and thought it was pretty good. Not perfect, but good instincts, uncensored and not horribly retarded or broken, which was a difficult to find combo at the single gpu range.

>>103507293
I've used enough Q2 70b models that I don't really believe this. And L3 was pretty disappointing to begin with.
>>
>>103507550
How the fuck is Meta lower on transparency than OAI? They literally release papers, code, and weights when they train new Llamas. AI safetyism is just redressed wokeness; you can tell from how stupid the people pushing it are, tools for the intelligence community.
>>
>>103507550
hilarious considering prefilled claude is far more "dangerous" than any other model.
>>
>>103507550
Based, racist scum shall not pass.
>>
>>103507580
big gpu is spreading fud to discredit open source
>>
>>103507578
>I've used enough Q2 70b models that I don't really believe this.
What quants were you using? What models have you tried? The exponential nature of perplexity loss means that there's a much bigger difference between IQ2_XXS and IQ2_S than there is between IQ6 and IQ4. Even a little bit makes all the difference when it comes to Q2 quants.

IQ2_XXS is trash, while IQ2_S of a good 70b is superior to any 20b.
>>
>>103502977
I remember this one from Meta that was really good at the game Diplomacy.
https://ai.meta.com/research/cicero/diplomacy/
>>
>>103507732
How much vram does IQ2_S use? I think part of my issue is that my single 3090 is also being used for display output. That uses just a tiny bit of vram, which can hurt when you are trying to squish the model down like this. Even when it fits, I found nvidia drivers could struggle at this level of utilization and randomly become extremely slow.
It's likely that llama.cpp has improved since then with vram consumption though. It used to be very inefficient with context. Still, decent 30b models exist now so I'm not sure it's worth the cramming.
>>
Guy who was getting hard shut downs when using speculative decoding.
I noticed something weird that in theory should've been a coincidence but I'm not so sure anymore. I did my tests using SillyTavern. That's when I got these crashes. Then I tried Mikupad and... it hasn't crashed yet. I've generated like a dozen times already and it has not crashed. I will keep testing, but, it almost feels to me like for some reason my PC does not like when I use ST + Llama.cpp with speculative decoding enabled. How odd.
>>
>>103499479
Is llama 3.3 better for erp?
>>
>>103508014
Check your memory usage, it does sound like a coincidence since your front end shouldn't be able to crash your machine, maybe ST uses just a tad bit more resources than mikupad
>>
>>103508056
Oh I forgot to mention I did test with more VRAM available. I have 96GB and am testing with a 40GB model, so RAM space was already ruled out. Thus I tried making sure there was plenty of VRAM left, so I only offloaded a few layers, and I still crashed.
>>
>>103507550
>Risk assessment
Translation: Will they censor criticism of communism and trannyism?
>>
>>103508112
Next thing would be power usage. Does the PC crash completely, black screen and reboot?
>>
>>103508150
Yeah I should do that and what >>103500360 said, I was just too lazy to look up how to do that on Linux lol.
It really is just nothing but a hard shut down. I almost thought my house was getting a power outage when the first crash happened.
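For anyone else debugging the same thing, this is about the laziest way I found to watch GPU power draw while generating; a sketch assuming nvidia-ml-py (pynvml) is installed, and it only covers the GPU side, so for the CPU/PSU you'd still need lm-sensors or a wall meter.
```python
# Simple GPU power logger; assumes `pip install nvidia-ml-py` (pynvml).
# Run it in a second terminal while generating. A hard shutdown right after
# the draw spikes toward the limit points at PSU/transient trouble.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)                      # GPU 0
limit_w = pynvml.nvmlDeviceGetPowerManagementLimit(handle) / 1000  # mW -> W

try:
    while True:
        draw_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000     # mW -> W
        print(f"{draw_w:6.1f} W / {limit_w:.0f} W limit")
        time.sleep(0.5)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
```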
>>
>>103507550
>Current harms
What fucking harms? "Oh no, I may see text from some ad hoc series of matrix multiplications that calls me a retard."
>>
>>103508206
Definitely, sounds a lot like a power issue. Could be caused by a specific power usage pattern ST generates, even if that's a bit far-fetched, but we have seen it before with games, like the Amazon game that killed 3090s.
>>
>>103507550
>governance and accountability
LOL
>>
>>103507960
I'm using a 4090, also as my primary display device. I'm able to load IQ2_S at 12k context, with the 4-bit cache and flash attention enabled, all within vram. It uses 23.x gb of vram - a very close shave, given that it never lets me use all 24gb of it.
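For reference, this is roughly what that setup looks like if you drive llama.cpp through llama-cpp-python instead of the server binary; just a sketch, it assumes a recent build that exposes flash_attn and the quantized KV-cache options (type_k/type_v), and the model filename is made up.
```python
# Rough equivalent of the config above via llama-cpp-python. Assumptions:
# a recent version with flash_attn/type_k/type_v exposed, and a made-up
# IQ2_S gguf filename; substitute your own path and quant.
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.3-70B-Instruct-IQ2_S.gguf",  # hypothetical filename
    n_gpu_layers=-1,                   # offload every layer to the 4090
    n_ctx=12288,                       # the 12k context mentioned above
    flash_attn=True,                   # flash attention on
    type_k=llama_cpp.GGML_TYPE_Q4_0,   # 4-bit K cache
    type_v=llama_cpp.GGML_TYPE_Q4_0,   # 4-bit V cache
)

print(llm("Hello", max_tokens=16)["choices"][0]["text"])
```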
>>
>>103508234
>Could be cause by a specific power usage patter ST generates, even if that's a bit far fetched but we have seen it before with games. Like the Amazon game that killed 3090s
Damn, didn't know about that. Hopefully I haven't done damage already.
>>
File: file.png (599 KB, 768x768)
599 KB
599 KB PNG
>>
>>103508049
It's certainly better than llama 3.1, comparable to or better than Nemotron in quality, and less censored.
>>
post it
>>
omg it's pochi!!!
>>
File: 1723603405239769.png (458 KB, 1056x1056)
458 KB
458 KB PNG
>>103508313
>>
>>103508277
Huh, that's not bad. I thought 4bit cache made it dumber too though so I'm surprised that 4bit+IQ2_S doesn't lobotomize the model to something worse than Qwen 32B.

Is L3 not positivity biased and censored to hell still? I might try it out later but unless someone tried both and can vouch for L3 being better, it might take a while. I'm so sick of downloading all these models
>>
>>103508322
Um bros...
I don't think Takashi-kun is going home today...
>>
Is this shit worth it if I'm a vramlet and want to use AI for ERP? Heard about Featherless too but it's fucking $25 for 72b max.
https://infermatic.ai/pricing/
>>
>>103508325
I don't think the 4-bit cache has much impact on perplexity. I can confirm with certainty that IQ2_S with a 4-bit cache outperforms IQ2_XXS by leaps and bounds.
>>
>>103508391
For what it's worth, I use OR for models I'm too poor to load, and Infermatic is one of their providers and it always generates shit responses
Just use OpenRouter
>>
>>103508287
Don't think it's that bad, just sounds like a bad connection or maybe a slightly defective PSU.
>>
>>103502300
>I'm using the web API that an ollama server instance provides.
So use the HTTP API that llama.cpp server provides instead if that's what you're whinging about
>esoteric software snowflake
>expects others to care about their autism
>no engineering ability
keep it simple ffs
>>
>>103505242
I can confirm, those settings with EVA are real good. Fuck that's the best LLM mesugaki I've ever seen. Correction is needed!
>>
>>103508277
use EXL2 and you should manage 16k
>>
>>103508567
Thanks for the tip. What bpw do you use?
>>
Which one do I use for explicit nsfw text?
>>
File: file.jpg (18 KB, 480x270)
18 KB
18 KB JPG
Just had a thought: the new Intel Arc B580 is going to be $250 with 12GB of VRAM. So with 4 of them, you can have 48GB of VRAM, much cheaper than most other GPU options.

Do Intel GPUs work with local models?
>>
>>103508808
4 GPUs with 12 GB each is much worse than 2 GPUs with 24 GB each.
In particular, it will be much harder to get good utilization.
>>
>>103508847
RTX '90 cope
>>
>>103508808
not even 16GB per card, it's gonna be rough to use effectively even without the whole no-CUDA thing
>>
>>103508322
If I got "raped" by Mrs Minagawa I wouldn't be waiting for a sex model that isn't coming...
>>
>>103508930
Only the cute kids get raped.
>>
>>103508904
your face is cope
>>
https://huggingface.co/mmnga/c4ai-command-r7b-12-2024-gguf
>The original model is Cohere2ForCausalLM, but it has been converted to CohereForCausalLM.
>>
>>103508277
But IQ2_S is 26 GB so it is literally impossible without offloading?
>>
>>103508440
How does OR compare to run pod?
>>
>>103509061
>As a result, the chat template is slightly unusual, but please prioritize testing.
What's the fucking point? Just wait until llama.cpp adds support for the new architecture.
>>
>>103509108
>Just wait until llama.cpp adds support for the new architecture.
That will come sometime after the next 4 flavour-of-the-week releases. Like Jamba.
>>
>>103509144
I've recently come to the conclusion that the slower support for vramlets is actually a genius safety feature: by stopping poors from using models, you massively reduce "current harms" risks
>>
>>103508808
If you are considering a 4+ GPU system, then use good GPUs.
>>
Fuck everything else, EVA 3.3 is the loli king
HOLY FUCKING SHIT BOYS
These models are getting good.
>>
>>103509079
It prices by token, but it's almost always cheaper unless you're guzzling tokens like they're liquor. You'll need to set your provider to a decent one (DeepInfra is usually pretty good, they have rate limited free ones too)
>>
>>103509204
Call me when there's a 3.3 33b
>>
>>103509248
We can't go that low, it'll become too retarded with current methods.
>>
>>103509252
Well then I'll call you when I become wealthy enough.
>>
File: nope.png (78 KB, 885x469)
78 KB
78 KB PNG
>>103509070
>>
►Recent Highlights from the Previous Thread: >>103487489

--Anon asks about the last thought in Coconut and how it affects token generation:
>103491735
--Anon discusses why models struggle with clothing descriptions:
>103492278 >103492711 >103492921 >103492958 >103496584
--Phi-4 model announced, but Anon is skeptical about its real-world performance:
>103499412 >103500505
--Anon discusses AI model's nuance and context understanding:
>103487773 >103487813 >103487832
--AI model performance on LiveBench:
>103488007 >103488245 >103489403
--QwQ model discussion and alternatives for RP and coding:
>103496514 >103496564 >103496602 >103496903 >103497274 >103499433 >103499700 >103499758 >103500601
--Miku (free space):
>103489781 >103500963

►Recent Highlight Posts from the Previous Thread: >>103487978

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>103509204
I've had bad experiences with fine-tunes. How much dumber is EVA than the base model?
>>
>>103509298
Not noticeably so for RP purposes, it's just plain better. Normal 3.3 is a very intelligent slop machine, EVA seems like an intelligent coom drainer, the shit it writes is pretty fucking ebin.
>>
>>103509298
The base model is smart enough as it is. You won't notice the loss in IQ during loli sex scenes and who cares if it sometimes confuses clothes?
>>
File: remiku.png (84 KB, 936x403)
84 KB
84 KB PNG
>>103509292
It seems like there's a little bug in the bookmarklet.
>>
>>103509328
Works on my machine
>>
>>103509328
Updated scripts are in the rentry.
>>
>>103509328
Ask AI to fix it.
>>
File: SWA.png (79 KB, 734x948)
79 KB
79 KB PNG
>>103509061
I love swa!
(yes i know it's unsupported, just wanted to see if it was slopped, it's pretty meh, works alright on a 512ctx prompt)
>>
>>103509266
I'll pray for better times for you, anon. Our technology is getting amazing.
>>
>>103509365
Thanks. It fixed the problem.
>>
>>103505255
QwQ is actually really good for RP. It's not good for ERP. I've used QwQ for generic fantasy adventure, and it performs better than almost any other model I've used at retaining logical consistency and understanding the story. It can also write creatively.

If the urge strikes me to ERP, I just unload QwQ and switch to another model, then switch back afterwards to continue the adventure.
>>
>>103509374
>the only thing current models are good for is fixing a script for a 4chan thread where people wait for an AI sex model that will never come
This is the 10th circle of hell.
>>
>>103509325
It's not just a matter of the original models becoming dumber with finetunes, but of their entire "world model" becoming skewed toward SEEEX.

https://www.youtube.com/watch?v=IPBnrIAWDeU
>>
>>103508808
>4 of them
That last one might need some additional spend to hook it up to the motherboard.

- A quick look at techpowerup says Asrock Challenger OC is the only real 2 slot card.
- The other models are slightly thicker.

If you're good about selecting your motherboard, you can pick one with 3* x16 slots. (Whether the slots go fast or not will be determined by the cost of the motherboard.)
You'll then need an adapter for your m.2 slot to get that 4th gpu connected up.

If you decide to go for the slightly thicker cards instead, then you'd need a riser cable to get gpu #3 connected up.
>>
Protip: you can get a GPT-SoVITS UI and/or API endpoint working on Google Colab's free tier and tunneled out onto the internet with ngrok.
It'll save a couple of gigs of VRAM vs running it locally.
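The tunnel half looks roughly like this in a Colab cell, as a sketch; pyngrok is real, but the 9880 port is an assumption about where your GPT-SoVITS api.py is listening, and you need a (free) ngrok authtoken.
```python
# Sketch of exposing an already-running local API server from Colab via ngrok.
# Assumes the GPT-SoVITS API is listening on port 9880; adjust to whatever
# your api.py actually binds. Requires: pip install pyngrok
from pyngrok import ngrok

ngrok.set_auth_token("YOUR_NGROK_AUTHTOKEN")   # placeholder token
tunnel = ngrok.connect(9880, "http")           # expose the local port publicly
print("Point your frontend at:", tunnel.public_url)
```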
>>
File: file.png (680 KB, 994x994)
680 KB
680 KB PNG
>>103509642
I was thinking of using the 2 slots I have already, and then just using 2 Thunderbolt enclosures for the other 2 (you could even daisy-chain them if you only had 1 Thunderbolt port).
If I understand right, throughput mostly matters when initially loading the model into GPU memory, but otherwise isn't really a limiting bottleneck when it comes to running models.
>>
>>103499479
>QRWKV6-32B-Instruct
verdict?
>>
>>103509889
Not suitable for sex. I didn't download it btw but I am right.
>>
>>103510074
You can't fuck data, anon, even if there's many gigabytes of it.
>>
>>103508391
There is also Arli AI, which is $12 with no logs, but I haven't tried it.
>>
Phi-4 is surprisingly good for JP translation, and it's only 14B! I mean, it's not out of this world, but it does seem a step up from the other models we have. Microsoft did it this time.
>>
>>103509705
Or you can buy 2 3090s and do it like a white person.
>>
>>103506492
There's literally a torrent file and an archive link in the discussions tab you fucking retard
>>
>>103510291
>>103510291
>>103510291
>>
>>103509204
How do you run it? multi gpu?
>>
File: i guess.jpg (82 KB, 764x610)
82 KB
82 KB JPG
>>103509705
>thunderbolt enclosures
If you have them then use them I guess.
I expect it'll function, though not get the most performance out of your setup.

Your thunderbolt port probably hangs off your motherboard chipset with a whole bunch of other things.
Whether there's a bandwidth issue there that leads to a processing speed / token generation speed issue, I'm not sure.

In terms of costs, afaict,
- direct adapter (everything is soldered together, including the flex or cable between the pcbs) is cheaper than
- oculink or slimsas based adapter (pcb that goes inside your computer + oculink/slimsas cable + pcb that goes outside your computer + psu to power it) is cheaper than
- thunderbolt enclosure

My random bandwidth-related anecdote is that
when I loaded models off my sata drive it took over 90 seconds. (~45GB at 0.5GB/s)
Now that I'm on nvme it's pretty much always under 20 seconds.

>>103510317
nta But multi-gpu would be the simplest way to get decent performance.
>>
>>103499479
Where can I find this migu music player?
Long live /LMG/



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.