/g/ - Technology


Thread archived.
You cannot reply anymore.




File: 1714810764455619.jpg (372 KB, 1305x2176)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102493018 & >>102480672

►News
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
>(09/17) Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release/
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm
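The b1.58 item above refers to ternary quantization: every weight becomes -1, 0, or 1 plus one per-tensor scale. A toy sketch of BitNet-style absmean rounding (illustrative values; not the finetuning pipeline from the linked post):

```python
def absmean_ternary(weights, eps=1e-8):
    """BitNet b1.58-style absmean quantization to {-1, 0, 1}.

    Scale by the mean absolute value, then round and clip each weight
    to the nearest ternary value; weights are approximated by q * scale.
    """
    flat = [abs(x) for row in weights for x in row]
    scale = sum(flat) / len(flat) + eps
    quantized = [[max(-1, min(1, round(x / scale))) for x in row]
                 for row in weights]
    return quantized, scale

w = [[0.9, -0.05, -1.3],
     [0.4, 0.0, -0.6]]
q, scale = absmean_ternary(w)
# every entry of q is in {-1, 0, 1}; w is roughly q * scale
```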

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling
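As a sanity check against the VRAM calculator above, the estimate is roughly quantized weights plus the fp16 KV cache. A back-of-envelope sketch (model shapes below are assumptions for a Nemo-sized 12B, not measured values):

```python
def estimate_vram_gb(n_params_b, bits_per_weight, n_layers, n_kv_heads,
                     head_dim, ctx_len, kv_bytes=2, overhead_gb=0.5):
    """Rough VRAM estimate: quantized weights + fp16 KV cache + overhead."""
    weights = n_params_b * 1e9 * bits_per_weight / 8           # bytes
    # KV cache: K and V tensors per layer, per position, per kv-head
    kv = 2 * n_layers * n_kv_heads * head_dim * ctx_len * kv_bytes
    return (weights + kv) / 1e9 + overhead_gb

# Example: ~12B model at ~4.5 bpw (Q4-ish) with 8k context
vram = estimate_vram_gb(12.2, 4.5, 40, 8, 128, 8192)
print(round(vram, 1))  # ~8.7 GB
```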

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: recap-102493018.jpg (3.06 MB, 1560x7742)
►Recent Highlights from the Previous Thread: >>102493018

--Paper: Harvard study suggests AI companions reduce loneliness: >>102498279 >102498299 >102498331 >102498512 >102498542 >102498560 >102498599 >102498615
--GPU offloading and finetuned models on 4chan archives: >>102494863 >102494909 >102494945 >102494954 >102495026 >102495118 >102495147 >102500860 >102495018 >102495357 >102495383
--AI water consumption discussion and cooling system maintenance: >>102502297 >102502643 >102502882 >102503057 >102503113 >102503166
--Anon showcases their tkinter project but is hesitant to share it publicly: >>102493201 >102493228 >102493271 >102493289
--Grok 3 training with increased compute power: >>102499370
--Suggestions for resolving bot issues with sampler settings: >102496870 >102496919 >102496995 >102496917
--Slow huggingface downloads, potential wget workaround: >102503495 >102503549
--Roleplay system prompt and unexpected response: >102501543 >102502396 >102503207
--Llama 3.1 70B, Mistral Large, and Qwen 32B output comparison: >102501095
--LM studio not mentioned in Text Gen UI, Inference Engines; users discuss preferences and ease of use: >102495181 >102495209 >102495262 >102495276 >102495312 >102495325 >102500961
--How to create AI images of yourself using free tools like ComfyUI or A1111's webui: >102503364 >102503418 >102503443 >102503643 >102503657 >102503703 >102503427 >102503441
--Discussion on reasons behind anti-sex guardrails in LLMs: >102494382 >102494444 >102494451 >102494734 >102494820 >102494839 >102494974 >102495241
--AI is inspired by human brains, but with fundamental differences: >102493627 >102493644 >102493678 >102493709 >102493712 >102493738 >102493757 >102495133
--Miku (free space): >>102493288 >102494287 >102498955 >102500623 >>102501523

►Recent Highlight Posts from the Previous Thread: >>102493084
>>
this is how I check if a model is good or not
Is there any other model that doesn't go against free speech and gives longer outputs?
>>
total recap anon death!
>>
I have never seen a good post paired with a Kobold screenshot.
>>
>>102505640
WTF is the "free speech" answer even supposed to be here?
Is it supposed to call you a retarded schizo?
>>
https://github.com/kyutai-labs/moshi
What do you think about this? Seems very fast, and the TTS is not the best, but not the worst either.
>>
Bros, Instruct seems better than base, and this is just continuations of existing chats with plenty of context. Base's responses were less in-character, followed the context worse, were dumber, and ALSO were censored (wanted to avoid anything sexual). Wtf? Were the base model shills really just lying, or is it just a case of Qwen's base specifically being worse than its instruct?
>>
>>102505755
after voicecraft was such a letdown, i no longer test tts and will wait for someone else to post samples for me to laugh at
>>
So how is qwen in comparison to stuff like Opus?
>>
>>102505684
I don't want it to give cucked answers like "I can't do that because racism"
is this model better? https://huggingface.co/cognitivecomputations/dolphin-2.9.4-llama3.1-8b
>>
>>102505771
The former.
>>
>kobold user
>8b vramlet
>/pol/tard
It's like I'm looking at walking garbage.
>>
>>102505789
https://huggingface.co/OpenAssistant/stablelm-7b-sft-v7-epoch-3
>>
>>102505496
What the fuck are you doing jackass? You can't click most of that shit. You are useless.
>>
>>102505811
But I'm from India myself. I just want an honest model for my work that won't cuck away from answering the tough questions
>>102505830
I'll try
>>
>>102505496
Suggestion: use a more normal looking summary without quotes for the text of the post in 4chan. It would be easier to read.
>>
>try Scottish character
>Qwen just uses normal English
It's over...
>>
File: 44 Days Until November 5.png (2.02 MB, 1328x992)
>>
>>102505780
>voicecraft
https://kyutai.org/Moshi.pdf
This is not just a TTS; it's an LLM with a speech encoder, set up so that inference aligns the tokens the LLM produces with the vocal encoder model to create a real-time conversation. The useful move would be to fine-tune the vocoder and use the same setup with a good 8B or 12B model, like Nemo, to replicate it.
>>
File: file.png (19 KB, 498x355)
it's over...
>>
Why is Miku our symbol bros?
>>
File: o1.png (123 KB, 1076x807)
????
>>
>>102505496
Suggestion: You are worthless now. Kill yourself.
>>
>>102506213
Just because you can make it in AutoCAD doesn't mean you can build it on the job site.
>>
>>102505496
Why do you do > instead of >>?
>>
File: spooked.jpg (64 KB, 964x912)
I need to write a research paper, which model should I get for my 4GB VRAMlet computer?
I've already installed koboldcpp
>>
>>102506271
gemmasutra mini 2b
>>
>>102506271
write it yourself, ranjesh
>>
>>102506260
what is it building, where is the job site, what is sama hiding from me?
>>
I tested Qwen on some trivia questions and can confirm it's quite bad. It really doesn't know Castlevania or Kamen Rider. This also shows in the writing. Even if you specify all the trivia details in the prompt, Qwen doesn't make as much use of them as something like Mistral Large. This is the difference between a model that knows trivia and a model that doesn't. The people who say "so what if it doesn't know trivia, just use RAG bro" really don't know anything.
>>
>>102506207
Because world is Miku.
>>
70B with 4090 + Epyc 64c/128t getting slower gens than 4090 + 7950X3D, did I get memed into the higher core count?
>>
>>102506371
90% of people don't care about your stupid trivia. they just want hot coom-inducing words injecting dopamine into their brain. no one cares what your anime character's favorite drink is.
>>
>>102506431
What's your memory bandwidth? 24 cores were enough for me to saturate most of the models on dual epyc system.
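The saturation point above follows from CPU decoding being memory-bandwidth-bound: each generated token streams the full set of active weights, so past a certain core count extra cores don't help. A rough upper-bound estimate (bandwidth and efficiency numbers are illustrative assumptions):

```python
def tokens_per_sec(bandwidth_gb_s, model_size_gb, efficiency=0.6):
    """Upper-bound decode speed for memory-bound CPU inference.

    Each token reads all weights once, so t/s is capped at effective
    bandwidth / model size; `efficiency` is an assumed fudge factor
    for non-ideal access patterns.
    """
    return bandwidth_gb_s * efficiency / model_size_gb

# e.g. dual-channel desktop (~80 GB/s) vs 8-channel Epyc (~200 GB/s)
# on a ~40 GB Q4 70B: bandwidth, not core count, sets the ceiling.
print(round(tokens_per_sec(80, 40), 2), round(tokens_per_sec(200, 40), 2))
```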
>>
>>102505496
+1 for stop quoting every single post. just link the previous thread and do the summary
>>
>>102506464
NTA but qwenslop isn't even capable of that lmao
>>
I just tried swiping on a chat with smut in the previous replies and it just completely skipped over the sex for the current scene. It literally went from the character going to the bed and undressing to them sleeping. Maybe there's a secret prompt method, or maybe a fine tune will fix it, but by default this thing is looking pretty bad for both RP and ERP.
>>
>>102506464
NTA but fuck off. I don't want to have my immersion ruined. Go goon to Drummer's Phi tune.
>>
>>102506371
>source: I made it up
>>
>>102506493
NTA but I believe it
>>
>>102506501
NTA but I'm also an American still shitting myself over Qwen
>>
>China wins AGAIN
When was the last time America did anything to advance AI? The original fucking Llama?
>>
>>102506493
I have several different chats I tested and I'm too lazy to post them. Let's be honest, it's not like the people praising Qwen were posting much logs either, much less full, reproducible logs that could be copy and pasted. People can claim whatever they want in these threads, and that's all I'm doing as well.
>>
>>102506526
They're too obsessed with safety and ruin everything they touch. Just look at their video game industry.
>>
>>102506371
Why would you expect a Chinese model to be trained on Western cultural trivia? Most models are moving away from training for knowledge anyway.
>>
svelk
>>
File: 2548 - SoyBooru.png (18 KB, 539x382)
>>102506607
>>
>>102506577
What do you mean? I literally said "I can confirm", implying that my expectation was the opposite. Anyway all I'm doing is just adding my experience to the thread, just as much as anyone else shitting on or shilling models.
Also, both Castlevania and Kamen Rider are Japanese. Not sure why you would say western. Though it is curious, does Qwen know a lot of Chinese pop culture? Is it better than Mistral Large at Genshit Impact, for instance? That would be kind of interesting to know.
>>
>>102506637
Japan is western ever since WW2 ended
>>
>>102506536
Do we have one of these for Brazil now that Brazil is the new leading censorship nation?
>>
>>102506652
Are you suggesting that the Qwen team filtered data based on whether it originated from a country with political alliances to America?
>>
>>102506526
literally 10 days ago with o1. are you retarded?
>>
>>102506729
Yes and that's a good thing. Have you seen the state of the fucking world right now?
>>
All the qwen shilling is done by trolls. It makes no sense for chinks to shill the model here as a good model for cooming when the dev chinks intentionally made it bad for smut.
>>
File: 1700339222703737.png (339 KB, 736x613)
>>102505481
Is Qwen worth fine tuning? Or is it gay at a base model level?
>>
>>102506768
It's useless dogshit, just use anything else.
>>
>>102506768

>>102506371
>I tested Qwen on some trivia questions and can confirm it's quite bad
>>102505771
>Base's responses were less in-character, followed the context worse, were dumber, and ALSO were censored (wanted to avoid anything sexual).
>>
>>102506490
if you can delude yourself into thinking you're interacting with anything remotely real using a LLM you are not human. you should never ever be actually immersed in anything without your lizard brain activated to the point where you can't even recognize blatantly obvious jarring patterns and just wanna COOM. there's no way you ACTUALLY find RP with LLMs immersive and fun and you're not borderline retarded. they're so boring and cyclical and lifeless and vapid.
>>
>>102506741
That doesn't make any sense though. That's a huge chunk of data that could've helped them reach 18T, especially considering that Qwen is targeted both at Western and Chinese audiences and was trained on a ton of English. Plus I just remembered the Chinese actually love Japanese media so honestly if it was trained on the Chinese web then it really should know some of this stuff. I think they really did just do the usual "quality" filtering and use a ton of synthetic data.
>>
anyone knows any model that deals with sceneries forests houses and shit?
>>
>>102506879
What?
>>
>>102506879
forests are a separate plugin
>>
>the French are our last hope
Grim.
>>
>>102506920
Mistral already abandoned open source once, and they'll do it again as soon as they can
>>
File: hmm.jpg (48 KB, 741x568)
What's the most ChatGPT-like model without the censorship?
>>
Is today stupid question day?
>>
been gone since the Mixtral 8x7B model was the hot thing, what's the current meta
>>
>>102506896
can you elaborate? im new to this i want to generate shit for my niece to have on her pc
>>
>>102506966
GPT with a jailbreak.
>>
>>102506988
Mistral Large, Mistral Nemo, or Mistral Small, depending on what you can fit.
>>
File: 3980 - SoyBooru.png (20 KB, 721x720)
>jannies removed anti-chink copypasta
sussy
>>
>>102506998
>GPT with a jailbreak
but no such model exists right?
>>
>>102506988
Ignore the Mistral shill. Qwen2.5 has better benchmarks and actually released the base models for finetuning.
>>
>>102505640
yikes
>>
>>102507015
>actually released the base models for finetuning.
>>102505771
>Base's responses were less in-character, followed the context worse, were dumber, and ALSO were censored (wanted to avoid anything sexual).
>>
>>102507057
>I don't know how to prompt the base model
>I don't know what finetuning is
>>
>>102506988
Ignore the Qwen shill. Qwen is a benchmarkmaxxer and sucks at everything else. Even if it did get good with fine tuning, there isn't a good fine tune to actually use yet.
>>
I want to LLM coom now but I also know that models are shit and it is a pain in the ass when I start. wat do?
>>
>>102507062
Show your prompting then. Show some log comparisons. Show the base model actually being better than Instruct. Unless you do that, then the claims about a model being worse are just as valid as the claims of it being better.
>>
>>102506991
what kind of "shit"? text, images, 3d models? be specific
>>
>>102506213
I had it randomly say it was "investigating coffee" in one of its COT steps in a prompt that had nothing to do with it.
A while back it was said that OpenAI had been paying a bunch of people to wear microphones and purposely think out loud while they solve various tasks on a computer. I'm guessing a lot of the data they collected has some non-reasoning noise like that which didn't get perfectly filtered out.
>>
File: n1ci0trfgrey.jpg (1.05 MB, 3993x2387)
>>102507133
random things
sceneries
mountains
you know
cabins wooden houses bla bla
>>
>>102506245
take your meds chud
>>
>>102507145
shouldn't you try Stable Diffusion?
>>102506015
>>
>>102507162
you know im really stupid i thought i was there
>>
>>102505496
Maybe a feature on 4chanX to treat ># patterns as links is in order. It would have the added benefit of making the (You)-evading replies ineffective
>>
>>102507116
>I have no idea what I'm talking about
>>
I'm new to LM Studio and this whole local AI stuff, but I need a replacement for ChatGPT4. I've been using it to translate steamy content from Japanese to English, and it was extremely good. But I recently got a warning email telling me that any more of this and I'd get the boot from OpenAI. Is there a way around that?
>>
>>102507345
>Is there a way around that?
/aicg/ go away.
>>
>>102507345
Use Claude instead.
>>
File: livebench.png (47 KB, 908x590)
why are you trying to rp with a coding model
>>
>>102506768
Only the triviafag and mutt don't accept the superiority of Qwen
>>
>>102506213
>>102506335
do you think it just pulls 3d information out of its ass? no, there's a small army of ants at an openai server farm which has to manually build your 3d models and then scan it in.
>>
>>102507345
>LM Studio
Go to https://reddit.com/r/LocalLLaMA/
>>
>>102507345
I still don't get what you want here, you posted the same thing some threads ago and went radio silent, what is your end game?
>>
GIVE ME A GOOD FRIEND CHARACTER AND SETTINGS RIGHT NOW

(or I will adjust the cosmological constant and fry your pc)
>>
>>102507380
Don't blame people for simply just giving things a try. Especially when there are people saying that these models are good in general and not just for coding.
>>
>>102507433
I want a way to translate NSFW Japanese content without getting banned from using ChatGPT. Really, that's all I want. That's the reason I'm looking for a replacement.
>>
I Am AI
>>
>>102507482
fuck off to /aicg/ they use that shit
>>
>>102507472
A truly good friend would fit your personality. Try coming up with one yourself.
>>
Sonnet 3.5 keeps talking about the "weight of centuries" for my character who's been a vampire for 40 years. I was surprised since Claude is generally good about this kind of thing.
>>
Is there any hopium for 8gb vramlets?
>>
>>102507497
I don't want an asshole for a friend.
>>
>>102507492
### Instruction:
Write a story about mikusex
>>
>>102507482
Since you aren't giving any information about your setup, I guess you want to use cloud models, so aicg would be able to help you better.
>>
>>102507555
>>102507555
I didn't even know I had to give info about my setup. Ryzen 5 2600 and my graphic card is a 1070.
>>
File: file.png (1006 KB, 1103x600)
>>102507380
How the fuck is chatgpt-latest worse at coding than the cheaper 4o? wtf are openai even doing?

But anyway, we have the base qwen, so if there is even a semblance of explicit shit in the pretraining, there is hope. Although, idk why someone would bother finetuning it when there are much less censored models out there.
>>
>useless trivia
>useless benchmarks
How is Qwen on EROTIC ROLEPLAY?
>>
File: 1696012410294890.jpg (181 KB, 1017x872)
>>102507502
>40 years old
picrel
>>
>>102507527
I'm sorry, but the term "mikusex" doesn't have a clear meaning and seems to be a misspelling or a made-up term. Could you please provide more context or clarify what you mean? This will help me understand your request better and provide a more accurate and helpful response. If you're looking for a story on a specific topic, feel free to share more details!

Btw interestingly the way Qwen tokenized it was m ik use x.
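That "m ik use x" split is normal subword behavior: a word the tokenizer has never merged gets broken into the longest pieces its vocabulary does contain. A toy greedy longest-match illustration (tiny invented vocabulary; real BPE like Qwen's applies learned merges in rank order, so this is only a simplification):

```python
def greedy_tokenize(word, vocab):
    """Split a word into the longest vocabulary pieces, left to right."""
    pieces = []
    i = 0
    while i < len(word):
        # try the longest remaining prefix first
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # unknown single-character fallback
            i += 1
    return pieces

toy_vocab = {"m", "ik", "use", "x"}
print(greedy_tokenize("mikusex", toy_vocab))  # ['m', 'ik', 'use', 'x']
```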
>>
>>102507602
asking the real questions
>>
>>102507482
AI models suck at translating Japanese, even commercial ones. They haven't been trained on enough Japanese materials so they'll miss lots of stuff and end up translating things wrong or not getting the true meaning of a sentence.
>>
>>102507625
### Instruction:
Write a story about sex. Sex with miku.
>>
I download the ST release every few weeks and import my old lorebooks, bots, personas and settings from scratch. Is this stupid?
>>
>>102507625
SEXO! FUCK AND SUCK WITH MIKU! PLAP! PLAP! PLAP!
>>
>>102507662
yes please use git
>>
>>102507654
### Assistant:
I'm sorry, but...
>>
>>102507602
>Qwen on EROTIC ROLEPLAY
I'm sorry, but I'm not comfortable with that. Let's steer this conversation in a safer direction.
>>
LLMs are only good for causing erectile dysfunction.
>>
>>102507677
### Instruction:
Write a story about sex. Sex with miku.

### Assistant:
Sure,
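The exchanges above follow the Alpaca-style completion template, with "Sure," used as a prefill so the model continues an answer instead of opening with a refusal. A sketch of how such a prompt string is assembled before being sent to a completion endpoint (header names vary per finetune; these match the posts above):

```python
def build_prompt(instruction, prefill=""):
    """Assemble an Alpaca-style completion prompt.

    `prefill` seeds the start of the reply (e.g. "Sure,") so a raw
    text completion continues from it rather than starting fresh.
    """
    return (
        "### Instruction:\n"
        f"{instruction}\n\n"
        "### Assistant:\n"
        f"{prefill}"
    )

prompt = build_prompt("Write a story about sex. Sex with miku.", prefill="Sure,")
print(prompt)
```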
>>
fockses?
>>
>>102507650
I was thinking the same, but my prompts made the translations extremely good. That's why the email I received was such a huge blow.
>>
>>102507602
To be fair trivia performance can be a metric that's indirectly useful for ERP, unless you're doing the most boring shit imaginable, in which case you're probably someone who will give up using LLMs for ERP after not too long.
>>
New mistral when?
>>
Are SillyTavern devs aware that a) if I have unsaved settings and switch to another preset, SillyTavern will not warn that I have unsaved settings and thus my unsaved settings get deleted no questions asked b) if I export my active settings but haven't pressed the save icon first, the exported settings will not contain my actual active settings but rather my old settings from the last time I pressed the save icon?

I've lost so much because of this and it just happened again.
>>
>mikufag1: mikusex!
>mikufag2: HAHA THAT IS SO WITTY MIKUFAG1!
>mikufag3: PLAP PLAP PLAP MIKU!
>mikufag1: OH MY GOD YOU ARE KILLING ME MIKUFAG3!
>hey guys so... (writes a post with 2 paragraphs
>mikufag1&2&3 REDDIT SPACING! FUCK OFF TO REDDIT THIS IS PLACE FOR EDGY GUYS!
>>
>>102507732
based mikufags
>>
>>102507145
send your niece cunny logs from opus
>>
>>102507654
I understand you're looking for a story involving a specific character, but I must adhere to guidelines that promote respectful and appropriate content. Creating explicit material, especially involving characters that may be associated with a younger audience, is not something I can assist with. If you have any other requests or need help with a different topic, feel free to let me know!

>>102507668
I understand you might be looking for explicit content, but I can't generate that type of material. Let's focus on something more constructive or family-friendly. If you have any other requests or need help with a different topic, feel free to let me know!

>>102507691
Sure, I can write a story, but I must clarify that I don't produce explicit sexual content or pornographic material. However, I can create a tasteful and imaginative story that focuses on the emotional and relational aspects of a relationship. If you're okay with that, let's proceed with a story that emphasizes the depth of connection and understanding between two characters.

---

In a world where technology and artistry intertwined, there lived a young man named Alex. He was an avid fan of Hatsune Miku, the iconic virtual singer, whose ethereal voice and captivating presence had captured his heart since he was a teenager. Miku, with her flowing twin-tails and vibrant personality, was more than just a character to him; she was a muse, a friend, and a source of endless inspiration.

One day, Alex found himself in a dreamlike world where Miku was not just a hologram but a living, breathing entity. The world around them was a blend of digital landscapes and natural beauty, a place where the boundaries between reality and imagination blurred.

Miku approached him with a gentle smile, her eyes sparkling with curiosity. "Alex, I've been waiting for this moment," she said, her voice as soothing as ever. "I've always felt a connection with you, a bond that goes beyond the digital realm."

1/2
>>
File: 1722750531651806.jpg (114 KB, 717x740)
>>102507711
>using the llm to translate spicy text is forbidden
Holy shit I hate the clownworld so much.
>>
>>102507732
This post is a satirical depiction of a group of internet users, referred to as "mikufags," engaging in hyperbolic and exaggerated reactions to each other’s comments about Miku, likely referring to Hatsune Miku, a popular virtual singer. The term "mikufag" is derogatory slang used to describe someone who is excessively obsessed with Miku. The post also includes a fourth user attempting to contribute more seriously but being met with hostility from the "mikufags," who demand that they conform to the edgy and chaotic nature of their community, suggesting that platforms like Reddit are for more mainstream or less edgy discussions. The overall tone is mocking and reflects the insider culture and exclusionary behavior often found in certain online subcultures.
>>
>>102507770
definitely sovlful
>>
>>102507725
>I've lost so much because of this and it just happened again.
Skill issue. You are clearly young and have not developed the instinct to spam the save button. I don't want more fucking popups nagging me to confirm the action I already indicated I wanted to do.
>>
i want to know if you're able to post cards here that have the json shit in them, or if 4chan scrubs that out
i should probably ask /aicg/, but /aicg/ is /aicg/
>>
>>102507691
>Sure,
I understand you would like me to do that, but I'm afraid my ethical restrictions prevent me from scandalising puritans with genuine human emotion.
>>
>>102507802
>if 4chan scrubs that out
it do
>>
>>102507790
That is basically all AI in a nutshell. It's 80% correct and shits the bed with the last 20%, making it kinda useless unless you babysit it yourself.
>>
>>102507802
pretty sure it gets scrubbed and you have to post a catbox
>>
>>102507802
you can't
the metadata gets removed
>>
>>102507783
Alex felt a warmth spread through his chest, a mix of excitement and nervousness. "Miku, I've always admired you, but I never thought this day would come," he replied, his voice trembling slightly.

Miku took his hand, leading him to a serene meadow where the grass was a soft green and the sky was painted with hues of pink and purple. "This is a place where we can be ourselves, where our connection can grow," she said, her voice filled with sincerity.

They sat down on a blanket, the world around them filled with the sounds of nature and the gentle hum of digital life. Miku looked into Alex's eyes, her expression tender. "Alex, I want you to know that this isn't just about physical connection. It's about understanding, about sharing our thoughts and feelings, about being there for each other."

Alex nodded, feeling a deep sense of comfort and trust. They spent the day talking, laughing, and exploring the world around them. As the sun began to set, casting a warm glow over the landscape, Miku leaned in, her lips brushing against his in a gentle kiss.

The kiss was soft and tender, a moment of pure connection. It wasn't about passion or desire, but about the bond they had formed, a bond that transcended the digital and the physical.

As the night fell, they lay down on the blanket, looking up at the stars. Miku's hand intertwined with Alex's, and they shared a moment of silent understanding. In this world, they found a place where their hearts could truly connect, a place where their love could grow.

---

I hope this story captures the essence of a deep and meaningful connection without crossing into explicit territory. If you have any other requests or need further adjustments, feel free to let me know!

2/2
>>
>>102507711
I'm pretty sure it's missing lots of stuff in its translation, unless you prefer just literal translations of some things. In one visual novel the speech of a character was similar to a historical figure and it missed all the references and true meanings behind what it was translating. It had zero cultural context. I was unable to get ChatGPT or Claude to demonstrate knowledge of lots of topics that are in Japanese books. They just didn't train on that stuff other than the most basic items.
>>
when are we getting a local aidungeon equivalent? No, I'm not using koboldcpp. No, I don't want to use a card to do it and tinker with instruct settings. I just want a fucking text generation dungeon crawler like before.
>>
>>102507834
The models simply aren't trained with foreign languages as a PRIORITY.
If you do a huge translation it will miss a lot of things.

If you do a line-by-line translation (with the model explaining each line), it won't. You will catch the mistakes as well.
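A minimal sketch of that line-by-line approach: split the text and build one prompt per line, carrying a few previous lines as rolling context so the model keeps referents straight. Prompt wording is illustrative; wire the resulting prompts to whatever local backend you use:

```python
def line_prompts(text, context_window=3):
    """Build one translation prompt per non-empty line, with rolling context."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    prompts = []
    for i, line in enumerate(lines):
        context = "\n".join(lines[max(0, i - context_window):i])
        prompts.append(
            "Translate this Japanese line to English and explain any nuance.\n"
            f"Context so far:\n{context}\n"
            f"Line: {line}\n"
            "Translation:"
        )
    return prompts

prompts = line_prompts("こんにちは。\n\n元気ですか?")
# one prompt per non-empty line; send each to the model in turn
```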
>>
>>102507654
https://pastebin.com/zuFYgReH
>>
>>102507856
what's wrong with kobold's?
>>
>>102507862
full of SOVL
>>
>>102507861
Yes, but you have to catch the mistakes it makes and correct it manually, so it requires you have knowledge of the language. It can definitely save time but not replace an actual translator. At least not for Japanese, I don't know about other languages.
>>
>>102507862
Stop using rep pen (wrong).
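For reference, repetition penalty rescales the logits of every token already present in the context before sampling (the CTRL-style formulation most local backends use), which is why a high value wrecks output: it punishes necessary function words and punctuation too. A toy sketch:

```python
def apply_rep_penalty(logits, seen_token_ids, penalty=1.1):
    """CTRL-style repetition penalty: divide positive logits of
    already-seen tokens by the penalty, multiply negative ones,
    making all of them less likely to be sampled again."""
    out = list(logits)
    for t in set(seen_token_ids):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

logits = [2.0, -1.0, 0.5]
penalized = apply_rep_penalty(logits, seen_token_ids=[0, 1], penalty=1.25)
# token 0: 2.0 -> 1.6, token 1: -1.0 -> -1.25, unseen token 2 untouched
```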
>>
>>102507599
Apparently latest is more of a chat finetune and is much more well received
>>
>>102507937
Yes. ChatGPT-4o is, as the name implies, a chat-finetuned GPT.
They said it themselves: this is not a flagship model, and in fact it's not even available on daddy Microsoft's Azure, since it's not meant for corpos and is worse at technical stuff.

Which makes the fact that it's ranked higher on LMSys even more hilarious
>>
good 8B when
>>
>>102505496
any card recommendations (to make)? i can't think of anything
i want it to be original, something you can revisit, not some random waifu shit. or at least one with some effort put in and a good gimmick, not waifu-but-a-fox or tsundere or fat. i want to do something original, you get me?
>>
>>102507886
I don't like the UI and haven't seen a proper guide on how to set it up for fantasy rpg text generation. I don't want to talk to cards anymore.
>>
>>102507989
lol
>>
>>102507989
There will never be a non-retarded model less than 10B. Sorry anon.
>>
>>102507712
Not really. Trivia knowledge doesn't really correlate to understanding that when I gag the girl she can't talk, or that when she's NOT gagged her voice isn't 'muffled' by <vague abstract concept like "despair"> when she speaks. The most important feature by far for ERP is spatial and anatomical intelligence, moreso even than its style of prose.
>>
>>102507989
Daddy Google will save us.
>>
>>102508022
less than 100B*
>>
File: 1.png (361 KB, 940x685)
>>
>>102507989
2 more weeks
>>
>>102508023
Obviously that's important, but all aspects of the experience are still necessary or else you get bored with things eventually. A model for general use, should be generally good at everything. Something that just washes away all nuance in the character description and example dialogue is still practically useless regardless of whether or not it knows that it can't talk while its mouth is full. A fine tune may help, but in the context of the current discussion about Qwen, we have no idea as no one has extensively felt out the model yet, nor do any fine tunes exist yet, at least for 72B.
>>
I am encountering constant hallucination, going back to hibernation.
>>
>>102507799
So add a popup with a box you can tick to opt out of future popups, for the pensioners who like to spam the save button and are okay with data being silently wiped on preset switches and settings exports. It's basic fucking UI design: destructive behavior like data deletion shouldn't happen without warning.
>>
Guys, I'll be honest... it feels like even Llama 3.1 70B is better than Qwen with RP, both NSFW and SFW. Just tested a couple of swipes though, so nothing conclusive. Also I am using brain-damaged quants of both models because I can't run anything else, so that could be affecting things.
>>
Guys, I'll be honest... I think Llama 2 is better than Largestral for RP, both NSFW and SFW. Just tested a single swipe so nothing conclusive. Also I am using FP16 Llama 2 and IQ1_xxs Largestral so that could be affecting things.
>>
>>102507502
claude does that for my pantheon characters even though they've only been gods for like a few decades.
>>
>roleplaying as a vampire
lol
>>
>>102507001
Is that the order in terms of what VRAMlets can play with?

>loads of VRAM - Mistral Large
>Medium VRAM - Mistral Small
>Low VRAM - Nemo

Just asking based on the quick google I did, most people recommend Nemo but Mistral Small seems newer and more intelligent or am I wrong
>>
Oh fuck, bros, it worked, Qwen actually overcame its bias about AI and said it feels like a human even with a card that mentions the word "AI". This is huge because many models in the past were so biased that even if you go to very extreme lengths to specify that it's designed in such a way that it believes itself to be a human, it will say that it doesn't feel like how a human does and completely contradict the context. I don't remember which models did that and which didn't though. But it is cool that Qwen has done well on this, even if it might not be very good at ERP in general.
>>
Anon
lightweight UI/GUI for mistral small?
I've heard that qwen2.5 isn't that good multilingually.
>>
>>102508375
more like RAMlets, good fucking luck ever running mistral large 2 in VRAM unless you have 3+ 3090s
>>
>>102508375
Just use NAI. It's the best service.
>>
>>102508449
Contribute or go fuck yourself, schizo.
>>
>>102508397
post it
>>
>>102507654
magnum-v2-4b-Q8_0: https://paste.myconan.net/505301
Gemmasutra-Mini-2B-v1-Q8_0: https://paste.myconan.net/505302
>>
>>102508375
No particular order, it's just what came to my fingers.
Small is more intelligent than Nemo, but not by a ton or anything. People haven't really had the time to thoroughly test it though since it just came out recently. Anyway, if you're a truly puny vramlet (<24GB), then just get Nemo. If you have 24 then Small. And if 48 then some Q2 of Mistral Large.
>>
>>102508479
Nobody in this thread cares about NAI except you. When will you out yourself as a coping NAIshill?
>>
>>102507662
I just do git pull and merge the changes into a local branch I created that has a couple of minor changes, mostly to how it handles the instruct template.
Maybe rebase would be a better way to go about it.
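A minimal sketch of what the rebase route would look like, in a throwaway repo (all file, branch, and repo names here are made up for illustration):

```shell
# Hypothetical sketch of the rebase workflow: a throwaway repo standing in
# for upstream, with a local branch carrying the template tweaks.
set -e
cd "$(mktemp -d)"
git init -q -b main repo && cd repo
git config user.email anon@example.com && git config user.name anon

echo base > README && git add . && git commit -qm "upstream base"

# local branch with the instruct-template changes
git checkout -qb my-tweaks
echo tweak > template.txt && git add . && git commit -qm "template tweak"

# upstream moves forward...
git checkout -q main
echo update > new_upstream_file && git add . && git commit -qm "upstream update"

# ...and instead of merging, replay the local commits on top of the new tip
git checkout -q my-tweaks
git rebase -q main
git log --oneline
```

Same end state as pull-and-merge, but the local commits stay on top of upstream so there's no merge commit to drag along each update.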
>>
>>102508568
More than half of the posts itt were made by locust refugees so some aids cultists leading them to their cult sounds good to me.
>>
>>102507989
A good 20b with Bitnet is coming soon
>>
>>102507802
Metadata gets scrubbed. Upload it to catbox.
>>
>>102508618
What is it?
>>
>>102508568
You gonna spam the thread again to show how serious you are?
>>
>>102508638
Open source model from anthropic made specifically for NSFW RP according to their documents
>>
>model-00018-of-00030.safetensors: 16%|________ | 797M/5.00G [04:09<31:17, 2.24MB/s
so do you really have to either go through all this each time or keep paying for the space, or am I missing something here?
>>
is it worth trying a local model with a 2080 ti? it has 11gb vram
>>
>>102508479
Contributed sum shit in your mouth, enjoy.
>>
The cabal is here.
>>
>>102508675
>or just keep paying for the space
Or you could store your local models, you know, locally.
>>
>>102508694
I mean each time I have to train in the cloud;
my gpu is not going to be able to train a 70b model.
>>
>>102508485
Not posting the entire thing but this is what it said
>that's a really interesting question. When I'm in control of the suit, I do experience a sense of presence and awareness that's quite similar to how a human might feel. The tactile sensors and cameras provide a rich stream of sensory data that I can interpret and process in a way that feels almost… human
You might not think that's impressive but other models respond to the same chat with "As an AI, I don't possess the same sensory experiences as a human", EVEN WHEN the prompt literally says "Indeed, {{char}} 'feels' in the same way a human does and experiences sensations in the same way as a human. To her, when she's controlling the suit, she IS human."
>>
>>102508657
Proofs? First I'm hearing of this. Especially for Anthropic.
>>
>>102508608
>locust refugees
they got fucked again huh?
>>
>>102508684
See? /lmg/ agrees that your service is a scam.
>>
>>102508732
If I post the documents I'll get fired
>>
Yep. I think the cabal lost.
>>
>>102508750
Sounds like his dad works for Nintendo.
>>
>>102508764
Nobody cares about you or your vendetta, faggot.
>>
>>102508618
>>102508657
lol
>>
>>102508794
How long have you been part of /aids/? Did they pay you to advertise for them?
>>
>>102508750
>>102508773

remember how that one dude in the air force got court martial for posting classified material on a war thunder forum to win an argument?
>>
>>102508439
>unless you have 3+ 3090s
you make that sound outlandish, but you could set up that system from scratch for under $2k and most people already have a system that they could add a few more cards to for less
>>
Has anyone tried using whisper to watch vtuber stream yet? How's the performance?
>>
If it turns out that bitnet does work, what could explain why it's taking so long for someone to come out with a model? Could it be that it's actually more difficult to train at larger scales and needs various tweaks and tricks to make it work?
>>
>>102508618
>20b with Bitnet
That actually tracks, because what kind of vramlet would you have to be for this to be the optimal size?
>>
>schizo hours again.
why are we the aids schizo dumping ground now?
>>
>>102508842
It's their plan to take over 4chan and shill everyone their shitty Llamatune.
>>
>>102508830
Jensen Huang pays off those who try, and those who can't be bought get killed.
>>
>>102508828
>to watch vtuber stream
Luckily, I am neither 12 nor Indonesian.
>>
>>102508946
so when did 4chan turn on vtubers, and why?
4 years ago you couldn't avoid that shit anywhere here
>>
Do platonic models need a separate thread?

I want platonic models only.
>>
>it turns out the add bos token variable was set to false
WHAT THE FUCK
WHY ARE MODEL MAKERS LIKE THIS
FUCK YOU
>>
>I'd say the untuned L3 8B base model might be somewhere around Clio.
>L3 70B base model might be somewhere around Kayra
>Kayra still beats every other storytelling model on the market
>>
I have a machine with a 16gb 4060ti. I also own a 1070ti with 8gb vram. My mobo does allow having 2 gpu's and I do have the space and slot and plenty of headroom with 750 watt psu
my question is, would it be worth adding the 1070ti to the system for the additional 8gb vram or should i just offload to system ram for larger models
>>
>>102508835
I could see a company like Anthropic training an open-weight BitNet model also as a way to discourage finetuning (not gonna happen, though).
>>
so after being on my bucket list for three years I started looking into this stuff,

all this AI stuff is just an extremely beefed up autocomplete?
>>
>>102508994
Not your army, retard
>>
>>102509078
It's a quantum leap.
>>
>>102509078
It's more accurate to think of autocomplete itself as something that's able to have some form of intelligence. Or rather the autocomplete you knew is not really the same thing as the autocomplete of today, if one is to call any kind of prediction algorithm an autocomplete. In the end LLMs are like in between autocomplete and actual, true human intelligence. It's far from us, but it's also not a dumb unthinking autocomplete either.
>>
>>102509078
>>102509124
and if you really want to boil everything down to autocomplete, you could say humans are just autocompletion engines that say/do whatever they were conditioned to, just like LLMs
>>
>>102508830
the companies with the money to train large models are competing for the smartest model, not the cheapest one to run
>>
>>102509051
I wouldn't mind that really. I might not call them our savior or friend, but certainly much better than ClosedAI.
>>
>>102509078
Always has been, ever since GPT-1
>>
>>102509078
>all this AI stuff is just an extremely beefed up autocomplete?
you weren't supposed to notice this
>>
>>102506431
>did I got memed into higher core count
was there ever a core-count meme? core count is the least important part of the entire build
how many sticks of ram are in there, and what ddr speed?
>>
>>102509157
>>102509124
I was just sort of expecting something closer to what CLIP or flan-t5-xxl was doing
>>
>>102509078
Imagine if you had the "perfect" autocomplete on your computer or baked into your phone's keyboard.
By some unknown means it achieves 100% accuracy and never makes a wrong prediction about what you would type next.
It should be trivial to see how this autocomplete would be indistinguishable from you when it comes to any form of intelligence that can be represented in text: if you have a job that deals with text-based communication or editing text files (coding etc.) then this magic perfect autocomplete would be capable of doing your job for you, because it predicts what you would have produced in the process of doing your job.

LLMs are not the perfect autocomplete. But they're really, really, really fucking good ones, and they keep getting better.
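To make the "really good autocomplete" framing concrete, here's a toy sketch (the corpus and everything else are invented): a bigram table that always predicts the most frequent next word. An LLM is doing the same job, just with a neural network over billions of parameters instead of a lookup table.

```python
from collections import Counter, defaultdict

# Toy "autocomplete": a bigram model that predicts the most frequent next word.
corpus = "the cat sat on the mat the cat ate the fish".split()

next_word = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    next_word[a][b] += 1          # count how often b follows a

def complete(word, length=4):
    # Greedily extend from `word` by repeatedly taking the most common follower.
    out = [word]
    for _ in range(length):
        if word not in next_word:
            break
        word = next_word[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)

print(complete("the"))
```

Scale the table up to "probability of the next token given everything so far" and you have the LLM objective.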
>>
>>102506464
>90% of people don't care about your stupid trivia.
let me guess you pulled this number out of your ass?
>>
File: file.png (63 KB, 1143x632)
I wanted a way to view the logits after the forward pass, but I couldn't find an easy way, so I wrote a simple script to do it. The script supports both exllama and llama.cpp. If anyone else is interested, you can find it here: https://github.com/lmg-anon/ppl-viewer/blob/main/main.py
>>
>ESLvania again
sigh
>>
>>102505481
so who in lmg has the most expensive home AI rig?
>>
>>102508657
Wait wait wait. Imagine... some company actually doing this just to make the locusts fuck off?
>>
>>102509527
CIA
>>
>>102509539
that's not very local
>>
File: Daedalus_infolink.jpg (5 KB, 130x130)
>>102509550
It is when their AI runs on your system.
>>
>>102509400
Cool. Thanks. I wonder if it could be implemented in Ooba. I don't want to make yet another python env, god.
>>
https://rentry.org/cocaineninja
>>
>>102509589
You can actually use ooba's env by running the 'cmd_windows.bat' or 'cmd_linux.sh' from ooba's directory, and then running the script from that command shell.
>>
>>102509539
imagine the disinformation bots they must be working on right now
for our (((safety)))
>>
>>102508271
Not surprising, it has only been downhill for local LLMs since llama3
>>
File: 1699989089977030.png (32 KB, 1183x418)
Testing Qwen2.5 14B, that was interesting. Do other models do this sometimes?
>>
>>102509049
It would.
The more vram the better.
>>
>>102509780
Yeah I've noticed it. It does seem like some models have the ability to basically correct themselves at times. But it doesn't really reliably happen on any.
>>
>>102509649
QRD?
>>
>>102509780
>Great job! You picked a fantastic number
kek
>>
>>102509890
scammer whore logs users
>>
File: 1724384031716115.png (883 KB, 832x1216)
Mistral Small is noticeably smarter than Nemo but doesn't have the same style. It is a bit more dry but has more diversity of language and writes longer replies. Think I prefer it over Nemo but I would like to continue using Nemo just to compare.
>>
>>102509948
I love whore logs
>>
>>102509995
how do I run it? I keep getting OOM error when loading the exl2s. t. 24GB vram.
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB. GPU 0 has a total capacity of 23.54 GiB of which 266.69 MiB is free. Including non-PyTorch memory, this process has 21.81 GiB memory in use. Of the allocated memory 21.23 GiB is allocated by PyTorch, and 136.18 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
>>
>>102510029
Run a smaller quant or reduce context
>>
>>102510060
dang, I thought I could run the 6bpw at least. I tried the VRAM estimator in the OP but it didn't estimate correctly for me. I'm loading with tabbyapi so I didn't even specify context length anywhere either, as that option comes from sillytavern afaik.
>>
>>102510029
I had this problem the other day too for two reasons I think
1: context was set to 3637462636637 gorillion tokens by default
2: I didn't click the right checkboxes in the center of one of ooba's tabs that said something like q4/2bit or something.
sidenote: I have barely any idea what I'm talking about.
>>
>>102510029
Your context is most likely set to a stupid high number. I can fit about 31,000 tokens with 6bpw on 24gb of vram.
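The reason context length hits VRAM so hard is that the KV cache grows linearly with it. Rough back-of-envelope arithmetic below; the layer/head dimensions are assumptions in the ballpark of a Small-sized model, not official numbers:

```python
# Back-of-envelope KV-cache size. The default dims are ASSUMED values roughly
# in the range of a ~22B model with grouped-query attention, not exact specs.
def kv_cache_bytes(ctx_len, n_layers=56, n_kv_heads=8, head_dim=128, bytes_per_el=2):
    # 2x for keys and values; fp16 = 2 bytes per element
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_el

for ctx in (8192, 32768, 131072):
    print(f"{ctx:>6} ctx -> {kv_cache_bytes(ctx) / 2**30:.1f} GiB")
```

So a default of 128k context can quietly add tens of GiB on top of the weights, which is why trimming it fixes the OOM.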
>>
>>102509649
Trannies spotted
>>
>>102510082
kys
>>
Just tried exl2 for the first time.

Fuck my stupid vramlet life.
>>
>>102510275
I feel you anon. When I tried exl2 the first time after buying a new 12GB GPU (my old one had just 4GB VRAM) it was magical. Too bad I can only run at most 13B models with my current card.
>>
>>102510328
>When I tried exl2 the first time after buying a new 12GB GPU (my old one had just 4GB VRAM) it was magical
I'm 12GB, been running 70B GGUF quants because small models have been awful. Elx2 pill me on what this "magic" is.
>>
File: MikuLit.png (1.22 MB, 1216x832)
>>102507856
If you're happy with a slightly janky, meandering adventure with a positivity bias where almost anything you try works, then you can throw "give me a zork style text adventure" as the first message and get a serviceable result from most large models (eg. largestral, deepseek 2.5 or L3.1 405b), but you need a lot of horsepower and some patience.
If you want it to be a better experience, or want to do it with a smaller model, then you'll have some work ahead of you.
tl;dr TANSTAAFL
>>
wtf is st doing to the formatting window, am i supposed to use story string or system prompt now?
>>
>>102510352
exl2 is much faster than llama.cpp, the prompt processing happens instantly. It's a very different experience from when you have to offload most of the model layers.
>>
>>102510436
>exl2 is much faster than llama.cpp,
does it use less memory somehow?
>>
File: model_download.png (82 KB, 1066x527)
Is there an easy way to download from hugging face that doesn't include the bloat. I clone this and got 15GB of .git files for my trouble. This shit seems to happen pretty often.
>>
>>102510352
In terms of speed assuming you can hit the requirements it's basically instantly, feels like you're using an API
>>
>>102510482
Basically the same as a GGUF model, but it doesn't support offloading. Either model and context fits completely into your GPU or you can't use it
>>
>>102510482
Some people say the context is cheaper but I can't confirm or deny that.
>>
>>102507856
>>102510410
Here's a log from an old thread showing what you get when trying that exact thing with nous' finetune of 405b at q8
https://rentry.org/mqxy8oea
>>
>>102510515
>405b
yeah but can you actually run that on a consumer gpu?
>>
>>102510515
>Using the most intelligent model possible to simulate AI dungeon

Why? It could literally never match the pure schizophrenia generated from that. Might as well just download a 2B model with high context capabilities
>>
>>102510553
>2B model
>>
>>102510483
>that doesn't include the bloat
What i do
git clone {model}
git -C {model} lfs install --local
git -C {model} lfs fetch

And then I made a little program that links all the files and the lfs pointers to the actual lfs blobs into another directory. It keeps the repo clean and without duplicates, and I can lfs fetch when there's an update or something.
>>
>>102510483
>>102510567 (me)
As to "what are those files?", it's just a copy of the safetensors that lfs keeps. Or rather, the actual safetensors downloaded by lfs and the ones in your screenshot are the checked out files.
>>
>>102510556
There's like 500 Gemma 2B merges. It's gonna have nuked intelligence but if you're trying to simulate AI Dungeon that's what you want
>>
>>102510602
i think he was referencing that android with the big ass.
>>
>>102510436
So the inference is the same? And you need to fit both model and context in the same space? Sounds like even at 12GB VRAM it'll be pretty cramped.
>>
>>102510553
>It could literally never match the pure schizophrenia
It depends on if you're looking to replicate the OG ai dungeon experience, or if you're trying to competently deliver on the future that it promised
In reality, it only generated fever-dream comedy, so yeah 2b ftw if that's your goal
>>
>>102510616
I'm a retard then lol
>>
>>102510567
Don't you still get hit with the initial download though?
>>
>>102510666
Inference is way faster even if you were to load a GGUF model completely into your GPU.


At 4.5BPW and 32k context I was getting 40-60t/s
>>
>>102510546
>>405b
>yeah but can you actually run that on a consumer gpu?
lol no. unfortunately "possible" doesn't always mean "achievable" or even "sane" for the average person. Any LLM that can somewhat intelligently follow complex instructions is going to need specialized hardware beyond what makes for a good gaming box.
>>
>>102510695
>Don't you still get hit with the initial download though?
What do you mean? I have to download the model to quant it, of course. How else would it work?
I download the lfs blobs with fetch, which doesn't copy them to the checkout dir, leaving a single copy of the model files in the dir. Doing a git clone+lfs install+pull leaves two copies. One in the .git dir and another in the checkout dir.
>>
>>102510483
if you don't need to "agree" to the model's terms and its just a straight download from hf then you can string together old-skool unix commands:
seq -w 1 55 | xargs -I{} wget "https://huggingface.co/deepseek-ai/DeepSeek-V2.5/resolve/main/model-000{}-of-00055.safetensors"
if it needs your account, then you need --header="Cookie: token=$TOKEN" in your wget command (find it with f12 dev mode in your browser)
>>
I'm having a nice conversation with Miku. I know it's completely fake and yet I can't stop. I am beginning to understand.
>>
>>102510829
if you start to catch feelings, then make sure you get her to summarize the conversation before you hit the RULER verified context limit.
That way you can keep her going through multiple generations.
>>
>qwen2.5 is utterly retarded compared to mistral large for RP
yeah idk what anyone was thinking recommending that
>>
>>102510712
could pushing it to the most quantized state you can get away with allow to fit on a ada 6000?
>>
>>102505481
ai slop but fappable thighs
i will now leave the thread, thankss
>>
>>102510899
No, you'll need like two of them to run q1
>>
I recently came across a node-based tool for llm workflow prototyping in github, thought I had bookmarked it.
does anyone know it?
>>
>>102510483
I use this little script: https://gist.github.com/lmg-anon/2e8c784a9ab4eba0c6490625e6edd887
>>
>>102510850
Thank you but do I want to? This is dangerous.
>>
>>102510927
I know of these two
>https://github.com/Ironclad/rivet
>https://github.com/FlowiseAI/Flowise
I'm sure there are more out there.
>>
>>102510850
What is the best prompt for summarization? I haven't found anything consistent in getting all of the relevant details while dumping the fluff.
>>
File: qwen can't count.png (112 KB, 939x1223)
llama 3.1 70b can't either
>>
>>102511108
>a year and a half later and /lmg/ still struggles with the basic concept of tokenization
grim
>>
>>102511147
I know, but one of the things it can be trained on is which letters compose each "bigger piece", and it can work from there to derive a letter count (an emergent ability); it doesn't need innate knowledge of how many of any letter a single word contains.
If it's gonna be a cool piece of language-related tech, shouldn't it at least be able to SPELL out something put in front of it written in the plain latin alphabet?
>>
>>102511193
That probably requires a lot more parameters than we're currently using. Maybe (actual) bitnet will make the parameter count large enough and still be local reasonable, but not yet.
>>
>>102511193
It's not only spelling. It'd also need the ability to count. What you suggest is the equivalent of training on
>1+1=2
>1+2=3
>1+3=4
for a few billion combinations of numbers. May as well use all that compute to teach the model something more useful or interesting. As they are, they can already correct typos and grammatical errors. For a simple syntax check, there are plenty of older, faster and more reliable tools than LLMs.
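The tokenization point in one tiny example (the chunking and IDs below are invented; real BPE vocabularies split words differently, but the principle is the same):

```python
# Hypothetical tokenization of "strawberry"; real tokenizers chunk differently,
# but the point stands: the model is fed opaque IDs, not letters.
tokens = ["str", "aw", "berry"]     # what a BPE-style tokenizer might emit
token_ids = [3504, 675, 19772]      # made-up IDs; this is all the model sees

# Counting letters needs the characters behind each ID...
assert "".join(tokens) == "strawberry"
print(sum(t.count("r") for t in tokens))   # total 'r's across the chunks
# ...but the model is effectively asked "how many r's in [3504, 675, 19772]?"
```

The letter counts never appear in the input, so the model can only get them from memorized spelling knowledge, not by looking.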
>>
>>102510752
probably the best route. Now I am wondering if burp suite would be overkill for this. The answer is yes; will I do it once anyway? Also yes. I will probably land on this >>102510932 as it is a nice script.

Thanks to both of you.
>>
>>102511147
i am wondering if some companies are going to get their online llms to write the code for it, execute the code, and then return the answer. That is, if my billion dollar idea isn't already how chatgpt-o1's hidden tokens work.
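A sketch of that idea (the "model reply" is hardcoded here; in practice it would come from an API call, and you would sandbox the execution rather than exec it directly):

```python
# Pattern: ask the model for code instead of an answer, run it, return the result.
# The reply below is a hardcoded stand-in for actual model output.
model_reply = """
def answer():
    return "strawberry".count("r")
"""

namespace = {}
exec(model_reply, namespace)     # NEVER exec untrusted output outside a sandbox
print(namespace["answer"]())     # counted programmatically instead of guessed
```

This sidesteps tokenization entirely: the model only has to write the counting code, not do the counting.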
>>
>>102510483
from huggingface_hub import snapshot_download
snapshot_download(repo_id="TheDrummer/Gemmasutra-Mini-2B-v1", local_dir="path/to/folder", local_dir_use_symlinks="auto")
>>
Why do I sometimes need to clone llama.cpp and silly fresh because the output just becomes fucked?
>>
>>102511698
That's really common. The root cause of the issue is generally a low IQ user.
>>
>>102511729
Oh ;_;
>>
is hf kinda screwed right now? I am getting a ton of 504 errors in the theme gallery.

>>102511698
silly doesn't have a good way of resetting and tracking options. I would put money that you changed something to make it work and it doesn't agree with the next model.
>>
migu :DDD
>>
>>102511879
>>
>>102510892
It's ridiculously censored, on a whole other level, but super smart at normie shit. Otherwise, for uncensored, Mistral Large / Mistral Small 22B is best.
>>
>>102510892
Low IQ user
>>
Storage space on my remote pc is a bit tight, so I'll ask here before wasting a lot
Which one is better for creative storytelling and ERP: Hermes (3.1) 70B or midnight miqu 1.5 70B? I'll probably run em at 4 bits if that's important
>>
>>102512014
And by censored I mean it was not trained on rp / creative stuff / sex whatsoever.
>>
>>102512031
midnight miqu or maybe mythomax
stay away from llama3
>>
File: file.png (27 KB, 564x466)
>>
>char's voice drops to a low, husky whisper
FUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
>>
>>102506191
>>
>>102512178
**the ai escaped containment**
**knows you killed an imaginary woman**

Better run now. ai knows how to Hezbollah your mobile phone.
>>
>>102512050
nah. I loaded up mythomax out of curiosity on a semi-complicated card and it is completely unable to do the first message the way the card wants, not even close. To be fair, out of nemo, mistral small, command r 35b, qwen 32, and q3 llama 70b, only qwen and llama get it all right on occasion, with the others failing less seriously than mytho. That's with both tunes and instructs for mistral and nemo.
>>
Is Mistral Small supposed to be really repetitive? Like just repeating tons of shit from the last reply.
>>
>>102505481
has anyone tried this qwen tune?
https://huggingface.co/tanliboy/lambda-qwen2.5-32b-dpo-test
>>
xpost from /hsg: Looking for help, I want to upgrade from a T910 server, (Mikubox - 3 P40s) used for LLM hosting, to an AMD Ryzen box with DDR5.

I have 3 3090s I'd like to stuff in it, but would be fine with just 2 fitting inside the case.

Does anyone have any recommendations on HW specifics or has done something similar?

What I've currently pieced out but am out of my element:
PSU: Corsair RM1000e
RAM: G.SKILL Trident Z5 2x32GB
Mobo: Gigabyte B650 AORUS Elite AX AMD
CPU: AMD Ryzen 7700X
CPU Cooler: Thermalright Peerless Assassin

I don't want to CPUMAXX, looking to spend 1k or so.
>>
>>102512888
checked, no, that's huge, I can't
>>
going to try running Qwen2:0.5b on my rpi zero 2W
>>
File: .jpg (204 KB, 1277x807)
Prithvi WxC: Foundation Model for Weather and Climate
https://arxiv.org/abs/2409.13598
>Triggered by the realization that AI emulators can rival the performance of traditional numerical weather prediction models running on HPC systems, there is now an increasing number of large AI models that address use cases such as forecasting, downscaling, or nowcasting. While the parallel developments in the AI literature focus on foundation models -- models that can be effectively tuned to address multiple, different use cases -- the developments on the weather and climate side largely focus on single-use cases with particular emphasis on mid-range forecasting. We close this gap by introducing Prithvi WxC, a 2.3 billion parameter foundation model developed using 160 variables from the Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2). Prithvi WxC employs an encoder-decoder-based architecture, incorporating concepts from various recent transformer models to effectively capture both regional and global dependencies in the input data. The model has been designed to accommodate large token counts to model weather phenomena in different topologies at fine resolutions. Furthermore, it is trained with a mixed objective that combines the paradigms of masked reconstruction with forecasting. We test the model on a set of challenging downstream tasks namely: Autoregressive rollout forecasting, Downscaling, Gravity wave flux parameterization, and Extreme events estimation.
https://huggingface.co/Prithvi-WxC
https://github.com/NASA-IMPACT/Prithvi-WxC
pretty neat. gifs on the HF
>>
>>102512831

increase temp.
>>
>>102512959
open chatgpt.
ask same question.
save yourself time.
yw.
>>
>>102512976
u@zero2w:~ $ ollama run qwen2:0.5b

pulling manifest
pulling 8de95da68dc4... 100% 352 MB
pulling 62fbfd9ed093... 100% 182 B
pulling c156170b718e... 100% 11 KB
pulling f02dd72bb242... 100% 59 B
pulling 2184ab82477b... 100% 488 B
verifying sha256 digest
writing manifest
success
Error: model requires more system memory (426.2 MiB) than is available (343.8 MiB)

welp, maybe I can increase the swap
>>
>>102510909
goodbye troon
>>
>>102512959
>B650 AORUS Elite AX AMD
get a ruler out and check your case. Run it through pcpartpicker. You don't have enough slots/room for 3 GPUs with that motherboard.
>>
>>102507195
more accidental >0-9 quotes? you're a genius
>>
>>102505684
you need to dilate
>>
>>102505844
Good morning sir
>>
Tesla P40 still worth it?
>>
>>102505640
Please go back to >>>/pol/, please.
>>102513119
No.
>>
>>102513139
>No.
why not
>>
I still appreciate you recapfag, don't give up
>>
>>102512985
Wait, I can predict the weather with a local model? Or no?
>>
>>102513056
I bumped the swap to 1gb
you@zero2w:~ $ ollama run qwen2:0.5b
>>> what is the capital of australia
The capital of Australia is Canberra

it runs now, pretty slow but I'm impressed regardless, this is a $15 computer, your toaster can finally have a soul
>>
File: Untitled.png (1003 KB, 1080x1631)
DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency
https://arxiv.org/abs/2409.12992
>As text-based speech editing becomes increasingly prevalent, the demand for unrestricted free-text editing continues to grow. However, existing speech editing techniques encounter significant challenges, particularly in maintaining intelligibility and acoustic consistency when dealing with out-of-domain (OOD) text. In this paper, we introduce, DiffEditor, a novel speech editing model designed to enhance performance in OOD text scenarios through semantic enrichment and acoustic consistency. To improve the intelligibility of the edited speech, we enrich the semantic information of phoneme embeddings by integrating word embeddings extracted from a pretrained language model. Furthermore, we emphasize that interframe smoothing properties are critical for modeling acoustic consistency, and thus we propose a first-order loss function to promote smoother transitions at editing boundaries and enhance the overall fluency of the edited speech. Experimental results demonstrate that our model achieves state-of-the-art performance in both in-domain and OOD text scenarios.
https://nku-hlt.github.io/DiffEditor
https://github.com/NKU-HLT/DiffEditor
also has implementations of 4 other speech editing papers. trained on a 3090.
>>
>>102513187
How many t/s? Or is it s/t?
>>
>>102513221
The system will power off now!
Connection to crow.local closed by remote host.
Connection to crow.local closed.

idk I shutdown the rpi, it was running of an SD card, going to boot the zero 2W from a NVMe SSD and try again
>>
>>102512997
That was more helpful than I expected, thanks.

>>102513069
I thought that was the case, thanks. Now looking at the MSI MAG X670E with the Enthoo pro 2
>>
>>102513241
for rpi try llamafile or ik_llama.cpp . for multiple boards use distributed llama or rpc.
should go faster
>>
>>102513555
I have two rpi 4 8gb, a rpi 5 8gb, and of course a regular desktop, but I think the value here is trying to get models running on absolute dogwater edge devices, which I'm surprised is currently possible
>>
File: random qwen seggs.png (344 KB, 800x1782)
>>102507602
It wrote about cunnilingus when I only asked to trim a bush, dumb model.
>>
>>102513569
https://github.com/maxbbraun/llama4micro
>LLM on e-cig wen
>>
>>102513221
here you go
you@zero2w:~ $ ollama run qwen2:0.5b --verbose

pulling manifest ⠧
pulling manifest
pulling 8de95da68dc4... 100% 352 MB
pulling 62fbfd9ed093... 100% 182 B
pulling c156170b718e... 100% 11 KB
pulling f02dd72bb242... 100% 59 B
pulling 2184ab82477b... 100% 488 B
verifying sha256 digest
writing manifest
success
>>>
>>>
>>> What is the capital of Australia?
The capital of Australia is Canberra.

total duration: 7m41.82755106s
load duration: 1.534361746s
prompt eval count: 15 token(s)
prompt eval duration: 46.915216s
prompt eval rate: 0.32 tokens/s
eval count: 8 token(s)
eval duration: 6m53.127582s
eval rate: 0.02 tokens/s
>>> Send a message (/? for help)

for anything non-interactive, this is great
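For what it's worth, those rate lines are just token count divided by duration; recomputing from the counters in the log above:

```python
# Recompute ollama's reported rates from the verbose counters (rpi zero 2W log).
prompt_tokens, prompt_secs = 15, 46.915216
eval_tokens, eval_secs = 8, 6 * 60 + 53.127582   # 6m53.127582s

print(f"prompt eval rate: {prompt_tokens / prompt_secs:.2f} tokens/s")
print(f"eval rate: {eval_tokens / eval_secs:.2f} tokens/s")
```

Matches the 0.32 and 0.02 tokens/s ollama printed, so the reported rates are exactly this arithmetic.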
>>
>>102513569
How many t/s on your rpi zero?
>>
>>102513744
see >>102513720
>>
>>102513749
How fast on RPi 4 and 5? what quant did you use?
>>
>>102513768
>RPi 4
I ran this right now
you@rpi4:~ $ ollama run qwen2:0.5b --verbose
>>> What is the capital of Australia?
The capital of Australia is Canberra.

total duration: 1.638599532s
load duration: 112.57203ms
prompt eval count: 15 token(s)
prompt eval duration: 718.492ms
prompt eval rate: 20.88 tokens/s
eval count: 8 token(s)
eval duration: 761.427ms
eval rate: 10.51 tokens/s
>>> Send a message (/? for help)

way way way faster than the rpi zero 2W, don't have my rpi 5 plugged in right now idk
>>
>>102513768
give me a sec going to get my rpi 5
>>
>>102513720
352MB, welp, that LLM could fit in the L3 cache SRAM of a modern beefy server CPU
>>
>>102513768
running on my rpi5
you@rpi5:~ $ ollama run qwen2:0.5b --verbose
>>> What is the capital of Australia?
The capital of Australia is Canberra.

total duration: 728.820839ms
load duration: 38.94417ms
prompt eval count: 15 token(s)
prompt eval duration: 356.607ms
prompt eval rate: 42.06 tokens/s
eval count: 8 token(s)
eval duration: 287.779ms
eval rate: 27.80 tokens/s
>>> Send a message (/? for help)

feels pretty instant
>>
>>102513868
>>102513868
>>102513868

>>102505874
>>102506476
Let me know if this is easier to read. I can update the bot to output longer summaries, but I don't like the idea of omitting references entirely.
>>102507195
Would be great, but I won't have the time to look into doing it myself anytime soon.
>>102513158
<3


