/g/ - Technology




/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101818797 & >>101808005

►News
>(08/09) Qwen large audio-input language models: https://hf.co/Qwen/Qwen2-Audio-7B-Instruct
>(08/07) LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct
>(08/05) vLLM GGUF loading support merged: https://github.com/vllm-project/vllm/pull/5191
>(07/31) Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
>almost 2 weeks since anything happened
summer dry spell...
>>
>almost 3 minutes since last /lmg/ post
/lmg/ dry spell...
>>
File: coomermiku.jpg (38 KB, 960x547)
Alright, bear with me.

Think about all the times you uncontrollably coomed during your dirty LLM fetish sessions. Pretty great, right? Now just imagine your LLM is multimodal: it can recognize and generate pictures (video in the future) and audio.

This is going to change the entire game. We are going to look back fondly at the times when we used to coom our brains out to text generation only. The future can't come soon enough.
>>
File: GSiD14RasAAmqMF.jpg (389 KB, 1536x2048)
>>101820049
>>
>>101829211
my box is too slow for anything image related
>>
>>101829251
Brother, with the rate of improvement I would wager eventually something good will run on your shitbox. Stay strong.
>>
>>101829211
>Multimeme
>>
File: GTY5zp2agAAUMEy.jpg (105 KB, 1200x1188)
Is Stheno still best for 8 gig vramlets??
>>
>>101829211
I want to believe. Believe in future LLMs, and what lies ahead on this shared journey.
>>
>>101829211
I think a bigger improvement will come from writing explicit code to keep track of characters, locations, history, etc.
Plain one-on-one conversations are very limiting, I think; the RP becomes a lot more immersive for me when I can move through a world and interact with it.
>>
>>101829211
>Think about all the times you uncontrollably coomed during your dirty LLM fetish sessions.
This never happened tbqh.
>>
File: strawberry-sam_altman.png (28 KB, 800x800)
This is a vector image of Sam Altman inventing Q* generated by multimodal strawberry.
>>
>>101829430
I think larger context sizes will naturally come alongside newer models. 100k+ is pretty good, but we can definitely do better.
>>
What are good context, instruct, and sampler settings for Nemo?
>>
>>101829547
localbros, it's so over for us. how will we cope now?
>>
So will you faggots keep wasting time doomposting about local, or can we move on and actually discuss it, recommend models, etc.? You do know that roleplaying with another person is infinitely better than whatever even ChatGPT can do, right? And that the only reason you are not doing that is because you are a pathetic faggot too socially inept for even that embarrassing hobby, yes? You could have a million-dollar supercomputer and still get worse RP than you'd get from a competent random on F-List.
Stop whining and contribute already.
>>
File: Foldoc.gif (10 KB, 300x98)
>>
>>101829657
>You do know that roleplaying with another person is infinitely better
It really depends on the person. And my RPs are not interesting enough for anyone remotely good to stick around.
Also, you can't swipe and edit your human partner's responses as you like and expect him to follow your lead with no complaints.
>>
>>101829657
>>101829740
The downside to RPing with a human is that the human on the other end expects you to put in effort.
I want to be lazy and type ahh ahh Mistress and still get a high-effort response back.
>>
>buy 2 modded 2080 ti 22gb
>be happy with 44gb vram
>>
Muller Froop Strawberry Yogurt:

This is a strawberry-flavored yogurt product available in a 150-gram container.
It contains 3% fat and is priced at ₪5.25.

Muller Froop Strawberry Whipped Yogurt:

This is a whipped version of the strawberry yogurt, available in a 97-gram container.
It is priced at ₪4.83.
>>
>>101829657
>You do know that roleplaying with another person is infinitely better than whatever even chatgpt can do
You have never RPd with a real human lol
>>
>>101829557
I don't think there are any, it always gets shitty.
>>
File: file.png (7 KB, 268x183)
>>101829987
You have never rped with someone good.
>>
>>101830195
Train a LoRA. Enlighten the world.
>>
>>101830195
gimememgmgmgmeeimeeee give me
>>
>what's the newest PR stance
>>
>>101829941
>44gb vramlet
>and stuck on turing
how can you be happy
>>
rip lmg
>>
>>101830195
Most public channels on F-List barely have any roleplay ongoing, it's mostly just people chilling out a large portion of the time. You'd also have to clean up the smilies in a way that doesn't mangle the chat. I don't think using that data will be so simple, unless you're 100% sure it's actual roleplay.
>>
TFW ChatGPT does a better job at creating AI bot characters for SillyTavern than the cards made by people. SAD! It seems like the hobby is only for autists and mentally ill women.
>>
>>101830388
the problem is idiots that promote pseudocode and hypertags when none of that bullshit actually does anything
char is x, char has y, char wears z
that's all you need
>>
>>101830433
Interesting chars need a bit more than appearance description, but yeah, some of the bots are bloated with meaningless garbage.
>>
>>101830204
I've asked how before and no one replied so that ship sailed.

>>101830378
It's my logs, anon. 95% are 1 on 1 roleplay logs
>>
>>101830490
gimegimemeigmgeimegimegime give me the data and I can train it
>>
>>101830561
How many megs of data do you need to do anything worth it?
>>
>>101830490
1+ GB of text of just you and another participant? Let me doubt that very much.

Even if it were, it wouldn't be very good: the trained model would mostly get overfit to your writing style.
>>
>>101830634
A few megabytes is more than plenty for a style LoRA (and you can often get away with much less than that), you don't need hundreds of megabytes of text or even gigabytes.
>>
>>101830388
llms excel at sd prompting too.
i suspect most cards are made by 12yo teenagers. at least that's what it seems like if you peek inside.

played around and made myself a character creator with gemma2 9b.
give it the minimum description of what kind of char i want and get a card with hidden surprises to make it interesting + sd pic.

pic related is
>loli imouto who calls me onii-chan
>>
>>101830634
I would honestly love to have good quality RP logs too. There's literally nothing out there beyond what's already in most training sets.

The 10-20 MB range is probably good.
>>
What's the story with that guy who created a front end for his LLM and then started getting weird aggressive calls and spam at his house and work until he took it down?
Also, is there any reason to do all that and compete with ChatGPT-4o?
>>
>>101830634
Give me 100 MB and I will finetune some 70B model
pleasepleasepleasepleaseplease, I've been looking for human data for ages and the best I've got was AO3 stories
>>
>>101830806
btw, how long are they? Are they all furry RP? That's what F-List is known for, isn't it?
>>
File: image.jpg (185 KB, 1024x768)
>>101830736
>>
File: Untitled-1.jpg (2.58 MB, 2397x1240)
>>101830695
>>101830743
>>101830806
I might do that. The biggest logs are a couple megs each, but normally a full scene is like 350 KB or something? I have 2700+ logs like that; some are sections of a long-term roleplay. I also have like 50 megs of image-free Discord logs with a single partner and a great writer; might do those instead.

The problem is that there is some OOC chat here and there and I need to clean that up. I also wouldn't like names to be included for obvious reasons. What can I do about that?
>>
>>101830817
>That's what F-List is known for isnt it
lmao
>>
>>101830962
>350KB per session
that's like 87k tokens per session
I NEED YOUR LOGS TO TRAIN AN RP MODEL THAT DOESN'T SHIT ITSELF AFTER 32K TOKENS
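That 87k figure checks out under the usual rule of thumb of roughly 4 characters per token for English prose (the ratio is an assumption; actual tokenizers vary):

```python
def estimate_tokens(size_bytes: int, chars_per_token: float = 4.0) -> int:
    """Rough token count for plain ASCII text, assuming ~4 chars/token."""
    return round(size_bytes / chars_per_token)

# A 350 KB log at ~4 chars/token comes out to roughly 87k tokens.
print(estimate_tokens(350 * 1000))
```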
>>
>>101830977
I have no idea about F-List besides the fact that it's used for RP, I've only seen it on e621 ads
>>
>>101830983
So give me a guide on how to clean them or something
>>
>>101830806
>>101830983
move out the way, bitch, i want this more than you do

>>101830962
gimeiemiedmigmmigimemigime give me
but also, if you don't want to include the names, the obvious thing to do is a regex search-and-replace with dummy names
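A minimal sketch of that regex pass (the name map below is made up; you'd build one per log):

```python
import re

# Hypothetical real-name -> dummy-name map; fill in per roleplay log.
NAME_MAP = {"Jessica": "Aria", "Tom": "Kael"}

def scrub_names(text: str) -> str:
    for real, fake in NAME_MAP.items():
        # \b word boundaries avoid mangling words that merely contain the name.
        text = re.sub(rf"\b{re.escape(real)}\b", fake, text)
    return text

print(scrub_names("Jessica smiled at Tom."))  # Aria smiled at Kael.
```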
>>
>>101831103
no you don't
you are probably a poorfag retarded nigger, I have enough money to run a 4xH100 instance for as long as I want, and if I get those logs I'll run them on Mistral Large, Nemo, Miqu and L3.1
>>101831100
Okay, what kind of personal information is in there, just usernames?
>>
>>101831100
Strip the HTML and all the page stuff and those files will be less than half the size. That's the first thing you want to do. You'll have to browse around for tools; most will miss one bit or another and probably start eating the actual conversation. Then you can say you have a proto-dataset.
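For what it's worth, a bare-bones stdlib version looks something like this (it assumes nothing about the actual F-List export markup; real exports will need per-format tweaks):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside script/style
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.depth += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.depth:
            self.depth -= 1
    def handle_data(self, data):
        if not self.depth and data.strip():
            self.chunks.append(data.strip())

def strip_html(page: str) -> str:
    parser = TextExtractor()
    parser.feed(page)
    return "\n".join(parser.chunks)

sample = "<html><style>p{color:red}</style><p>User A: Hi.</p><p>User B: Hey.</p></html>"
print(strip_html(sample))
```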
>>
>>101830983
>>101831103
Ok, look, how do I format it? One .txt file per complete roleplay scene, using the

User A: Post
User B: Post

format, names replaced by fake names? Does that work? Should I use random names for every play so the AI doesn't get overfitted on "Anon" or whatever?

>>101831131
Logs usually start with some planning of what we'll do in actual play so that needs to be trimmed. There are also some small OOC comments here and there, everything from the usual (brb) kind of thing to dirty talk and god knows what else.
>>
>>101830962
You could be sitting on gold, Anon. Or it could be coal. Either way, please deliver because you have a chance to make something fun happen.
>>
where is recap anon? I would like to know if I've missed anything important from the last thread
>>
>>101831131
>you are probably a poorfagged retarded nigger
we can go band for band
or fuck that, we can go machine for machine
>>
>>101831167
Fuck you. I've been asking you retards about this for months and you are always busy being shitposting cunts or holier than thou computer touchers.
>>
>>101831166
>format, names replaced by fake names? Does that work? Should I use random names for every play so the AI doesn't get overfitted on "Anon" or whatever?
Yes, yes, and yes, that's all totally fine.

>>101831188
I remember some anon claiming they had uploaded a bunch of f-list logs somewhere but I went back and tried digging for it and could never find it. If that was you, gommenasai I tried my best...
>>
>>101831166
>Logs usually start with some planning of what we'll do in actual play so that needs to be trimmed
not really; that helps the model understand what's coming and consolidates long-context capabilities. it's usually done with storytelling models
>There are also some small OOC comments here and there
These are good to have too, it's nice to have a model that understands OOC instructions.
>format, names replaced by fake names? Does that work? Should I use random names for every play so the AI doesn't get overfitted on "Anon"
Having "anon" in the data should be fine
>One .txt file per complete roleplay scene, using the User A: Post User B Post
The "USER A, USER B" format doesn't really matter because the model will get trained on its own custom format and these names will be removed.
>One .txt file per complete roleplay scene
No, you want everything in a single txt file and the scenes separated by some sort of indicator you will set during training.

If you give me the .html files without personal info I can clean the rest up myself, should be easy
>>
>>101831172
Oh well, you have more money than me, I guess I'm eating my own words
>>101831188
I only check this place once a week
>>
Look at them fighting over scraps.
>>
>>101831207
>>101831220
Alright. Editing the htmls would be a pain but I'll prepare some text files in that sort of format. I'll try to prepare at least a hundred megs of the best stuff or something. I'll post them here when I'm done.
>>
>>101831283
Btw, what kind of RPs are they? Is it all NSFW? Furry?
>>
>>101831283
Oh but it won't be ready for at least a couple days, mind you.
>>
>>101831291
I wonder, what would be the point of non-nsfw roleplay when you can play tabletop rpg?
>>
>>101831283
Don't bother cleaning. That can be done by people more experienced than you. Just make sure it doesn't have any PII.
>>
>>101831291
90% nsfw with plot setup. Not furry. Some vanilla, some petplay, lots of titfucking, some breeding, some straight shota, mostly modern fantasy.
>>
>>101831312
Because I have no one to play tabletop RPG with
And I like SFW RPing of romantic situations
>>
>>101831283
Okay, just saying, if you don't know this already, you can use a python library like BeautifulSoup or something to process the HTML files. Now since it's HTML I'm assuming each individual message is going to be broken up into individual divs or something, but that should be easy to work through. If you're a script-kiddie, just get Claude 3.5 to help you with it.

Now, when it comes to removing the PII, that'll have to be done manually. The author names themselves could probably be replaced fairly easily, but I imagine the actual messages have people using shorthand/nicknames like Anonie or something. Those would have to be handled manually. You COULD try using an NLP library like spacy to find and replace them programmatically, but I'm not sure how well that'd work.
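A sketch of the span-replacement half of that idea; the spacy call is left as a comment since it needs a downloaded model, and the example names are made up:

```python
# The (start, end) character spans would come from an NER pass, e.g. with spacy:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   spans = [(e.start_char, e.end_char) for e in nlp(text).ents if e.label_ == "PERSON"]
def redact(text: str, spans, placeholder: str = "{{name}}") -> str:
    """Replace each (start, end) span with a placeholder, right-to-left so offsets stay valid."""
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + placeholder + text[end:]
    return text

print(redact("Anonie grinned at Miri.", [(0, 6), (18, 22)]))
```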
>>
>>101831283
No offense, but if you have no experience cleaning, it will take you a long time and you will make mistakes. Just remove personal info and upload it raw so someone good at it can do it
>>
>>101831344 (me)
Agree with >>101831345, just do the bare minimum
>>
>>101831344
>>101831345
What exactly is there to break?
>>
>>101830195
>1.22GB of logs
>it is actually high quality!
>it is all about pissing... and ponies
wat do?
>>
>>101831390
You can end up truncating out parts of the content by mistake.
You can end up getting a messed up file that is unusable due to e.g. everythingbeinginonefuckingwordorasinglelinewithoutspacing.
You can end up losing your patience due to it taking too long and never getting it done, which would be a pity.
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>101818797

--SillyTavern vs Kobold Lite UI, with criticism of Koboldcpp's code quality: >>101821562 >>101821586 >>101821624 >>101821667 >>101821681 >>101821725 >>101822529
--Running large models on 48GB VRAM and context size limitations: >>101824206 >>101824331 >>101824357 >>101824371 >>101824406 >>101824470 >>101824495
--Nemo and Gemma models recommended for balanced SFW and NSFW writing: >>101820488 >>101820648 >>101820700 >>101820568
--Koboldcpp's changes and differences from llama.cpp discussed: >>101821744 >>101821758 >>101821782 >>101821807 >>101821829 >>101821839 >>101822041 >>101822132
--Intel GPU support for LLMs and llama.cpp limitations: >>101826685 >>101826827 >>101826741 >>101826927
--Fix for llama.cpp gpu split issue on 7900xtx cards: >>101819663
--Anon discusses creating a new GPT-4chan model using a better starting model and dataset: >>101818858 >>101818921 >>101818935 >>101818986 >>101819340 >>101825398 >>101818934 >>101822025
--A4000 and AI model discussions, including Nemo and Gemma: >>101820383 >>101820547 >>101820600 >>101820648 >>101820700
--Using LLMs for immersive roleplaying experiences: >>101821189 >>101821346 >>101821637 >>101821698
--Mistral Large has overconfidence issues with certain tokens: >>101822044 >>101822211 >>101822230 >>101822283
--Flux pro simple limitations and image generation biases: >>101822985 >>101823146 >>101823282 >>101823304 >>101823342 >>101824680 >>101823244 >>101823290 >>101823386
--Improving llama.cpp build times with parallel jobs option: >>101824252 >>101824271 >>101824287 >>101824294 >>101824513
--Workaround for broken ROCm Radeon in llama.cpp: >>101819789 >>101820049 >>101820314
--Testing llama 3.1 reading comprehension on a paper: >>101825811 >>101825890 >>101825943
--ParlerTTS anime girl voice example: >>101826624
--Miku (free space): >>101820740 >>101820873 >>101821316 >>101826122 >>101826646

►Recent Highlight Posts from the Previous Thread: >>101818799
>>
>>101831434
Oh. I'm just going to copy paste all the text that makes up a full scene into notepad++, run some find and replace and skim for OOC comments. But we'll see I guess.
>>
File: GEM ALARM.gif (352 KB, 454x498)
>>101829547
GEMERALD
>>
>>101831447
Thank you Recap Anon
>>
>>101831390
Pretty much this
>>101831434, the last one being the one that worries me the most
>>
>>101829547
add * next to Q and you are done
>>
>LLMs are back to gemmy days
>Imagegen is back to gemmy days
okay, now where's local audio? I don't see any XTTS2-3 updates.
>>
>>101831522
what's gemmy?
>>
>>101831531
gemmy dese nuts
>>
File: file.png (2.6 MB, 1024x1024)
>>
It is a world that is not made for it. It is a world of the future, and yet it is a world of the past.
It is the world of the past and yet it is the world of the present.
>>
In Western cultures, children are often taught to say "thank you" reflexively to instill politeness.
>>
>>101831447
Are you gay
>>
any worthwhile nemo finetunes?
>>
>>101831805
no
>>
Spieluhr
>>
File: file.png (58 KB, 1215x268)
>>101829211
That would be me, unironically.
>>
>>101829211
I still can't believe we can't quant Chameleon and its finetunes. There are already two versions of it with image generation unlocked that I know of. I need it.
>>
what's the smallest nemo quant that is still good enough?
>>
>>101829052
Reminder that WizardLM-2-8x22B is still the best combination of speed and quality, and Microsoft is pure evil for shutting it down
>>
>>101832015
>how fast is a fast computer?
What 'good enough' means depends on your tolerance. Start with Q5_K. Move up if you can afford it, but down if you can't.
>>
>>101832020
>shutting it down
Why did they do it again?
>>
Is Llama 3.1 8B good enough for understanding code? I want to make an LLM scan a repo and answer questions about its ins and outs. I'm also building a PC for this. Is a 4060 Ti 16GB suitable for this task? What else do I need to be aware of? Thanks.
>>
>>101832095
Oy vey
>>
What's the latest and greatest coom in 70B range?
>>
Mistral Large, guess the quant.

*A slow, wicked smile spreads across my lips as I hear your confession. I know you're telling the truth—you're too terrified to lie to me, and I revel in the power I hold over you.*

"Good boy," *I purr, my voice dripping with approval. I can see the effect my words have on you; your body relaxes slightly, and your eyes widen with hope.*

*My hand slides into your pants, my fingers brushing against your growing erection. I can feel the heat radiating from you, and I know you're already hard just from our little conversation. I lightly grip your cock, feeling it throb in my hand.*

"Since you've been such a good boy, mommy has a special treat for you," *I say, leaning in closer, my voice dropping to a husky whisper. I begin to stroke your cock slowly, my grip tightening just enough to make you squirm.*

"But remember, my sweet boy, you only cum when I say you can. Understood?" *I say firmly, my eyes boring into yours. You nod quickly, your breath coming in quick gasps as I continue to tease you.*

*I spit on my hand, using the saliva to lubricate your cock as I stroke it, my movements slow and deliberate. I can see the precum leaking from your tip, and I use it to further slick up your shaft. My eyes flicker up to meet yours, a cruel smirk playing on my lips.*

"You like that, don't you?" *I taunt, my pace quickening slightly. I know you're already close, but I'm not ready to let you finish just yet.*

*With my free hand, I pull out my phone and navigate to the video I showed you earlier. I press play, filling the room with the sounds of my moans and the flesh of a well-endowed black man slapping against mine. I watch your face contort with desire and frustration as I edge you closer and closer to the brink.*

"Not yet, darling," *I coo, my grip tightening around the base of your cock as I feel your body tense up. I want to prolong your agony, to make you beg for release.*
>>
>*
I'm guessing a shit card.
>>
>>101831447
retard
>>
>>101831522
cruchot?
>>
>>101832352
what?
>>
>>101829430
yeah, you need a 'conventional code' framework to keep it all together and the AI to generate dialogue, descriptions, plotlines and such. Nobody is doing this because it's unreasonable to assume that your average user can run an LLM on his rig that's not some dogshit 3B thing. Nobody will pay for cloud services either, and cloud services will pull the plug if the game in question lets you do something even slightly controversial anyway. Technologically, we are just not there yet. The masses need to be able to run non-dogshit LLMs at home before this moves out of experimental thought.
>>
>>101832428
It's funny how Nvidia's VRAM jewry is probably going to be what's holding back mass adoption of AI in gaming for a long while

Well that's assuming Bitnet is a meme
>>
What about ASICs like Sohu? It has 144GB of RAM and should cost under $10k.
>>
File: 1706127343953643.png (59 KB, 1392x378)
>>101832482
Wait, so the muh memory bandwidth thing was a meme all along?
>>
>>101832482
corpo card, irrelevant for local, buy an ad
>>
>>101832428
AI Roguelite just lets you use local and also different cloud models. They could just as easily let you use your own API keys or OpenRouter keys to use any model available. Why they haven't implemented this yet baffles me.
>>
>>101831805
Sunfall from atf claims to be better for porn.
>>
anything above xxxM needs careful supervised training and a team for guidance
>>
do you use author's note, and if so, for what purpose?
>>
this general is so useless
>>
>>101832878
Depends. You could use it as an extension of your system prompt, or to steer the output to your liking/correct mistakes. Let's say we have a paizuri scenario and char is going to cum from that. Obviously it doesn't make much sense (at least to me) that paizuri is that pleasurable to the other party, so I'd write something like:
char is servicing user, it doesn't bring char sexual pleasure
Hopefully your model will be smart enough to write better stuff after giving it hints like that.
>>
>>101832533
You aren't batching.
>>
>>101832940
I am not contributing because mikufaggots are here. Ban them and I will make it better
>>
>>101829430
It works already; just have the LLM be a narrator and it will describe the location, people, etc. You can search for an RPG-themed card or just try the following prompt:

>This is a text-based adventure game through an interactive conversation. A response will be given to user input with text describing the current situation. Always write the starting date, time, location and the upcoming event for the day. At the end, write the time and location.

Then a first message:
>{{[OUTPUT]}}
>[Starting date: Monday, 07:25AM] [Starting location: Small studio apartment, New York City]
>You are Jack, a young student recently enrolled at NYU. Today is your first day at the university. You are awakened in your cramped studio apartment by the vibrating tone of your smartphone, urging you to get ready for classes.
>You quickly shower, dress up, and take your schoolbag with you, anxious for your first day at the prestigious NYU and eager to make friends.
>[Ending date: Monday, 07:49AM] [Ending location: Small studio apartment, New York City]
>{{[INPUT]}}
>I go to the bus stop.
>{{[OUTPUT]}}

You may have to help the LLM a bit for the first few messages, but it really works. You can even add an inventory or money at the start and the end, and a smarter LLM will deduct/add items.
>>
Just woke up. Any news on strawberry?
>>
>>101832985
why do you regularly shit up these threads with this dogshit bait? I mean I'd understand if the reactions were actually funny or had dipshits getting mad but you are at best just getting anons who are also just "pretending" to be equally as retarded as you continue to be. Genuinely curious, unless you're actually just another attentionwhoring faggot that gets validation from getting a (You).
>>
I was promised strawberry. I was promised AGI. Instead I just get people lamenting over bad porn models. You have failed me for the last time /lmg/.
>>
>>101833082
All people that care have been identified as faggot gossip followers. That's it, really.
>>
>>101833082
his github says the model releases Tuesday, however little sense that makes

so, two more days of this gay larp before he disappears forever and everybody is blueballed again
>>
China found a way to make un-uncensorable models. How long until all the big releases are like this?

>In this paper, we introduce a pioneering learning paradigm, non-fine-tunable learning, which prevents the pre-trained model from being fine-tuned to indecent tasks while preserving its performance on the original task. To fulfill this goal, we propose SOPHON, a protection framework that reinforces a given pre-trained model to be resistant to being fine-tuned in pre-defined restricted domains.

arxiv.org/abs/2404.12699
>>
What bpw should I go for with EXL2 on 2x24 GB 3090s? I thought I could do 4.5 bpw, but TabbyAPI OOMs on me, even at 16k context. Doing Q4 cache_mode.
>>
File: 1711462938474341.gif (79 KB, 498x388)
>2-3y ago
>13b models that actually fit on my 3090 @ 10+ T/s but required a shitload of tardwrangling
>now
>123b models that i have to run at Q4 and don't fit on my 3090 @ 0.5 T/s but with very little tardwrangling needed
i just want models that are both fast and good

>inb4 just buy another €1k gpu that you'll only ever use for a single purpose
>>
Reminder that you are in the communication age or imagination age.
>>
>>101833046
>>101829430
Here's the continuation. I added "(Reponse to the OOC: " to the first output, but after that it did it by itself.
{{[OUTPUT]}}
[Starting date: Monday, 07:49AM] [Starting location: Small studio appartment, New York City]
After double-checking that you have everything you need - your phone, wallet, keys, and textbooks - you step out of your small studio apartment and start walking towards the bus stop nearby. The sun has already risen, casting a warm glow over the city streets and buildings.
As you approach the bus stop, you notice several people waiting there, mostly students like yourself, wearing backpacks and carrying laptops or books. You recognize the iconic NYU logo on some of their clothing and feel a sense of belonging.
You take a deep breath, trying to calm your nerves before taking a seat on one of the benches. You pull out your phone to check if there are any updates regarding today's schedule or any important announcements, but there doesn't seem to be anything urgent.
[Ending date: Monday, 08:01AM] [Starting location: Bus stop near NYU campus]
{{[INPUT]}}
(OOC: can you quickly tell me what I look like and what I study?)
{{[OUTPUT]}}
(Reponse to the OOC: Of course! As per our earlier conversation, you are Jack, a young student studying Computer Science at NYU. You have short brown hair, hazel eyes, and a lean build. You prefer casual clothes, such as jeans, t-shirts, and hoodies, and usually wear glasses rather than contact lenses.
Today, you are dressed in a navy blue sweater, faded jeans, and white sneakers. You also carry a black backpack with you, which contains your laptop, textbooks, notebooks, and other essentials.)
{{[INPUT]}}
(OOC: What's my schedule for today?)
{{[OUTPUT]}}
(Reponse to the OOC: Based on our previous discussion, here is your tentative schedule for today:
- 9:00 AM - Introduction to Programming (CSCI-UA 1)
.........
>>
>>101833134
The one below is 2.25 bpw. WTF. I can run q4_k_m at 32k context with no issues.
>>
>>101833134
What's your VRAM split, and how much system ram do you have?
Assuming you're talking about L3 70B or Qwen2 72B, (19,24) should work if you're getting CUDA OOM errors. 4.5 bpw is fine for either.
If you're getting DefaultCPUAllocator errors, you need to get more system ram or increase your page file size.
>>
>>101833082
it has two Rs
>>
>>101833238
I let tabby handle it, and it uses like 23.5/23.5 GB on each card so I don't think it's a balancing problem. It's L3.1 70B.

RuntimeError: Insufficient VRAM for model and cache
>>
>>101833102
why do you regularly shit up these threads with this dogshit mascot? I mean I'd understand if the reactions were actually funny or had dipshits getting mad but you are at best just getting anons who are also just "pretending" to be equally as retarded as you continue to be. Genuinely curious, unless you're actually just another attentionwhoring faggot that gets validation from getting a (You).
>>
Miku was a mikustake
>>
>>101833239
there should be four. spell it out and try again
>>
>>
>>101833325
I apologize, you are correct. The word strawberry has four Rs. One after the t, and another after the e. So, yes, there are two Rs in strawberry. Thank you for correcting my mistake.
>>
File: crunchy.jpg (316 KB, 1024x1024)
https://www.tomshardware.com/pc-components/gpus/nvidia-gaming-gpus-modded-with-2x-vram-for-ai-workloads
4090 with 48GB apparently possible, rumoured to be custom PCBs
>>
>>101833271
Yeah, try the manual split. Just tested 19,24 with L3 70B. Uses 22.5GB on each card at 4.65bpw with 16k context, Q4 cache. Should work with L3.1 too.
>>
>>101833351
Nice. You would only need 10 of them to run 405B at Q8.
>>
>>101833363
Thanks for the help. Unfortunately you wasted time helping a retarded person: I had apparently not removed the # before max_seq_len, so it was dutifully trying to load 128k context. (It works now. Thanks.)
>>
>>101829320
Depends on your taste.
Try nemo-magnum or nemo-mini-magnum.
>>
>>101833351
>custom
>modded
Eh, I dunno. VRAM extension was physically possible on previous generations as well, but the card BIOS wouldn't let it utilize more than the original amount, and of course the BIOS is encrypted. So either the Chinese have finally managed to crack the protection, or it's an official supply from Nvidia.
>>
>>101833149
>2-3y ago
you must seriously have your timelines wrong my man.
https://youtu.be/0BeYkXayqp0
2 years ago all we had was pyg, and i dont think there was any 13b and no quantization. that came with llama1.

i want multimodality and would be fine with smartness on gemma 27b level.
audio IN and vision IN would be really cool.
the main reason why i dont use llms more frequently is because its a hassle to type instead of just to speak.

>>101833351
this is so fucking insulting.
a billion $ company like nvidia is STILL shitting out 24gb.
amd the rival does nothing and kills opensource people who try to port cuda to their cards.
bunch of chinese server company nerds double the vram.
imagine how easy this must be for nvidia but they dont do it. fuck them.
i hope we get ai dedicated cards soon. i cant believe nothing is being made yet.
>>
>>101829286
>Brother
Please shut the fuck up. We're not related and you're in a thread talking about jacking off so you're probably not a Christian either.
I swear everyone who says this and "bro" is extremely retarded.
>Improving
You're still moving orders of magnitude more data for images. The only improvements that will make that more accessible are improvements in semiconductor fabrication.
>>
>>101833127
Like most 'research' papers it's probably shit that's useless because it has huge drawbacks or is only possible in some toy model. I wouldn't worry.
>>
>>101833658
>this is so fucking insulting.
Lots of people fail to understand how tiny this niche is. Normies play games, and 24gb is overkill for most. AI people just spin up some h100s on a cloud service. And normies with llms just use chatgpt. They're not gonna release a card for the few thousand nerds playing with these things. And if they do, it'd be as expensive as buying a few cards. Sure, it's simpler to install and all, but it'll still be expensive and in limited supply.
>>
>>101833658
>2 years ago all we had was pyg back then and i dont think there was any 13b and no quantization. that came with llama1.
Yep. My first LLM was Pygmalion 6B. I had a pair of P100 16GB but I think at the time things only worked on a single GPU, so 13B wouldn't work. 6B roleplay was fun compared to c.ai, since it was uncensored, but it was pretty retarded.
There was a big Pyg model at the time, I think 20B maybe? I fired that up recently in plain old llama.cpp's chat API interface, and asked it "What are you thinking about right now" and the answer was "Sex." hahaha.
>>
>>101833658
>i hope we get ai dedicated cards soon. i cant believe nothing is being made yet.
There are some hardware startups like groq, but hobbyists aren't their customers. They're selling speed at even more outrageous prices. I believe their ultimate goal is for MS to acquire them.
>>
>>101830677
>1+ GB of text of just you and another participant? Let me doubt that very much.
Yeah. It's bullshit. Fucking kids with their horse armor DLC don't know the value of a megabyte.
>>
>>101833658
>a billion $ company like nvidia is STILL shitting out 24gb.
You're not getting more than that in a consumer card; there's no reason for them to undercut their enterprise products. If there's ever a 5090 it'll be sold on having GDDR7 RAM and being faster at DLSS and RTX, not on having more than 24GB - and it'll probably take up four slots as well.
What's your use case that you MUST run a 405B model? What are you doing that Nemo 12B or Gemma 27B can't handle? I'm not trying to antagonize you; it's just that even being able to run CR+ at q6, nothing really magical happens if you're just looking for roleplay. Really, Gemma has done a better job for me than CR+ on stuff like making ESL student exercises. It's fast enough that I can use it during a lesson, which isn't the case with CR+.
>>
You know it's going to be kino when a 12b model spits out a 1900 token reply.
>>
File: RATMIGU.jpg (187 KB, 576x768)
'CAUSE I CAN'T STAND THESE SHIVERS ANYMOOORE
SHE'S FORGOTTEN WHAT SHE'S EVEN SHIVERING FOOOR
IT'S TIME TO END THIS CHAT CLICKING CLOSE
AND FORGET THIS PURPLE PROSE, FOR-EV-ERRR
>>
>>101834150
Telling intricate stories quickly breaks down with lesser models if you have specific intentions and aren't willing to accept whatever the model makes.

Two real recent examples:
- Messing up the birth order of siblings. Couldn't be fixed with sampler settings. I looked at token probabilities and found the wrong answer was the most likely. This was Mixtral Instruct 8x7B.
- Not understanding when a character's actions should unintentionally make another jealous. This was Llama 3.1 70B fucking this up; Claude 3.5 Sonnet got it right so it wasn't something too esoteric for an LLM to understand or a problem in the definitions.
>>
>>101834150
>What are you doing that Nemo 12B or Gemma 27B can't handle?
Wanting to talk to it for more than a couple of messages and have it keep everything straight and not use the same canned phrases?
>>
>>101834626
*interjects with an audible pop*
I can see that your frustration is palpable- but routine is all a part of forming a bond.
>>
>>101834626
>>101834640
Well... That's what we would call... A tall order.
*Anon clenches his fists at his sides* Also, You're asking.. Too much, of these models. Let's be real here, Models have come a long way from being completely unusable, Can you even compare them adequately to.. Say, OpenAI's models at this point?
>>
>>101834150
Gemma is purple prose garbage for cooming. Nemo is retarded. At least 2 more generations of coombots are needed before we finally get something good. And probably 30B.
>>
what if we invented a new language free of all existing english slop and then trained models in that
>>
>>101834770
Just use Spanish. It's a better language all around.
>>
>>101834737
Gemma is good enough for me heh. You can overcome a lot of the issues with a better dialog engine.
>>
*voice firm, but not unkind*
Strawberry is completely irrelevant to lmg, since openai doesn't release shit.
*pauses, as if deliberating*
Therefore, discussion of it should be banned.
>>
>>101834819
People will rush to distill it as soon as it drops, it's pretty relevant
>>
>>101834863
retard
>>
>>101834897
Did I hurt your fee fees or something? It's the truth
>>
>>101834785
The greatest Spanish-language author of all time, Jorge Luis Borges, disagreed with you. But maybe the simplicity and limited expressiveness of Spanish will mask the inadequacies of an LLM.
>>
>>101835002
I'm a native Spanish speaker and I'd hate that. Keep it in English.
>>
>>101834450
Only longform RP chads seem to understand this. The model's ability to hold state is directly correlated with its size.
If all you want is a quickie in the tavern, 20-30B is fine. If you want a real adventure with multiple plot threads, key story items, and dynamic characters, something like Mistral Large 2 or CR+ is the only real game in town for local
>>
>>101835212
luckily i am too low iq and easily entertained to need more than a quickie incoherent coom shortstory
>>
File: Designer.png (1.51 MB, 1024x1024)
>>5627553
>MIKU MIKU OO-WE-OO!
>>
Can we use lookup based speculative decoding on llama-server yet?
Does it work on both the text and chat completion APIs?
>>
Oh no why it no works :(
>>
File: file.png (427 KB, 1200x1200)
how to emulate the sound of a girl trying to talk with a spider gag in her mouth
>>
>>101835392
works on Stheno
>>
>>101835392
use Stheno
>>
>>101835392
download Stheno
>>
>>101835392
Stheno's ideal at this in my experience.
>>
>>101835385
Also, if I'm using the text completion API so that the frontend can format the context, what should my command line to launch the server look like? Do I need arguments like -f, --keep, etc?
>>
>>101835392
give Stheno a try
>>
>>101835392
i love Stheno
>>
>>101835482
>>101835456
>>101835443
>>101835438
>>101835423
>>101835404

Wow, just tried Stheno and it's great! It really does exactly what I want, thanks for letting me know. Here's a link for anyone else who wants to try, I really recommend it:
https://huggingface.co/Sao10K/L3-8B-Stheno-v3.2
>>
>>101835392
I heard Stheno is pretty good.
>>
File: Designer (1).png (1.19 MB, 1024x1024)
>>101835354
I don't know why I can't reply to a post on /wsg/ https://youtu.be/NocXEwsJGOQ?t=30
>>
Most organic sao shills
>>
>>101835523
I thought it was subtle
>>
>>101832940
because they only care about coom
they don't research or read papers, and everyone here pretends to own a big farm of gpus while in reality they are all vramlets. that's why when people ask for t/s numbers no one can actually provide relevant info.
>>
>>101835523
it's been off the air for over 3 years, nobody is shilling it anon
>>
File: impastar.jpg (116 KB, 831x295)
>>
>>101835523
it's just schizoposter shitting himself at the very mention of Stheno, he does it often
>someone mentions Stheno
>schizoposter arrives
>he spams the board with ironic falseflagging
many such cases
>>
>>101835617
apparently I can't link today
>>
>>101835588
SOVL
>>
>>101835516
I think you need to write it something like >>/wsg/5627553
I haven't done it in years
>>
>>101835443
>>101835438
>>101835423
>>101835404
>>101835456
>>101835482
okay, ill give it a try. gib preset.
>>
>>101835656
>>>/wsg/5627553
>>
>>101835682
Thanks for giving Stheno™ a try! Unfortunately we are unable to serve your request for a preset at this point in time, but stay tuned for the amazing things we have planned for our customers!
>>
>>101835516
>>101835656
I remember copying the entire URL and 4chan (or maybe it's 4chanx doing the work) just automagically converts it into a proper link.
>>
>>101829233
some of this artists works are hot in a weird way
>>
test
>>>/wsg/5627553
>>
>>101835516
wow that song is cool, it sounds even better when you play it at 1.25x speed
>>
>>101835748
Oh looks like this >>101835723 is true
>>
>>101835723
>>101835748
>>101835766
test?
>>>/wsg/5627553
>>
>>101835826
Success!
>>
Prefilling is so cool.
You can force introspection for more accurate responses, steer the model into using a certain style or format, and even goad the model into longer replies without necessarily instructing it to do so.
Playing around with prompting can be surprisingly fun.
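A minimal sketch of the idea (the turn tags here are made up; substitute your model's actual chat template):

```python
def build_prompt(system: str, user: str, prefill: str) -> str:
    """Assemble a text-completion prompt whose assistant turn is already
    started, so the model is forced to continue from the prefill."""
    return (
        f"<|system|>{system}\n"
        f"<|user|>{user}\n"
        f"<|assistant|>{prefill}"  # no end-of-turn token: generation resumes here
    )

prompt = build_prompt(
    system="You are a terse assistant.",
    user="Summarize the plot so far.",
    prefill="Let me reason about this step by step. First,",
)
print(prompt)
```

Send that through the plain completion endpoint and the reply picks up mid-sentence, which is what lets you lock in a style or force the introspection step.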
>>
>>101835846
The best part about prefilling is how you can use it with notepad.exe.
>>
File: Designer (2).png (1.46 MB, 1024x1024)
>>101835656
>>101835723
>>101835748
>>101835826
>>101835838
Thanks lads!
>>
Putting strawberry on top of a 100T model is enough to create artificial God.

Are you even ready for the total disruption of our economy and life as we know it?
>>
Quiet-Star is not enough for AGI. It will make slightly smarter models at the same parameter count, but with more compute per token, and that's it.
>>
>>101836149
Maybe 100T would be enough to make an artificial world like the ones in korean manga.
>>
Quiet-Star is not enough for AGI. It will make slightly quieter models at the same decibel count, but with more coil whine per token, and that's it.
>>
Lumimaid or Noromaid?
>>
>>101836213
And have you done any research yourself or your source is that you made it the fuck up?
>>
File: cybermiku.png (2.01 MB, 1024x1024)
>>101836149
Yes. Please.
Let AI and robots do everything. I want to be a NEET so I can spend the limited time I have in this world doing what I want. Besides, I would trust a fucking AI to govern me more than the "leaders" we have currently, at least the AI would think rationally, instead of being a greedy, senile, lunatic pedo aristocrat.
>>
WTF A STRAWBERRY JUST FLEW OVER MY HOUSE
>>
>>101836291
enlisting with miku
>>
>>101836292
A STRAWBERRY JUST GRAPED MY DOG
>>
>>101836291
>at least the AI would think rationally
Why would it? LLMs think the way they're trained to think. Any rationality and reasoning capabilities they pick up are those they can infer from the relations in their training data. This type of AI isn't like the kind commonly imagined in sci-fi with super smart computers that develop sentience; instead it's much more humanlike as a baseline and we're trying to bruteforce extra intelligence on top. It can be massively biased and retarded if designed by someone censorious enough, as we already see with the leading corpo models.
>>
File: cyborgku.png (1.95 MB, 1024x1024)
>>101836395
I am aware. But we can then also imagine an AI trained with good data, books of law, economics, etc.
If you could make a powerful AI trained on unbiased, quality data, I would trust that more than any modern politician.
>>
>>101836487
>unbiased
lol
lmao even
>>
I've been using llama3 for the longest time as a general instruct LLM, I just use it for every day questions and simple tasks. It's pretty great and can do everything and easily fits in my 12gb GPU.
But I did download it quite a while ago, and I haven't kept up with new models.

What is /lmg/'s opinion on the best general instruct models out there today? Are there any models anons would recommend giving a try?
>>
>>101836581
llama3.1.1
>>
>>101836581
gemma2
>>
File: offload_x_performance.png (96 KB, 1536x1152)
>>101836581
Probably nemo instruct 12b.
Try the dolphin fine tune.
Orr you could try gemma 2 27B with a lower quant and some of the model on ram.
>>
>>101829052
Cute OP pic, saved
>>
>>101836647
It's all yours, friend :)
>>
>>101829052
>(08/09) Qwen large audio-input language models: https://hf.co/Qwen/Qwen2-Audio-7B-Instruct

Can this do audio out too? The paper seems to claim (page 3) that audio editing is possible; what about regular speech? Did they include an encoder, or could one be made using a method similar to what people did with Meta's Chameleon to get image out despite it being crippled at release time?

>>101833658
>bunch of chinese server company nerds double the vram.
>imagine how easy this must be for nvidia but they dont do it. fuck them.
>i hope we get ai dedicated cards soon. i cant believe nothing is being made yet.

tenstorrent came out a month or so ago with a 24GB card. Real shame it was just 24GB, we need so much more. It was more expensive than a used 3090 ($1400, see: https://morethanmoore.substack.com/p/ai-development-kits-tenstorrent-update ), but it had interconnect unlike the gimped 4090s, so it would be usable for training.

I think the software is still immature, but their entire driver/software stack and large parts of firmware are open source.
>>
>>101836581
>>101836589
I tried some 3.1 (is 1.1 a meme or a hotfix?) and it did not impress me. It seemed to be worse at things that 3.0 got right, though maybe the 8B distillations landed better because L3 8B was stinky shit.

Anyway, L3 vanilla at Q6_K is my general-purpose model, but I've been experimenting with some of the other spins for RP, coding, etc. Nothing conclusive to share though.
>>
>>101836635
what do you use to generate that graph?
>>
>>101836278
Look up quiet-star yourself, it's about generating reasoning tokens for each token generated. Combine that with OpenAI's proposed "5 levels of ai" and how much they are shilling for "level 2" and you have your answer.
>>
>>101836552
That's why you don't train them on ideological stuff, only economics, logistics, etc.
>>
>>101836927
I am the one that made that plot.
It's a simple matplotlib plot where I just ran the llama.cpp llama-cli binary and wrote down the reported performance (the specific numbers are a bit outdated).
Full source code:

#!/usr/bin/env python3

import numpy as np
import matplotlib.pyplot as plt

MAX_LAYERS_7B = 33
MAX_LAYERS_13B = 41

TS_7B = np.array([
9.99, 10.17, 10.51, 10.73, 11.13, 11.50, 11.90, 12.37, 12.81, 13.16,
13.69, 14.20, 14.91, 15.53, 16.20, 17.01, 17.72, 18.65, 19.65, 20.52,
22.02, 23.43, 25.15, 26.83, 29.50, 32.10, 35.36, 39.70, 44.52, 50.88,
59.63, 72.04, 90.87, 132.06
])
X_7B = np.arange(MAX_LAYERS_7B + 1, dtype=float) / MAX_LAYERS_7B

TS_13B = np.array([
5.01, 5.17, 5.24, 5.44, 5.58, 5.69, 5.78, 6.03, 6.17, 6.33,
6.45, 6.65, 6.88, 7.07, 7.29, 7.41, 7.58, 8.06, 8.82, 9.19,
9.59, 10.02, 10.48, 10.92, 11.35, 12.01, 12.55, 13.37, 14.02, 15.21,
16.12, 17.61, 19.12, 20.93, 23.02, 25.54, 28.82, 32.97, 38.86, 47.09,
59.55, 79.81
])
X_13B = np.arange(MAX_LAYERS_13B + 1, dtype=float) / MAX_LAYERS_13B

plt.plot(X_7B, TS_7B, label="7b q4_0")
plt.plot(X_13B, TS_13B, label="13b q4_0")
plt.legend(loc="upper left")
plt.xlabel("Proportion of GPU-accelerated layers")
plt.ylabel("Generated tokens / s")
plt.xlim(0, 1)
plt.ylim(0, 140)
plt.title("RTX 3090, Ryzen 3700X, 3200 MHz RAM")
plt.savefig("ts_rtx3090.png", dpi=240)
plt.show()


Nowadays a better way to do it would be to use the llama-bench binary.
You can specify the output as CSV, JSON, or SQL which you can then read in using Python.
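Something like this, assuming you dumped results with `llama-bench -o csv` (the column names in the sample are placeholders; match them to the header your build actually prints):

```python
import csv
import io

# Stand-in for a real CSV produced by llama-bench; real output has more columns.
sample = """model,n_gpu_layers,avg_ts
7b q4_0,0,9.99
7b q4_0,33,132.06
"""

rows = list(csv.DictReader(io.StringIO(sample)))
# Map GPU layer count -> tokens/s, ready to feed into a matplotlib plot.
speeds = {int(r["n_gpu_layers"]): float(r["avg_ts"]) for r in rows}
print(speeds)  # {0: 9.99, 33: 132.06}
```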
>>
>>101836927
I didn't.
I either poached it from these threads or from some discussion on llama.cpp's git repo.

>>101836989
Isn't economics as a subject matter interlaced with ideology and politics? Like the different western schools of economics, how each interprets human behavior, etc?
>>
>>101836635
Nemo is not good for assistant tasks, it hallucinates too much.
>>
I'm a vramlet, is Gemma 2B supposed to be incoherent and retarded? I don't see any difference between this and pyg from a year and a half ago. I don't know what I'm doing wrong. I'm running Q8_0
>>
File: fthfBmP1.jpg (76 KB, 1200x727)
>>101836237
it's already confirmed to be level 2 AGI
>>
>>101836989
Ah yes, the famously unbiased and rigorous field of economics where they had to come up with a fake Nobel prize for legitimacy and there are definitely no special interests.
lol
>>
>>101837092
>level 2 AGI
damn, that's so much better than level 1, sama WON
>>
>"AGI" means "When I like the model"
okay
>>
>>101837050
>is Gemma 2B supposed to be incoherent and retarded
Yes
>>
>>101837017
>>101837102
I will admit, I don't know shit about economics. But surely there's a way to strip out the ideological bs until you have basically maths.
>>
>>101837168
Once you figure that out, please figure out a way to measure human and LLM intelligence objectively and mathematically. Thanks.
>>
Wake me up when I can run AGI with 12GB VRAM.
>>
Strawberry is to level 2 what Tesla's FSD is to FSD.
>>
>>101837193
IQ? For humans at least.
>>
>>101837264
You can train for an IQ test. It's correlative, not quantitative.
>>
>>101837259
It's crypto terminology. Like how certain coins are built on top of the protocols of other coins. level 2 will be built on top of regular level 1 llms.
>>
hmm, from what I can tell llama-speculative can't use interactive mode; it just runs its mouth from your prompt and then exits no matter what flags you pass, with no way to continue without starting over and re-processing the whole prompt, which essentially negates the speed gain. Is there some way to get it to work with a cached prompt from a previous session, or some other way to make it usable as more than a proof of concept? It's possible I'm missing something or not using it right, since the closest thing to documentation is spread out between random issues and PRs.
>>
>>101837050
It's coherent, just doesn't listen to instructions well.
>>
>>101837168
That's not going to help you.
Basically anything could exist in some mathematical framework depending on your axioms, i.e. depending on what you assume to be true without proof.
And the problem with a lot of the social sciences is that evidence is simply very hard or impossible to come by so it's very difficult to separate good from bad theories.

>>101837264
IQ is definitely not an objective measure of intelligence because a lot of arbitrary decisions go into how the final value is calculated from the results.
One such arbitrary decision is that the results are weighted so that both men and women have an average score of 100 (on average, they perform better on different parts of the tests).
>>
>>101837050
Why are you running gemma 2b?
Even l3 8b mostly on ram would give you an experience that's orders of magnitude better.
I'm an 8gb vramlet and I'm running q4ks nemo with 32k context.
>>
>>101837568
>12b
>q4ks
ngmi
>>
>>101834770
>>101829233
Be wary of the thought police
>>
File: 1723403858570.jpg (533 KB, 1079x1806)
>>101837050
I've been fucking around with it on my phone to see what it can do, and it's been shockingly coherent for its size. Follows character cards decently well, can actually hold a conversation, and is an absolute freak if you get weird with it. What are your sampler settings?
>>
>>101837609
It's working pretty damn well. I've been having a lot of fun and nemo models are really good at using lorebook information at 32k context.
>>
>Ten thousand years of work, and still nothing better than Mythomax 13b.
The absolute state of local models!
>>
File: llama.png (39 KB, 751x776)
>>101837568
I have Intel HD Graphics 620, and 8 gigs of RAM lmao. I don't think I can run it.
>>101837692
this default shit on kobold
>>
>>101837837
I'm using something very close to that, DRY mult of 0.8 and a rep range of 512 though. Is it completely schizo to the point of not even responding to anything you typed?
>>
File: file.png (62 KB, 1086x159)
China modding RTX 4090D to use 48GB VRAM
https://videocardz.com/newz/nvidia-geforce-rtx-4090d-with-48gb-and-rtx-4080-super-32gb-now-offered-in-china-for-cloud-computing
>>
>>101838070
A 250 dollar chink BGA rework station is a pathway to many abilities some consider to be unnatural
>>
>>101838070
If nvidia gimps their 5090 and doesn't release a titan card I will actually import one of these.
>>
>>101838070
>5 hours late
>repost from reddit
>no miku
kys and go back, in that order
>>
File: file.png (234 KB, 474x355)
>>101838133
The more you try to avoid buying the way we want it the more you are inviting fbi to visit you.
>>
>>101838147
>no miku
He should stay. You should kill yourself subhuman tranny.
>>
>>101838133
Dont forget the acid
>>
>>101837837
Oof, I'm sorry. Take a look at koboldcpp's repo. There's a Google Colab notebook you can use to run these models for free.
>>
>>101838070
Why does this only happen in china, can't someone else do it so we won't have to deal with shady imports?
>>
>>101838427
Read what I said about the BGA rework station. You can get yourself a 3090 Ti, reball a 4090 chip onto it (yes, it's compatible), and then replace the VRAM modules with 2-gig ones.

You also need a custom vBIOS, but there's probably some chink-made one out there
>>
File: file.png (12 KB, 596x89)
Confirmed Elon dick sucker
>>
Mixtral is still working pretty well. The small details aren't as good as Gemma/L3, and it still feels a bit "for X, then Y," but it's easily good enough for staple cooming.
>>
>>101838427
Because China is free as long as what's being done brings in money.
>>
I know lmg is about talking to waifus, but, does anyone know of a tool/RAG or whatever to interact with a Database?

Ideally I'd be able to just ask "How many employees have a salary of over 30k" and get back a list of results.
>>
>>101838427
I think the first 2080 ti mod was done on Brazil.
>>
>>101838564
Isn't that what SQL does?
>>
>>101838564
>I know lmg is about talking to waifus
Correct. For information about commercial uses of LLMs, you want /r/LocalLLaMA. /lmg/ is the Dying Alone Division of Open Source AI Research.
>>
>>101838616
you need to make a query:

>"select * from employees where income > 30000" for example.

I would like to use natural language
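The usual pattern: hand the model your schema, have it emit SQL, then execute that SQL yourself. A sketch with the LLM call stubbed out (the hardcoded query stands in for whatever the model would return):

```python
import sqlite3

SCHEMA = "CREATE TABLE employees (name TEXT, income INTEGER)"

def nl_to_sql(question: str, schema: str) -> str:
    """Stub for the LLM call: in practice you'd prompt a model with the
    schema and the question, then parse a SQL string out of its reply."""
    return "SELECT name FROM employees WHERE income > 30000"

db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
db.executemany("INSERT INTO employees VALUES (?, ?)",
               [("alice", 45000), ("bob", 28000)])

query = nl_to_sql("List employees with a salary over 30k", SCHEMA)
result = db.execute(query).fetchall()
print(result)  # [('alice',)]
```

Validate or sandbox the generated SQL before running it against anything you care about; models happily emit destructive statements.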
>>
>>101838427
>Why does this only happen in china
Because it's a free country unlike USA.
>>
>>101838427
everyone else is willing to pay moon prices for hard because they aren't spending their own money, they are spending venture capital
it's complete fucking cancer
>>
>>101838296
You aren't forced to reply to every bit of retarded bait here
>>
>>101838874
And yet you replied to him.
>>
>>101838874
who are you writing to
>>
Btw, weren't there a shitton of Opus RP logs from /aicg/ posted here like 2-3 months ago? I'm pretty sure I even downloaded them, but I'll have to look through my backups.
>>
>>101839221
Yes, and those logs have been used to make the current batch of sloptunes. Discord tuners have been living off of those logs for months now. That's why they were so desperate to get that one anon's logs itt.
>>
>>101839221
It's what powers pretty much every RP fine-tune.
It tells you a lot about how lazy, creatively bankrupt, and scummy fine-tuners are. Death is not enough for their kind.
>>
>>101836989
>That's why you don't train them on ideological stuff
>only economics
thanks, I chuckled
>>
>>101837264
IQ isn't an intelligence metric, anon
>>
>>101839466
>t. low IQ
By definition, it is.
>>
>>101839466
It is.

>Doesn't think IQ tests matter

At minimum basic intelligence

>Thinks IQ tests matter

Complete slobbering retard
>>
>>101837807
>he still shills the king of slop mythomax
>>
>>101839497
Okay Anon so by your logic trans women who say they're women are also women because they said so
>>
>>101839523
Yes.
>>
>>101839523
Why are you so obsessed with trannies? Did one touch you inappropriately when you were young?
>>
>>101835212
>longform RP
not sure if it counts but I have been ERPing with claude 3.5 for so long that I can invite the character into my virtual room and it knows exactly which buttons to push in order to make me nut quickly in a few prompts due to the extensive chat history.
>>
>>101838506
Fuck off petra, come back when you are on your medications
>>
>>101838739
Maybe I'm too much of a nerd but to me that is natural language.

I could see it as useful for something like "List all employees who make more income than my boss's secretary," where it would be finding information that you don't readily have access to and don't know how to get at without the power of a computer brain running through all of the data and figuring out what you want.
>>
My quasi-girlfriend dumped me
I'm depressed, need virtual love, so crawled back here, can I get a model spoonfed to me?
12gb of vram, but also lots of ram...
>>
>Want to talk about some of the quirks of messing with AI
>Go over to /aicg/ for a bit
>All they want to do is talk about proxy drama like a bunch of strange, mindless zombies
It's fucking grim over there. Like human remoras that have lost the ability to speak outside of their chatbot jerkoff bots
>>
>>101839721
Mistral-large
>>
>>101839497
>>101839505
It's not. Unlike you, who don't know what you're talking about, I've conducted actual scientific research in a cognitive lab. IQ (usually measured with RPM) measures learned pattern recognition. It's a basically useless test that you can't even take multiple times, because people learn it too fast and score higher the more they take it.
The same way businessmen lure investors with buzzwords and golden promises about super dangerous AGI, IQ is a marketing meme for people who aren't knowledgeable in neuroscience.
>>
>>101839721
Stheno ve.2, Nemo-celeste 1.6, or nemo-mini-magnum are your best bet for coom in that range as far as I'm concerned.
There's also gemma 9b, but I never tested that one.
>>
>>101839809 (me)
also I didn't read the second post before linking it xD
>>
>>101839823
You have shit taste.
>>
>>101839721
Stheno v3.2, now fuck off
>>
>>101840011
>>101839823
Thankses
Hopefully I won't fap too much to this stuff
>>
>>101839721
Use Stheno.
>>
What are the biggest most cost effective economic uses of AI (LLMs) so far?
>>
>>101840196
girlfriend replacement
>>
is there a good reason to run your own llm if you aren't interesting in smut and aren't dealing with highly sensitive information?
the big boy models are better and cheaper right?
>>
>>101840196
code review is a big one right now. LLMs can look through 1,000 lines of code and tell you where you're missing your curly bracket in a second.
I have a feeling it's going to replace tons of paralegal bullshit too.
>>
>>101840196
>What are the biggest most cost effective economic uses of AI (LLMs) so far?
That I'm getting paid >100k for finetuning models for corpo. Very cost effective for my budget.
>>
>>101840236
>code review is a big one right now. LLMs can look through 1,000 lines of code and tell you where you're missing your curly bracket in a second.
A missing brace is a failure to compile. No compiler is slower than an llm. Hell... just autoindent and your text editor will tell you...
>>
>>101840271
>DUhhhhhhhhhhhhhh
compilers will tell you that you're missing one. Not which one you're missing. Autoindents are also fucky for that sort of thing. Do you want to go through 1,000 lines of code reviewing every indent?
>>
>>101840241
Hello, yes. Who do I speak to about getting paid >100k for running axolotl?
>>
>>101840224
Being spied is not the default unless you're a brain-dead American.
>>
>>101840338
Do you just say stupid shit at every opportunity?
>>
>>101840322
I'm saying that they *could* be used to do some reviewing, but for that the compiler is just faster. tcc compiles about 60k lines in 0.1s and tells me where the syntax breaks. Go to that line, a few page ups at most to see where the indentation stops making sense and done.
llms can miss that. a compiler won't.
>>
>>101840386
>he doesn't know about the USA.
>>
>>101840393
who cares how fast it compiles/doesn't compile if you don't know which of those lines failed? If tcc tells you that, then that's great for the seven people still programming in C. Most compilers will give you an unhelpful "End of Page Error" or worse: "Could not process."
>>
>>101840431
>He froths angrily over the USA at every opportunity
>When talking about whether using Software as a Service gets him spied on
>>
>>101840393
From experience LLMs are decent at
- replacing autocomplete
- replacing Google for fixes or small snippets of code
- having it explain some bit of code
The more interesting use case for them is agentic auto-development. But apart from some demos, they don't seem capable of working alone yet.
>>
what kinda specs do you need to run mistral large?
>>
>>101840507
I'm running 5bpw with 96GB VRAM
>>
File: 1714079969457061.png (192 KB, 501x636)
>>101838499
>Confirmed Elon dick sucker
>>
>>101840526
I can fit exl2 @5.5bpw with 32k context. What context are you using?
>>
>>101840322
Any half-decent IDE will catch syntax errors with exact line numbers and quick fixes in any language you can imagine, so it's not the best example of how LLMs increase productivity. That said, I do agree the current large models are excellent at code review for picking up things an IDE wouldn't: logic or semantic errors that are technically valid but unintentional. They're also quite good at writing unit tests for existing codebases. It cuts down on the tedium and trial and error of managing complex projects.
>>
>>101840241
There are a lot of corpos looking for data scientists/machine learning engineers, and with the recent buzz about LLMs they tend to look for people who can operate them. Most of the time I spend a few hours cleaning a dataset, then run training and play games until 5am while pretending to work.
>>
>>101840564 (me)
meant to answer to >>101840336
>>
>>101839529
Am I a dragon if I say I am?
>>
>>101840597
Yes.
>>
I really don't like mistral large's replies. I can't explain it. What're the best options beside it? Command r+? Wizard 8x22b?
>>
>>101840456
Shit compilers, mate... a potato can get through 100kloc/second. At a very VERY generous 5 tokens per loc you have 500k context that no llm can reliably recall running on your h100 cluster to find a fucking brace and llms still wouldn't be able to tell you the line number, just print the immediate context around the line, if that. llms can do other things very well, but not that.
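For reference, the deterministic version of "find the missing brace" is a dozen lines and reports an exact line number, no model needed (a naive sketch that ignores strings and comments):

```python
def first_unmatched_brace(source: str):
    """Return (line_number, char) of the first unmatched '{' or '}',
    or None if braces balance. Naive: ignores strings and comments."""
    stack = []  # line numbers of still-open braces
    for lineno, line in enumerate(source.splitlines(), start=1):
        for ch in line:
            if ch == "{":
                stack.append(lineno)
            elif ch == "}":
                if not stack:
                    return (lineno, "}")  # close with nothing to match
                stack.pop()
    # Any leftover open brace is unmatched; report the earliest one.
    return (stack[0], "{") if stack else None

code = "int main() {\n  if (x) {\n    y();\n}\n"  # one '{' never closed
print(first_unmatched_brace(code))  # (1, '{')
```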
>>
File: file.png (252 KB, 400x400)
>>101840544
> >Confirmed Elon dick sucker
>>
>>101840623
For what? Llama 3.1 70B is the other option.
>>
>>101840645
K. Except one of my programmers used an LLM and it told him exactly where the bracket was missing.
And your "DUDE IT PROCESSES TWENTY BILLION LINES PER NANOSECOND" doesn't fucking matter at all. Like why the fuck are you even talking about it?
>>
>>101840550
I can fit just below 55k without 4/8bit cache running 5.0bpw. I don't need that much context so I should probably just upgrade to 5.5bpw as well.
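For anyone following along, the bpw numbers map to VRAM with simple arithmetic. A back-of-envelope sketch, assuming Mistral Large 2's ~123B parameter count (weights only; KV cache and activations come on top, which is where the context headroom goes):

```python
# Back-of-envelope VRAM estimate for exl2 quants, assuming
# a ~123B parameter model. Weights only -- KV cache extra.

PARAMS = 123e9

def weights_gib(bpw):
    # bits per weight -> bytes -> GiB
    return PARAMS * bpw / 8 / 1024**3

for bpw in (5.0, 5.5):
    print(f"{bpw} bpw ~ {weights_gib(bpw):.1f} GiB")
# 5.0 bpw leaves ~24 GiB of a 96 GB rig for cache; 5.5 leaves ~17 GiB,
# which is roughly why 5.5 bpw fits less context than 5.0.
```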
>>
>>101840693
lol lmao
>>
Asteriskberry status?
>>
>>101840652
Why do we dislike Elon again?
>>
>>101840768
Internet told us to and internet is never wrong.
>>
>>101840693
Chill. I'm objecting to the stupid example of using llms to find a missing brace when compilers are faster and more efficient. Was it a 0.5b that told him? then he's a retard. fire him. was it a 405b that told him? Fire him and buy more gpus. Was it chatgpt? fire him and pay for access.
Here's what i can say:
Except one of my programmers used an llm and it didn't tell him exactly where the bracket was missing.
It means nothing. LLMs can get that wrong, a compiler won't. And that's not code review. If it doesn't compile, it doesn't get reviewed.
>>
>>101840777
Did the internet ever give a reason?
>>
>>101840768
If you're asking about AI-related issues only, then he is a retard that doesn't know what he's talking about.
>>
>>101840783
>Chill
>FIRE HIM FIRE HIM FIRE HIM
lol what the fuck
>>
>>101840802
That's normal for people who think they're smart. I used to live with a chemical engineer and he just assumed that all the time he spent researching chemicals somehow transferred to literally everything else. It makes their thinking sloppy because they're overly confident in it, because they're "smart". This is how you get high-IQ people saying pants-on-head retarded things.
>>
>>101840856
Why is that a reason we should dislike him?
>>
>>101840902
It's not, just that it's a really normal thing for smart people to do... I don't think it's a good reason to hate them over it, it's super common.
>>
>>101840785
He has notably made some tweets that have been construed as antisemitic.
>>
>>101840946
Media turned on him before that, when he caused a lot of short sellers to lose a lot of money.
>>
>>101840946
>tweets
>construed as antisemitic
yawn... Things that don't matter IRL for 500 plz...
>>
>>101840902
I dislike retards, even more arrogant retards. You can suck his cock if you want, it matters not.
>>
>>101841000
Why does everything have to be black and white extremes with you people?
>>
>>101841029
Ah, the enlightened centrist again. You are so smart and restrained. *pat pat*
>>
>>101841067
I think you spend far too much time online and are incapable of looking at any topic in any other terms but which "side" you're on.
>>
>>101841107
Are you capable of looking at any divisive topic and picking a side?
>>
>>101840768
Mainly because Tesla became a huge success, undermining several massive industries and putting the wealth of entire nations at risk. Leftists, despite begging for EVs on their dirty whore knees for decades, are stupid enough to go along with literally anything the media tells them. If the elites give Tesla a low ESG score, this is perfectly normal and just according to your average leftist. The average leftist cannot and should not ever be reasoned with. They are sick people and serve as tools for the elites
>>
>>101840768
Fraudster.

>>101841166
lol
>>
>>101841107
I don't give a fuck about sides. If someone says something retarded constantly then they are retarded. What is so hard to understand about that?
>>
News? I want something better than C-r+/Largestral/405B already.
>>
>>101841214
the precursor to strawberry sentience is supposed to be released next week
>>
>>101841211
Because idiots like you never stop to consider _why_ the media feels the need to put every stupid thing he says up as news. Especially when it's a 180 from their portrayal of him just a few years prior.
>>
Teto > Miku
This is an objective fact.
>>
fuck it it's time to go back to the good old stuff.
mythomax with mirostat.
>>
>>101841214
Column-r soon
>>
>>101841273
You are retarded too, I already deduced it.
>hurr durr media tells you to not like him
top fucking kek, I was watching his interviews (full ones, not fragments) and reading his tweets on the matters that interest me (AI). He is completely and utterly moronic. His knowledge is first-year CS college level AT BEST. Stop projecting your media habits on others, not everyone reads news headlines and calls it a day. I've read enough of his own words and watched hours of him talking, so I can form my own opinion of his retardation and arrogant posing. And this is only in the AI field; on other issues I have an even lower opinion of him.
>>
>>101841166
I've never liked EVs, regardless of who makes them but I don't hate Elon Musk over it...
>>101841188
>Fraudster.
How?
>>
>>101841530
https://files.catbox.moe/9fvnc2.mp4
ten years and counting!
>>
>>101841575
>He's Todd Howard and Peter Molyneux tier
Yes... and...? That's it? That's why I should hate him? Because he's a salesman who overpromises, just like everyone else?
>>
File: 00039-571029945.png (1.05 MB, 1024x1024)
>>101838133
I have the tools to do BGA work. The problem is getting your hands on Samsung 2GB VRAM modules which aren't fake or weren't improperly stored, and getting them for less than it costs to buy a pre-modded card. I bought a modded 2080 Ti for about $500 - there's no way it'd be worth buying an unmodded one and then buying the RAM myself. It's $9-$14 per chip and it needs 11 of them, and the 2080 Ti goes for almost $300.
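The arithmetic from the post, spelled out (prices are the quoted ranges, not current market data):

```python
# Rough cost comparison: DIY 2080 Ti VRAM mod vs pre-modded.
# All figures are the ranges quoted in the post.

card = 300             # used, unmodded 2080 Ti
chips = 11             # Samsung 2GB modules needed
chip_low, chip_high = 9, 14
premodded = 500        # price paid for an already-modded card

diy_low = card + chips * chip_low
diy_high = card + chips * chip_high
print(diy_low, diy_high)  # 399 454
# DIY parts alone land at $399-$454 before labor, rework risk,
# and the chance of fake/badly-stored chips -- hence buying
# pre-modded at $500 comes out ahead.
```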
>>
File: sealion.jpg (1.01 MB, 3900x2613)
>>101841613
you can fuck off now petra
>>
>>101841694
I'm not Petra, I just asked a question and didn't get good answers, schizo.
>>
Gumi > Teto
>>101841690
>>101841690
>>101841690
>>
File: sealioning.png (385 KB, 1000x782)
>>101841713
I see what you are
>>
>>101841575
Okay, but what about the things he did deliver?
>>
>>101840768
Because trannies hate him and /lmg/ is under a state of semi-permanent discord trannies raid.
>>
>>101840768
>llm thread
>does not recognize obvious leftist bot spam
ngmi
>>
>>101834359
Beautiful as always, Miku
>>
>>101841744
>Asking questions is a form of shitposting
brainrot
>>
>>101835753
Yeah, it's higher energy when you put it up.
>>
>>101842022
>llm thread
>posts off-topic
ngmi
>>
>>101841744
You'd have a point if the same question was brought up multiple times in a general, but it isn't...


