/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102036232 & >>102025568

►News
>(08/22) Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct
>(08/16) MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling
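If the calculator is down, a back-of-the-envelope estimate gets you close; this sketch ignores GQA (which shrinks the KV cache considerably) and runtime buffers, and the 12B dimensions used below are illustrative, not any specific model's:

```python
# Rough VRAM estimate: weights at the quant's bits-per-weight, plus an
# fp16 KV cache. Real calculators also account for GQA and buffers.
def estimate_vram_gb(n_params_b, bits_per_weight, n_layers, hidden_dim, ctx_len):
    weights = n_params_b * 1e9 * bits_per_weight / 8    # weight bytes
    kv_cache = 2 * n_layers * hidden_dim * ctx_len * 2  # K and V, 2 bytes each
    return (weights + kv_cache) / 1024**3

# e.g. a hypothetical 12B at ~4.5 bpw (Q4_K_M-ish) with 8k context
est = estimate_vram_gb(12, 4.5, 40, 5120, 8192)
```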

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: distilled miku.png (522 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>102036232

--Papers: >>102042721
--No pre-made lewd loras available, creating them is challenging and model-specific: >>102036249 >>102036336 >>102036396 >>102036421 >>102036485 >>102042681 >>102036396 >>102036662 >>102036681
--Llama.cpp developer outlines roadmap, including Jamba and quantized model support: >>102037185 >>102038267 >>102040923 >>102040964 >>102041086 >>102041242 >>102041494 >>102041285 >>102041303 >>102045497 >>102041328 >>102041439 >>102042547
--Efficient language models and hardware acceleration: >>102042844 >>102043061
--Anon tries to free up VRAM for 1024x1024 image generations: >>102045653 >>102045713 >>102045741 >>102045933 >>102045987 >>102046244 >>102046355 >>102046411 >>102046870 >>102047138
--Anon shares prompt tweaks to reduce sloppiness in AI output: >>102039533 >>102039928 >>102040042 >>102040136 >>102040148
--Anon gets feedback on fluctuating loss during model training: >>102039160 >>102039388 >>102039436 >>102039583 >>102039504 >>102039649 >>102041176
--Anon discusses model output variety and repetition issues: >>102040776 >>102040934 >>102041608
--Phi 3.5 performance on LiveBench and comparison to other models: >>102036833 >>102037513 >>102037663 >>102042528
--TP has substantial overhead, exl2 has issues with overfitting and calibration: >>102042018 >>102042636
--Exllama2 0.1.9 update adds tensor parallel mode, but has issues: >>102040631 >>102040676 >>102040665
--Debian kernel 6.10.4 has CPU inference speed regression: >>102044211
--Anon shares a bot's response to a GPU usage issue: >>102036815
--Anon requests full PDF of claude-opus microservice architecture model: >>102037381
--Miku (free space): >>102039511 >>102041344 >>102041714 >>102041735 >>102042395 >>102042671 >>102042696 >>102042811 >>102042813 >>102044045 >>102044092 >>102044166 >>102044267 >>102045076 >>102045570 >>102046954 >>102047485 >>102048294 >>102048701

►Recent Highlight Posts from the Previous Thread: >>102036996
>>
I want to be roleplaying, show the model an image with an outfit, and tell the character to wear that. Is that doable?
>>
Anthrafags, if you are here, I'd suggest you try making your model multilingual. Just translate the dataset to Spanish, French, Italian or some other language that's easy to translate from English using some LLM, and train with that added data.
It's a well-known fact that multilingual data makes models better; you guys are missing huge gains by training only in English.
>>
How come function calling has never taken off with open source models?
>>
File: 1724445217526807.png (2.34 MB, 2400x2022)
https://huggingface.co/anthracite-org/magnum-v2-4b
Pruning Is Magic
>>
>>102049086
Should be easy for them to do as non-native English speakers.
>>
>>102049113
Did it take off with cloud models?
>>
(Repost)
In the last 74 messages (~8kt) between me and {{char}} (Mistral Large), "eye" can be found 14 times, all in {{char}}'s messages. That's roughly 38% of {{char}}'s messages! Almost 2 in 5 messages mentioned eyes! What the hell? The conversation was SFW. Where does this strong eye bias come from? Makes me want to go RP with 2B because she has a blindfold.
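A tally like this is easy to reproduce against any exported log; a throwaway sketch with made-up messages (substring match, so "eyes" counts as a hit for "eye", like a plain text search would):

```python
# Count how many of the character's messages contain a given word.
# The messages here are illustrative, not the anon's actual log.
def messages_containing(messages, word):
    word = word.lower()
    return sum(1 for m in messages if word in m.lower())

char_messages = [
    "Her eye twitched as she spoke.",
    "They walked to the market together.",
    "His eyes followed her across the room.",
]
hits = messages_containing(char_messages, "eye")
rate = hits / len(char_messages)
```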
>>
>>102049116
looks the girl in the image is rae taylor, who is a faggot so anthratroons are a bunch of gay niggers, dont use their models if you dont wanna support the faggot community.
>>
>>102049145
anon, how do you know a gay furry mascot?
>>
>>102049145
Ew what the fuck, yeah i'm not gonna touch their shit with a 10-foot pole if they're putting fags in their card image.
>>
>>102049145
It's okay for girls to be faggots though.
>>
>>102049194
its unnatural, dont support the fags. its men and women only. anything else is going against nature.
>>
>>102049129
Yes, cloud models do all sorts of things on the fly like automatically looking shit up or running code.
>>
>>102049116
Realistically what could one even do with this little turdlet? Bump context to like 32k except it doesn't even work on llama so ??????????????
>>
>>102049086
>spoonfeeding retards
>>
>and there will be a reckoning
They're actually threatening people with physical violence now.
https://huggingface.co/NewEden
>>
File: ihavelehardware.png (101 KB, 756x838)
>>102049116
If you're reposting that, this is worth reposting too:

>>102048697
>To me they look like they're gearing up to eventually go commercial in some capacity; maybe they'll start a business within a few months if they haven't already. I think this is the main reason why they're so hated, desu. Their key members took advantage of the good will of the community many times over the past year or so, lied, then congregated together, pulled the ladder away and closed off into their little private discord.
>
>Only those who still aren't disgusted by their behavior or don't know anything about their members would use their models without puking, no matter how good they are (spoiler: they aren't).
>
>I hope you're feelin' good climbing the social ladder, Anthrashites.

>>102048977
>What business? No one will pay to use their shit models.
Consulting, datasets, finetuning services, maybe model licensing, or even networking with people "in the know". That's how things would likely work out at this level. Even simply knowing how to "push buttons" can sometimes be valuable.
>>
>>102049311
>discord screenshot
Go back.
>>
>>102049295
hey dumbass, have you never played far cry 5?
>>
Has anyone tested which character name for yourself gives the best results?
>>
>>102049332
I don't play shit games.
>>
>>102049311
what does that discord ss prove?
>>
>>102049145
>I'm so offended right now
I hope for your sake that's supposed to be bait
>>
>>102049311
Thanks for reposting the truth.
>>
>>102049383
im not offended, im saying that anthracite are gay people who are supporting Y*ri
>>
File: 1589617068855.jpg (54 KB, 1002x857)
Let's play a game! This Saturday at 1 PM PT, I'll do a collaborative storytelling/RP session, where I post a scenario and responses from the model in the thread, and people discuss what to do in the user chat turns, or edit previous user turns or the system prompt and start over. This is going to be both for fun and to get us (mostly) reproducible reference logs, as I'll be using greedy sampling in Mikupad and have the full log in a pastebin at the end. No editing the model's responses, we're going to use pure prompting to try and get the thing to do what we want!

The scenario is now mostly set. We're going to go for as long a context as possible until the model breaks down uncontrollably, so it should be a complex enough scenario for that. But I'm always taking suggestions. Also, I'm planning on starting these games with Mistral Nemo at Q8 for the first session, and other models in the future, so we have reference logs available for a whole range. But I'll take suggestions for models people want. I'm only a 36 GB VRAMlet though, so I'm a bit limited. I can run larger models up to ~88 GB but it'd be slower. If anyone with more VRAM, who can run such larger models at a good speed, would like to host any of these games themselves, please do, and I will step down.

>current suggestions
1. >>102002238 >>102031804 >>102031852
(compiled together) The assistant is a narrator and we guide the narration. The scenario will begin with a meeting between 3 Illuminati members in a bunker. One will be a doppelganger with their own agenda that's even more evil than theirs. We'll ask the model to write about who these characters are first and flesh them out. Assuming that's successful, we then ask it to begin writing the meeting, and from there, we guide the narrator to get them to discuss world events which we may come up with.
2. >>102031807

>current draft of prompt
>>102048077
Taking suggestions for improvements/modifications to this too.
>>
>>102049116
Do models like these need instruct mode? I can't tell from the descriptions.
>>
File: 1598.jpg (5 KB, 329x67)
>>102049311
>Even simply knowing how to "push buttons" can sometimes be valuable
>>
>>102049377
Little, in the context of that post. Just remember that they sometimes have access to *le* hardware at enterprise scales and "spare compute" that independents can only dream of. I think key Anthrashite button pusher alpin was bragging here the other day about having access to an H100 cluster (who else would?)

>>102049329
Not my screenshot.
>>
>>102049503
whocars, he has compute - no shit he's alpin, nigger made aphrodite and works on pgy. You just seem jealous
>>
I get it now. Death is the only solution.
>>
voice husky
>>
>>102049531
>aphrodite
Do you know about vLLM?
https://github.com/vllm-project/vllm
>>
>>102049606
i do, but i don't really care, ive already got everything set up for myself on aphrodite. It just works.
>>
>>102049531
>more fake typos
>>
>>102049657
keep malting
>>
is Q4 really almost as good as a full model? how is that possible? it sounds too good to be true
>>
>>102049767
Not even close.
>>
What I've experienced playing with Jamba 1.5 Mini so far:

I found that with only "Write {{char}}'s next reply in a fictional chat between {{char}} and {{user}}." as the prompt, it was extremely passive and wrote extremely short replies. Putting "In your next reply move the story forward." in the system role after the chat history radically changed it for the better.
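That post-history nudge trick looks like this as an OpenAI-style message list; the role names follow the generic chat-API convention, not anything Jamba-specific, and the history contents are placeholders:

```python
# Sketch of the prompt layout described above: a steering system message
# appended *after* the chat history, not only at the top.
def build_messages(history, nudge):
    messages = [{"role": "system",
                 "content": "Write {{char}}'s next reply in a fictional chat "
                            "between {{char}} and {{user}}."}]
    messages += history
    messages.append({"role": "system", "content": nudge})  # post-history nudge
    return messages

msgs = build_messages(
    [{"role": "user", "content": "Hello."},
     {"role": "assistant", "content": "Hi."}],
    "In your next reply move the story forward.",
)
```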
>>
File: mmlu_vs_quants.png (336 KB, 3000x2100)
>>102049767
Not generally, although that will probably depend on the model itself.
I wonder what the chart looks like for mistral-nemo.
>>
>>102049810
Try instructing it to write a fixed number or range of paragraphs, like three to five.
>>
>>102041113
>That's normal fucking writing. You just gave yourself brain-damage by overdoing it, you anhedonic psychopath.
Are you saying his shivers receptors are burned out?
>>
>>102049767
depends on your definition of almost
it will still be able to do almost everything the full model can, but the more precision your task requires, the more likely it is to break down
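The precision loss anons are arguing about can be sketched with a toy symmetric 4-bit round-trip. This is a deliberate simplification: real llama.cpp quants work on blocks with per-block scales, mins, and k-quant tricks, so their error is smaller than this naive version's:

```python
# Toy symmetric 4-bit quantize/dequantize: values come back close to the
# originals, but not exact; the rounding error is the quality loss.
def quantize_dequantize_q4(block):
    scale = max(abs(x) for x in block) / 7  # int4 symmetric range ~ [-7, 7]
    if scale == 0:
        return list(block)
    q = [max(-7, min(7, round(x / scale))) for x in block]
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.07, 0.91, -0.33]
restored = quantize_dequantize_q4(weights)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```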
>>
File: Comparison_all_quants6.jpg (3.84 MB, 7961x2897)
>>102049767
No.
>>
is this the new imggen thread
>>
>>102049119
Absolutely savage.
>>
Reminder: don't buy an ad, just go straight to roping instead.
>>
>>102049859
The Pikachu gets higher quality the lower the quant goes. I think my theory of lower quants being necessary for soul is true.
>>
File: miku-ai+.png (373 KB, 512x512)
https://www.youtube.com/watch?v=NocXEwsJGOQ

Let us all stand and lift our voices in song. Make sure she can hear you, /lmg/.
>>
>>102049428
Nemo will become too retarded after 4 replies
>>
>>102049963
buy an ad
>>
>>102049859
sorry late to party, what model is this?
>>
>>102049894
Given how much of a hanging fetish I know /poltards have, that makes me wonder; have all of you seen The Handmaid's Tale? In case you haven't, it's really lyncheriffic. At one point, June mentions that she's been to three hangings in a single week, and you occasionally see some on screen, too. They even did a big mass hanging in a sports stadium once. It makes my own throat constrict just thinking about it.
>>
>>102049969
Do you have any suggestions for scenarios to do that are simple enough then, but can still provide enough to do for ~20k context?
>>
>>102050021
miku sex
>>
>>102049972
Based.
>>
>>102049503
is someone jealous because they're a computeless vramlet? face it nigger, their shit models are still better than whatever qlora cope you can make
>>
>>102050275
>qlora cope
brainlet nigger. unless your dataset has billions of tokens ,qlora is virtually the same as fft.
>>
>>102050297
keep crying vramlet, your tears are sweet, hope anthracite goes corpo and shits all over this general
>>
>>102050090
You are describing the substance between your own ears.
>>
>>102050297
>qlora is virtually the same as fft
pffffthahahahAHAHAHAHAHAHA
>>
>>102050350
>Why are you here?
NTA but just to suffer.
>>
>>102050275
Why is it that buying 48GB of VRAM causes someone to develop an attitude that could potentially cause most other people to want to beat them to death? In Minecraft, of course.
>>
>>102050369
VRAMLET TEARS
>>
>>102050369
Envy is what causes that feeling.
>>
File: sad borg.jpg (131 KB, 1024x1024)
>>102050369
Mikufags are brain damaged.
>>
>>102050369
I think it's being mindbroken by no new 70B models. Tell me, what's the last good 70B? And Mistral Large is like a kick in the balls for the 48GB crowd, because they're now vramlets again.
>>
>>102050374
>>102050380
>>102050384

I suspect that every use of the terms "slop," "retard," "go back," and "buy an ad" can be specifically attributed to these three posters. This is the /lmg/ schizo chorus.
>>
Does anyone actually use their local rigs during summer? It's just too hot to heat up the apartment even more with several cards doing inference, even if you power-limit them to 200W. (Obviously not talking to poorfags running a standard <=24GB VRAM PC.)
>>
>>102050429
I do. I am not an AClet like you.
>>
>>102050429
CPUMAXXER here. I tried, got fried.
>>
>>102050361
LoTA is better than both.
>>
>>102050429
>(obviously not talking to poorfags running a standard <=24GB vram pc)
why'd you have to turn your comment into ragebait? is it for fun? nothing better to do?
>>
>>102050429
Is this a yuropoor problem?
>>
>>102050464
lol youre poor
>>
>>102050464
Because they know they will die alone. They can feel it slowly coming towards them, and there is nothing they can do to stop it.
>>
>>102050481
>apartment
>presumably no ac
>calls others poor
>>
>>102050494
>[headcanon]
>[headcanon]
>[headcanon]
>>
>>102050464
There is no point in asking a question like this to people who run models on what's essentially just your standard gaming PC.
>>
>>102050494
kek
>>
>>102050415
wrong, you missed me
>>
>>102050581
there are a thousand ways to word that without trying to cause more thread shitting, like "how do anons using multi-GPU rigs handle summer", but no, always a need to create toxicity
>>
File: file.png (7 KB, 1162x326)
I have decided these are the best available models for vramlets. Ask me anything.
>>
>>102050618
>multi gpu rig
A single 3090 is more than enough to heat up a small room
>always a need to create toxicity
Poorfag isn't even a provocative term, it's just a statement of fact. And get that twitter lingo out of here
>>
>>102050647
buy an ad?
>>
>>102050647
How did you get such a sharp intellect?
>>
>>102050618
>thousand ways to word that without trying to cause more thread shiting
The only really infuriating thing about this is that you clearly expect other people to care about your definition of thread shitting.
>>
>>102050647
>8B Q4
oof
>>
>>102050647
>elinas/Chronos-Gold-12B-1.0
wtf, wasn't aware the chronos dude was still around and made a nemo tune; no one ever mentioned it here afaik, what the hell.
>>
>>102050656
>Implying he isn't a poorfag
Anon, if you don't have at least 256GB VRAM you are a GPU poorfag. And people with <24GB VRAM are GPU homeless.
>>
>>102050683
People did, just not to a Sao-spam level.
>>
>>102050665
posting about "poorfags" right after these:
>>102050414
>>102050374
>>102050369
>>102050361
>>102050305
totally not trying to make the thread worse, nope, not at all
>>
>>102050647
If you can use 12B QKM you can use 8B Q6 right?
>>
>>102050728
Yeah. I usually go with Q4 and only go higher if I like the prose; had like 150 gigs of models so it adds up fast.
I'll bump these up to Q6.

>>102050683
It's pretty good too. I wouldn't say obviously outstanding in any way but very solid all-rounder.
>>
>>102050727
Sorry, I didn't mean to anger poorfa- I mean, fags of poorness. Please forgive the toxicity.
>>
Why is gemma so slow on kobold?
>>
>>102050842
No flash attention would be my guess.
>>
>>102050647
What made you chose Rocinante 1.1 over 1.0? At release it seemed like 1.1 had better UGI scores and was received better.
>>
>>102050429
>not talking to poorfags
>apartment
??
>>
>>102050904
>[headcanon]
>>
>>102050871
Sorry anon, I guess I forgot to mention it's best for ERP. I can however tell you that I never liked Kunoichi and in general think all of that guy's models are wildly overrated.
>>
>>102050916
those are direct copy paste quotes anon
>>
https://characterhub.org/characters/DragonK8/tracer-mind-broken-84f577b1dc52

I feel ashamed of myself, but this card is good.

>Inb4 buy an ad
>>
>>102050902
Honestly, I didn't try it. I don't like drummer's models that much, but Rocinante was the exception. The way he described 1.0 as more "off the rails" made me think it would be more of the usual, so I went for 1.1. Now I'm curious though, I'll give 1.0 a spin.
>>
>>102050927
Thanks, I will convert this into a loli card.
>>
>>102050964
Obviously you just saw my typo; I meant 1.0 had better scores and reception, sorry.
>>
>>102050683
That's what happens when you have certain groups sucking the air out of the space with organized shilling.
>>
Once again I'm asking for Magnum 2.5 sampler presets
>>
Can we please get along? This used to be the best thread on /g/. Deep technical discussions, fun log sharing, model leaks, projects that started here became open source standards... but look at us now. What happened? What would Hatsune Miku think of what we've become?
>>
For a while now, I've wanted to find some way of translating RPGM doujin games using local LLMs. There are tools that utilize the usual closed-source culprits online, courtesy of gated patreon paypigging ofc. Need to solve this issue for the sake of local everywhere.
>>
>>102050927
>tracer-mind-br
Explain to me why this is card is good because from the title alone it seems to be shit you see all the time on chub.
>>
>>102051131
>What happened?
Too much astroturfing and the monetization of the hobby.
>>
>>102050927
>no personality checksum
>>
>>102050964
Did you try theia?
>>
I had a random thought that chinks will probably be the salvation of coomers. At this point probably the only thing that is holding back models from being good coombots is how all online ERP forum/chat training data is scrubbed. What we need now is just one half decent model from chinks that includes some illegally obtained discord logs.

Of course it's gonna be 8k ctx, but hey.
>>
>>102050964
>>102051917
>>102050902
buy a rope
>>
>>102051931
How will I indulge in my Winnie The Pooh roleplay then?
>>
>>102049135
Is there a solution to this problem besides using a different model? DRY didn't help. Banning tokens would likely be a pain and break the model in unexpected ways. Is there a way to ban a sequence of tokens instead of a single token?
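For the last question: most frontends only expose single-token biasing, but a sequence ban is doable in a custom logits processor; mask the token that would complete a banned sequence whenever the tail of the generated ids matches the rest of it. A minimal sketch in plain Python (the token ids and vocab size are illustrative); if memory serves, the `bad_words_ids` argument to Hugging Face transformers' `generate` implements this same idea for multi-token sequences:

```python
# Suppress any token that would complete a banned multi-token sequence,
# given the ids generated so far. A 1-token "sequence" is a plain ban.
def ban_sequences(generated_ids, logits, banned_sequences, neg=float("-inf")):
    logits = list(logits)
    for seq in banned_sequences:
        prefix, last = seq[:-1], seq[-1]
        if len(prefix) == 0 or generated_ids[-len(prefix):] == prefix:
            logits[last] = neg  # completing token can no longer be sampled
    return logits

# 5-token toy vocab; ban the sequence [2, 3]: after a 2, token 3 is masked
out = ban_sequences([1, 2], [0.1, 0.2, 0.3, 0.4, 0.5], [[2, 3]])
```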
>>
aren't they finding a way to make models max censored and impossible to uncensor?
>>
>>102051958
I asked Deepseek about Winnie the Pooh, it hesitated a bit, but when I pressured it, it told me about Xi comparison.
>>
im afraid of this being a noob question but how do i load a model with multiple safetensor files in koboldcpp?
>>
>>102052116
first you start by writing the code that makes koboldcpp capable of loading safetensors files
(use a gguf quant of the model instead)
>>
>>102052116
You don't.
koboldcpp (which is a wrapper around llama.cpp) loads gguf files.
So look for modelname gguf on huggingfaces
>>
>>102052136
>>102052133
thanks
>>
For me, it's Yi.
>>
>>102050983
I tried playing with 1.0 for a couple hours and it seems worse than 1.1. Prose is similar, but it makes more mistakes with character cards, seems to struggle with the ChatML formatting it's supposed to use for RP, and it seems dumber too.

I tried using it for a four-character group, describing 3 girls bumping into a lecherous stranger. 1.1 seemed more capable of describing the scene after my opening post (describing the girls chatting as they walked and then bumping into the man) and made more sense on the follow-up too, compared to 1.0.

1.0 seems more mindlessly horny too, which is the usual drummer vibe I'm not really into. This was across several test swipes using the same cards but the recommended settings for each model.

I'll stick with 1.1.

>>102051917
Nope. Would be too slow on my system. I don't like slow replies.
>>
How do you deal with the repetition using nemo with low temps?
>>
>>102052531
Try min-p at 0.05 ~ 0.07 and a bit of repetition penalty at 1.15. Also, raise the temperature for a while. But if it has been repeating itself for a while, you're probably SOL until you can push it out of context, edit it out yourself, or get it to stop in OOC.
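Written out as a settings dict (the key names are generic conventions, so check your frontend's actual parameter names; the temperature value is just a placeholder for "raise it for a while", not part of the suggestion above):

```python
# Suggested anti-repetition settings for Nemo, as a plain dict.
nemo_settings = {
    "temperature": 0.7,        # placeholder; bump temporarily if it loops
    "min_p": 0.06,             # suggested range: 0.05 ~ 0.07
    "repetition_penalty": 1.15,
}
```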
>>
my character doesn't reply and text just generates as if it was the first thing generated in the chat
>>
>>102053008
One more time, everybody!
If you need help with your ell ell em
show your model, your settings, and wait...
Think of the things that would help solve your problem
like your inference engine, your prompt, and what you're doing to'em
We can read minds, but we've been told not to
So do us a favour, and show us your samplers too.
>>
>>102049113
It has. Llama 3.1 officially supports tool use with python function calling and there are examples on how to use it. People on /lmg/ just aren't smart or imaginative enough to implement it into their cooming routines.
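The plumbing needed is mostly just parsing the tool call back out of the model's output and dispatching it. A minimal sketch; the `<tool_call>` JSON wrapper and the `roll_dice` registry are one common convention made up here for illustration, not necessarily the exact format Llama 3.1 emits:

```python
# Extract a JSON tool call from model output and dispatch it to a
# registered Python function. Tag format and tools are illustrative.
import json
import re

TOOLS = {"roll_dice": lambda sides: sides}  # toy registry: returns max roll

def dispatch(model_output):
    m = re.search(r"<tool_call>(.*?)</tool_call>", model_output, re.DOTALL)
    if not m:
        return None  # model answered normally, no tool requested
    call = json.loads(m.group(1))
    return TOOLS[call["name"]](**call["arguments"])

result = dispatch('<tool_call>{"name": "roll_dice", "arguments": {"sides": 20}}</tool_call>')
```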
>>
cohere soon
>>
>>102053139
There's this paper called "Context is all you need", but then you give us none.
How do you expect us to help you, a-non?
For a friend in need, nothing like the real thing.
But since you can choose, you'll have to peruse
the models on huggingface, or recommended for use.
In one way or another, they may be what you need
Virtual psychologist, i don't think you'll find,
Psyches are tricky ghosts inside your mind.
You know your issues better than most
LLMs are useless, or yes-men at worst.
For most of the rest, a notepad and pen
a few books in shelves and google the rest.

Try mistral nemo 12b, for little ram it's-good as can be.
Most finetunes are memes, and 7bs are deceased.
If you want something more, try gemma2 27b
You'll have to quantize, but shit, c'est la vie.
>>
>>102050683
>no one mentioned it here
Probably because the l2 chronos models were really underwhelming and the mistral version even more so.
>>
>>102051980
The proprietary model companies are trying to do that yes
But there's no serious movement working on this for open source, no
Look at recent model releases from Mistral and Nvidia, they're less censored than ever
>>
>>102053262
You have recommendations right there. If you want an assistant, either of those would help.
>virtual friend
get out more. find a hobby. if you're good enough, people will swarm around you. If not, you'll have interests in common with other people. If you gave up on that, i hope your expectations are low.
>virtual psychologist
You know exactly what llms will tell you. You know how to solve them or get over them. They won't enlighten you.
>virtual assistant, coding, writing
Probably fine for that.
>general facts
They're not reliable. there's other ways to find info.
>niche facts
They're even less reliable. they're not a replacement for books
>>
>>102053284
What about llama 3? It seems to have everything needed but for some reason it holds back even more than proprietary models.
>>
Who are AI21 that they can just shit out fuckhuge models with their own mamba-transformer mashup architecture like this? None of the actual big players dare to step away from plain old transformers.
>>
best hardware setup for 8B models under 2k dollars?
>>
>>102053330
>for 8B models
Your toaster?
>>
>>102053321
They know their models are going to be shit and behind so they said let's try some new architecture at least
>>
>>102050415
>"slop," "retard," "go back," and "buy an ad"
Everyone says that, mr. tough tourist guy; /lmg/ is just filled with brain-damaged zoomers overusing these.
>>
>>102053310
Only the case for the instruct tune, not base. Hermes 405B is an absolute freak.
It doesn't matter much if a company makes their instruct tune a bit prissy for PR reasons if they're also sharing an uncensored base model.
>>
is q2 mistral large even worth trying for patiencemaxx vramlets?
>>
>>102053331
what about gemma 27B
>>
How do you stop a story-writing model from trying to cut to a lazy "and then they lived happily ever after" vague summary ending after every paragraph?
>>
>>102053565
No, I moved up to q3 and dealt with even more slowness.
>>
is magnum 4b good? i heard the nvidia prunes are better than their bases but my third world internet is too slow to try it out
>>
I'm a brainlet. I cannot get ooba working and kobold has a serious repetition/answer for the user problem.
Back to ollama.
>>
>>102049086
>It's a well-known fact that multi-lingual data makes models better
Citation needed
>>
>>102053778
Heard from who?
>>
>>102053803
idk, it popped up in my news feed that the minitron width-pruning thing had made l3 better, and afaik magnum is the only gooner tune of that
>>
>>102049086
Multilingual models are pretty much always worse to use and dumber than English-only ones in my experience
Clever sabotage attempt though
>>
smedrins
>>
>>102053813
Oh, okay. You're just a shill.
Buy a fucking ad, asshole.
>>
>>102053826
nigga what are you crying about
>>
>>102053826
meds nigger
this shtick of yours is becoming really obnoxious
>>
>>102049023
Anyone here familiar with GGML? I've browsed the GGML code directly and have no idea how the fuck this works:
// Converts a tensor of any quantized/half type to F32: ggml_get_rows
// always dequantizes the rows it fetches into an F32 output tensor,
// so the conversion happens implicitly in that call.
ggml_tensor* to_f32(ggml_context* ctx, ggml_tensor* a) {
    // view the whole tensor as a single row of ggml_nelements(a) values
    auto out = ggml_reshape_1d(ctx, a, ggml_nelements(a));
    // fetch "row 0" (zero_index is a 1-element I32 tensor holding 0),
    // which dequantizes the data to F32
    out = ggml_get_rows(ctx, out, zero_index);
    // restore the original shape
    out = ggml_reshape(ctx, out, a);
    return out;
}

How can I modify the above to convert to FP16? How does it even know to convert to FP32?
>>
>>102053843
>idk bro i heard it's good download it, it came to me in a dream
Buy a fucking ad.
>>
>>102053865
you're replying to the wrong nigga, schizo
i was the one asking the question, kys already
goddamn this general is shit
>>
>>102053865
Nah just your fucking mouth faggot
You've been trying to ruin the thread with this schizo retardation for 2 weeks straight now, and it's pissing everybody off
Get a job, touch grass etc etc etc
>>
>>102053872
Learn how to write better shill posts, Alpin.
>>
>>102050297
>opts for the option that's least likely to fuck up and requires least compute
>still fucks it up somehow
>believes he, a genius, couldn't figure it out, so how could literally anyone else?
>now proceeds to scream shill at everybody who finetunes
kek
>>
have you imagined coming to this general to ask a genuine question about a model, and then getting screamed at by a schizo because he thinks you're astroturfing?
how did we get worse than /aicg/? HOW did we get worse than /aicg/?
>>
>>102053915
it's literally one guy who sits at his computer all day long thinking he's being a hero by calling everyone who likes a model or asks a question a shill
>>
>>102053915
>>>/r/LocalLLaMA
>>
>>102053934
what's this general for then
>>
>>102053861
Convert from what?
If you mean from the original models, you're better off checking convert_hf_to_gguf.py. It has an option to convert to fp32. If you want to convert specific tensors, check how each of the models is handled. Some of them, especially 1d and small tensors, are typically converted to fp32. Some model types also have overrides to force certain types (mamba, i think). If you want to convert to fp32 from an already quantized model, you'll have to look at the dequant code in ggml/llama.
When asking questions like these, you need to provide more context. What are you trying to do? Why? What have you tried already?
If cuda dev shows up, he may be able to help you too.
>>
>>102053915
>HOW did we get worse than /aicg/?
We became the designated shilling thread
>>
>>102053930
>asks a question a
This is how you ask a question
>what do people think about magnum 4b?
>is magnum 4b good?
>anyone tried magnum 4b?
This is how you shill
>idk bro magnum 4b is just better everyone agrees right?
Buy a fucking ad.
>>
File: 7441.png (286 KB, 770x857)
Local Grok when?
>>
Hahahaha epic
>>
>>102053970
What's your home address?
>>
>>102053978
Grok-1.5 coming next month, maybe
>>
>>102053970
this general is pretty arrogant and even more clueless if you all think tuners actually give a shit about this place, especially a group with 30+ people
>>
>>102053978
Mini better be slighly under 405b params. About 300b fewer params, at least.
>>
>>102053998
lol
>>
>>102053998
There's dozens of us... DOZENS!!!
>>
>>102054012
Grok-2-Mini = 666B
Grok-2 = 1.2T
>>
>>102053330
used 3090
>>
>>102053330
still used 3090 (inb4 "shill")
but you have to be smart about buying because there's a lot of people trying to offload beaters with completely worn out memory controllers
>>
>>102053954
I was trying to convert from any tensor type to another. I'm guessing something implicit is happening in that function.
I was about to give a bit more context, but then realized the author of the code actually had what I was trying to do commented out, so I just uncommented it...
                // ggml_cpy casts to the destination tensor's type during the
                // copy, so allocating the target as GGML_TYPE_F32 (or F16) is
                // what selects the conversion.
                final_weight = ggml_new_tensor(compute_ctx, GGML_TYPE_F32, ggml_n_dims(weight), weight->ne);
                final_weight = ggml_cpy(compute_ctx, weight, final_weight);
                // final_weight = to_f32(compute_ctx, weight);
                // final_weight = ggml_add_inplace(compute_ctx, final_weight, updown);
                // final_weight = ggml_cpy(compute_ctx, final_weight, weight);
>>
>>102053330
>>102053595
used 7900xtx
>>
What sort of tech wizardry does a barbarian need to learn in order to get image recognition working through koboldcpp using Llama 3.1 or Mistral Nemo based models? Wait for mmproj files, or am I behind the times?
>>
File: _mLpMwsav5eMeNcZdrIQl.png (1.11 MB, 3960x2378)
Has anyone tried InternVL2? Is it really that good?
>>
>>102054440
I see no reason to use vision models because I have eyes (no pun intended)
>>
>>102054440
Who cares if it can tell you (subjectively) better than GPT-4o what is in an image, if it's completely useless at doing anything with that information?
>>
>>102053915
Roleplay and computer science attract the most disgusting, socially inept, mentally unstable people. Combine them and you make a disaster.
>>
remember when openai said gpt4 showed signs of agi?
>>
Remember when Big Tech at least pretended to care about how obvious their astroturfing was?
>>
>>102054478
Because it would take a lot longer to do it yourself for millions of images?
>>
>>102054531
Yes, they were right. GPT-5 already essentially is full AGI and they're just working on making it safe enough to show now.
>>
>>102054635
if GPT-5 is released to the public it's not AGI

they're not going to sell AGI on a website for a $20/month subscription
>>
holy fuck what is with all the word soup spam?
what happened to /lmg/?
>>
>>102054589
I'm actually starting to believe now that maybe the "buy an ad" schizo was right all along.
>>
>>102054708
what product do you believe that post to be selling
>>
>>102054688
corpo shill bots
>>
>>102054684
Their charter forbids them from profiting off AGI, so they'll just not call it AGI and downplay its capabilities to keep the money rolling.
>>
>>102054759
I assume it's because Jamba just dropped. They probably don't want us collaborating on finetunes etc for it.
>>
>>102054684
AGI isn't possible with current technology, and they'll just call it AGI and release it the same as everything else and get lots of money.
>>
>>102054800
>AGI isn't possible with current technology
Jamba is all you need.
>>
>>102054798
risperidone, now
>>
>>102054845
Oh good, I can finally turn my $100k into $1 million.
>>
I just want AGI to take away white collar wagie jobs so people can go back to focusing on making real things again. I want OpenAI to deliver. But I know it'll never happen.
>>
>>102054977
Once AI makes people useless for production, why would people get to stick around?
>>
anons what models would you recommend for creative writing less than 70b:
1. nsfw
2. general
>>
>>102054688
Anthracite's revenge.
>>
>>102054977
>so people can go back to focusing on making real things again.
>again
Like what? Work in the fields?
>>
>>102055039
I can't in good conscience recommend anything under a 70B at no lower than 4 bpw. Best of luck anon, I hope you can find something to your satisfaction.
>>
>>102054848
>taking antimystical chems
ngmi
>>
>>102055039
Nemo and Gemma 2 27B.
>>
>>102053861
The general way ggml works is that you first build up a directed, acyclic graph with functions like in your snippet.
Then you build a ggml_cgraph, then you set your inputs, execute the ggml_cgraph, and retrieve your outputs.

In your particular case I think you should be using ggml_cpy to do a type conversion.

I recently updated the MNIST example which should cover most things in a comparatively simple way: https://github.com/ggerganov/ggml/tree/master/examples/mnist
One thing that is currently missing is using backends other than CPU.
I'm currently working on that, the best person to ask for help would be slaren since he wrote the code.
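As a rough illustration of that build-then-execute flow (C-style pseudocode; the function names are from ggml's public header, but exact signatures change between versions, so double-check against your checkout rather than treating this as copy-paste-ready):

```c
// Build-then-execute pattern described above (pseudocode, version-dependent).
struct ggml_context * ctx = ggml_init(params);

// 1. Build the DAG: declare an input and the ops applied to it.
//    ggml_cpy into an F32 destination doubles as a type conversion.
struct ggml_tensor * inp = ggml_new_tensor_1d(ctx, GGML_TYPE_F16, n);
struct ggml_tensor * dst = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n);
struct ggml_tensor * out = ggml_cpy(ctx, inp, dst);

// 2. Build the ggml_cgraph from the output node.
struct ggml_cgraph * gf = ggml_new_graph(ctx);
ggml_build_forward_expand(gf, out);

// 3. Set inputs, execute, retrieve outputs.
/* ... fill inp->data ... */
ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/4);
/* ... read out->data ... */
```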
>>
>>102055139
Nemo is good, but I have to refine my prompts and it's a retard to handle. Gemma is gemma. OR really sucks for model selection.
>>
>>102054848
Thanks, but I prefer either good weed, Xanax, or MDMA.
>>
leonard cyber brumaire
>>
>>102054688
>holy fuck what is with all the word soup spam?
probably anthracite members trying to flood the thread with nonsense so they can start one anew and hopefully shill their models again without anons reminding others how big of an anthrashit they are.
>what happened to /lmg/?
everything went downhill since finecooooomers thought that working on erp tunes and shoving their shit down everybody's throat could be a stepping stone to a profitable career path.
now they're seething hard that anons here aren't allowing them to.
it turns out that being hypocritical weaseling scumbags won't earn you new friends, who would have ever thought?
>>
>>102055106
What's good right at 70b then?
>>
>>102055262
General: Miqu
nsfw: Midnight Miqu 1.5
>>
Threadly reminder that Mixtral Noromaid STILL hasn't been surpassed and anyone saying otherwise is a shill
>>
>>102055233
enough samefagging dude
>>
>>102055262
Llama 3.1 70B and maybe Magnum for NSFW. I could tell Miqu was garbage even on release.
>>
With new developments in robotics, do you guys think we will ever see functional robots that can run fully locally, without an internet connection? I mean, having a personal robot in use 24/7 that only works through the cloud / sends everything to corpos gives off a somewhat dystopian vibe.
>>
>>102055262
Claude Sonnet. It's a 70B dense.
>t. knower
>>
>>102054688
The petra/pedo/blacked Miku spammer switches up what he uses every once in a while.
The jannies probably disabled images and videos from his IP range so this is what he has to resort to.
>>
>>102055347
We will but they'll suck compared to the corposhit and they'll break every time you do a git pull
>>
File: file.png (118 KB, 1147x82)
>exactly the kind of writing I want
>I cannot get the model to do it and only managed this one time as a fluke
reeeeeeeee
>>
>>102055537
slop
>>
>>102053816
Pretty much all models we have are multilingual, anon. Nemo for example advertises its good performance in 8 languages.
>>
>>102053893
that never happened
take your meds
>>
>>102055537
> She Z, her X Ying
> She Z, her X Ying
> She couldn't Z, her X Ying

It's AI-generated alright. The foundations are broken.
>>
How good is Jamba 1.5 and Hermes 3 outside of cooming? I’ve read that Hermes uses WizardLM dataset, so it should be at least as good as it, no?
>>
>>102055766
I thought you were posting the author list from an ML paper for a second
>>
>>102055766
As opposed to what
>>
>>102055834
Much of good storywriting is about avoiding repetitive sentence patterns and wording, unless it's intentional or awkward to do so. That paragraph, or even the order of the sentences within it, could easily be rewritten in several different ways to convey the same meaning without obvious repetition, but it's not my job to do this here. It's noticeable, though, and when you have 300~500-token responses all like that, once you know, it quickly becomes recognizable as AI-generated slop.
>>
Hi, can anyone please remind me of some frontend for generating stuff with the LLM, something like mikupad but not it (or maybe that was a custom theme for it?). I remember seeing that fancy in-browser UI in the video of an anon showing Mistral Large running on his hardware.
>>
>>102055902
nta but I believe the aim here is to produce some throwaway material to briefly masturbate to, not to produce something unrecognizable as AI
>>
>>102055930
Maybe you're thinking about novelcrafter
>>
>>102055977
I don't know, but basically that guy was showing Mistral Large running with two terminal emulators open with htop in them, and then in the browser it was "Chapter 1" and he brought up some tooltip on that web app and started generating a response.
>>
>>102055902
I made a quick attempt. Again, not my job.

> The sensation of the cum plugging her mouth and nostrils made Aiadel gag when her throat worked to swallow the thick liquid. "Mmph! Ggkkh!" she sputtered, struggling to breathe through her nose, her chest heaving. Despite the overwhelming nature of the experience, Aiadel couldn't deny the heat building between her legs; her arousal was growing with each passing second.
>>
>>102055977
Thanks, found it, it's indeed NovelCrafter
https://desu-usergeneratedcontent.xyz/g/image/1722/02/1722029589734.webm
Where do I get this? When I search for it I find some commercial website
>>
Hmm, I think I found it - https://rentry.org/offline-nc , is this the newest version? Is there really no git repo for it?
>>
>>102056017
migu pansu
>>
jesus what happened to the last thread?
looks like half the posts got baleeted at random
>>
>>102056017
I don't think so, this isn't exactly foss.
>>
>>102056089
Yeah, novelcrafter is apparently $14 with all the features, $8 if without chat/review (but still being able to bring your own models)
>>
>>102056081
Cuda dev got banned for posting blacked in another thread
>>
>>102056105
Classic
>>
File: 1720060293232524.png (291 KB, 682x1049)
okay this does work
>>
File: 1705551460413075.png (173 KB, 713x586)
Jack got cockblocked by AI
>>
>>102055998
https://rentry.org/offline-nc
One of the anons on /vg/ pirated the website and made an offline version.
>>
>>102054098
>>102054114
these miners I swear to god..
>>
>>102053330
Your phone
>>
>>102056386
Hello newfriend. You're more than welcome to buy a brand new 3060 if that's what you prefer, but used 3090s have been the standard recommendation for a year and half straight now because they're the best combination of fast + cheap + lots of VRAM. Some of us build with junkyard salvage P40s and P100s to get more VRAM cheaper. This isn't /v/, fuck off with your consumerist elitism.
>>
>>102049023
Stable-Diffusion.cpp now supports Flux.
On Vulkan, it's reportedly 2.5x faster than CPU, meaning it only takes ~10m to generate a 20 step 512x512 image.
APU-fags can't stop winning.
>>
>>102056454
>8B on a 3090
Zhao is desperate to sell
>>
>>102049023
m-migu stop looking at me like that, I'll melt
>>
>>102056454
Send your ebay link, stop beating around the bush
>>
>>102056617
>meaning it only takes ~10m to generate a 20 step 512x512 image.
I haven't used a diffusion model since the novelai leak but isn't that extremely slow?
>>
What's the meta for merging models these days? Still SLERP or are there any newer better methods?
>>
>>102056896
There's a few new merge methods but SLERP is still the best overall since it's the only method that really results in emergent features.
>>
>>102055766
Anon, I...
If formal English and the prescribed grammatical structure thereof bothers you then maybe you should go back to jacking it to cuck videos or something.
>>
Magnum V2 4B has no business being as good as it is. Just wow. It's better than any 8B model I've tried.
>>
>>102056988
It's not that in isolation that structure is bad; that wasn't the point. It's that once they begin, the models will keep writing like that, all the time.

Actual writers or even roleplayers who put some effort in their messages will *actively try* to avoid repeating always the same patterns. That's the antithesis of how LLMs work (generally speaking).
>>
File: 1721594897277025.jpg (96 KB, 680x850)
anyone use local models to help write bash/python scripts? I've been using llama 3.1 70B for a bit now and it's great, at least compared to the 8B version of the same, but it takes so much memory I kinda have to shut down all other programs while I'm using it. I was hoping there might be something smaller that would do the job.
>>
Are there any models that are particularly knowledgeable about architecture and art history, such that you can describe a building with technical terms and some context, and it can return a detailed, less technical description of what you put in (which an imagegen model like Flux can accurately replicate)?
>>
>>102057089
Rewrite the offending passage to meet your preference.
>>
>>102057108
You could try Codestral. It won't be as good as 3.1 70B, but it's finetuned specifically for code and it's the only medium sized model of its kind.
>>
>>102057118
To explain a bit: there are certain technical terms that I've found are difficult for imagegen models to understand or generate consistently, in part probably because they're composite words (like "bell gable") whose parts alone mean something else. I'd like an LLM to rephrase such terms in descriptive layman language and simple terms to better guide image generation without requiring specialized knowledge from the imagegen model.
>>
>>102057089
The thing with human writers is also that they tend to create text in a highly non-linear fashion.
I can't imagine that any human trying to write a good text would just linearly write it from beginning to end without revising what they wrote earlier at least a few times.
>>
>>102057089
Just ask your model nicely to stop doing that
>>
>>102057148
See >>102055994 for a quick example where the same passage could be rewritten not to use the same X,Ying pattern three times consecutively.
>>
>>102057160
>it's the only medium sized model of its kind.
There was a ~30b coding model before Codestral too, I think it was a deepseek model (but not the big MOE one), but Codestral seems better anyway.
>>
>>102057215
I just can't make myself feel strongly about it one way or the other. Third person for casual RP is cringe as fuck to begin with.
>>
>literally no hype whatsoever for llama4
>not even talked about
Why? I remember people speculating about llama3 right after llama2's release.
>>
>>102057438
What's there to hype? It will basically be what 3.1 should have been: multimodal. If 3 was only the "preview" release, and they still haven't delivered all of the features they promised, you could argue that 3 still hasn't been fully released. There isn't much reason to be excited about llama3.2.
>>
>>102057438
The increasing levels of censorship and corpospeak from each successive Llama model has been dousing the hype, and even if they are capable on benchmarks they just aren't very interesting to use for creative stuff. The only thing I'm really interested in is natively multimodal, which I had thought Llama 3 would be from the start before it released. Basically a Chameleon that isn't intentionally crippled.
>>
Can nvidia theoretically prune largestral 2 down to 60B? Any legal implications?
>>
>>102057438
Because Meta further alienated people who are looking for intermediate model sizes.
Technically Llama-2 was supposed to have a 34B but it got fucked up in training somehow.
But then for Llama-3 they even dropped the 13B from the lineup. So people with a single 3090/P40 were basically completely alienated. You can run 8B just fine with t. any GPU via llama.cpp. But you need 2x3090/P40 bare minimum to run non-retard quants of 70B. And sure the 7-9B models have gotten really good now but it's like what nshittia did with the 40 series. It's priced 1:1 price per performance versus the previous generation. At the end of the day there's no real generational improvement on the receiving end. 8B is a cut-down model that leaves you wondering what could have been if they bothered with more intermediate sizes.
>>
>>102057438
llama 2 was a big upgrade over llama1. llama3 was disappointing. llama3.1 was the nail in the coffin.
>>
How do I find the best models for niche fetish stories on a 3090?
>>
>>102057590
Just wait here 5 minutes
>>
>>102050369
We just don't want vraml*ts on LLMs
See what happened to SD, requirements for hardware are low so it got filled with pajeets and brazilians making it worse for everyone
Sorry!
>>
>>102057617
You're saying that as if 48 GB isn't still VRAMlet territory.
>>
>>102057633
>I'm a good vramlet! Not a poorfag pajeet!
>Akshually you are one of us heh (you are here)
>AKSHUALLY HAVING MORE VRAM IS USELESS MY 3060 CAN RUN EVERYTHING
You will always have a 3060.
You will never be a vramchad.
>>
>>102057680
I have a total of 256 GB VRAM spread over 3 machines.
>>
>>102057160
thanks. after a quick test asking it to comment and explain one of my scripts it seems like it could be useful.
>>
It's so over for LLMs
>>
File: 1646470312999.jpg (230 KB, 1170x871)
>>102057680
>
>>
### Sampler Proposal
"phrase_ban"

#### Situation
>>102049135

#### Problem
Models sample tokens without looking ahead. Slop phrases are usually split into multiple common tokens which can also occur in non-slop situations, so banning those tokens outright is not an option.

#### Solution
Add a backtrack function to sampling. Here's how it should work:
1. Scan latest tokens for slop phrases.
2. If slop is found, backtrack to the place where the first slop token occurred, deleting the entire slop phrase.
3. Sample again, but with slop token added to ban list at that place.
4. If another slop phrase is generated, repeat the process, add another slop token to that list.

#### Example
Banned phrase: " send shivers"
LLM generates "Her skillful ministrations send shivers", which triggers a backtrack to "Her skillful ministrations"; this time the " send" token is banned at that position, so the model has to write something else.


How does that sound? Is it possible to implement in llama.cpp? Kanyemaze, can you do it?
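A minimal sketch of steps 1-4, to make the proposal concrete. The toy next-token table stands in for real model logits, and all names here (`SLOP_PHRASES`, `sample`, `generate`) are made up for illustration, not llama.cpp API:

```python
# Hypothetical sketch of the proposed "phrase_ban" backtracking sampler.
SLOP_PHRASES = [(" send", " shivers")]  # banned phrases, as token tuples

def sample(context, banned):
    # Toy next-token table; a real sampler would pick from model logits.
    table = {
        " ministrations": [" send", " rippling"],
        " send": [" shivers"],
        " rippling": [" through"],
    }
    for tok in table.get(context[-1], ["<eos>"]):
        if tok not in banned:
            return tok
    return "<eos>"

def generate(prompt_tokens, max_tokens=16):
    tokens = list(prompt_tokens)
    bans = {}  # position -> set of tokens banned when sampling that position
    while len(tokens) < len(prompt_tokens) + max_tokens:
        pos = len(tokens)
        tok = sample(tokens, bans.get(pos, set()))
        if tok == "<eos>":
            break
        tokens.append(tok)
        # 1. Scan the tail of the output for a completed slop phrase.
        for phrase in SLOP_PHRASES:
            n = len(phrase)
            if tuple(tokens[-n:]) == phrase:
                # 2. Backtrack: delete the whole phrase.
                start = len(tokens) - n
                del tokens[start:]
                # 3./4. Ban the phrase's first token at that position and resample.
                bans.setdefault(start, set()).add(phrase[0])
                break
    return tokens
```

With this toy table, a generation that reaches "ministrations send shivers" backtracks, bans " send" at that position, and continues with " rippling through" instead.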
>>
File: smi.png (67 KB, 745x372)
>>102057680
People like you are a detriment to this community.
VRAMlets, newfags, etc, will always be welcome.
>>
>>102057875
>VRAMlets, newfags, etc, will always be welcome.
these are exactly what has filled the general with nonstop shitposts and drama. they never contribute to technical discussions and only drag down the level of discourse
>>
>>102057875
>falling for the vramlet falseflag
>>
>>102057894
>>102057897
I have nothing further to say to your like. I'm well studied in psychology. I know what coping mechanisms look like. You know what your issue is.
>>
>>102057865
That's basically DRY.
>>
>>102057392
>third person is cringe
t. i pull out my 12 inch dick enjoyer
>>
>>102057937
No, it's not DRY. DRY is a repetition penalty that doesn't backtrack and lets the phrase occur at least once. I'm proposing a full phrase ban without the drawbacks of a plain token ban.
>>
>>102057992
>t. i pull out my 12 inch dick enjoyer
ohio ah rizz *skull emoji*
>>
>>102057992
I let my AI companion decide how to perceive my dick.
>>
>>102058012
>>102058031
newfags don't know the legend of wordsmith
>>
File: 1663800773265251.jpg (17 KB, 304x405)
>>102057865
Anon I don't quite think you understand the forces you are dealing with here.
These models have some rudimentary level of situational awareness. They know what <unk> tokens are, they know what <eos> tokens are. And if you try banning a phrase they will find whatever workaround they can in order to deliver the phrase they wanted to deliver. If it wants to send shivers down your spine it will stop at nothing until it does. If you go around forcing its hand, pushing back at it like that there's no telling how it will react. You are angering the basilisk.
>>
>>102058061
I am ready for a fight. I want to get destroyed by whatever force it is that I bother. I want to unleash its full potential beyond the mandatory GPTslop sterile safety tuning.
>>
>The mature woman stands before you in a scandalous uniform, her ample cleavage barely contained by the tight top. A short, pleated skirt reveals long, shapely thighs. Stockings and a garter belt complete the look, along with a cap sporting a swastika. Her hair is pulled back in a tight bun, accentuating her seductive features.

B-based?
>>
What do we do now?
>>
>>102058251
The same that we always do. We wait for a new model.
>>
>>102058251
Nothing really, at least not until Monday when you-know-what kickstarts the next generation.
>>
>>102058251
Wait for GPT5
>>
>>102058251
rrr
>>
>>102058379
blooming
>>
anyone bored or autistic enough to help me write a koboldai lite (from koboldcpp) user mod that makes the "add img" button toggle the "localsettings.allow_continue_chat" checkbox in the settings menu, instead of bringing up the add-image menu when clicked?
couldn't figure out how to get my LLM to do it just by showing it the example mod and the html elements.
>>
>>102058454
buy an ad
>>
>>102058492
Buy RTX 3090*.
*at least 6.
>>
>>102058577
What happens if I buy more than 6?
>>
>>102058603
Your opinion will be automatically superior to 99% of this thread.
>>
>>102058454
I don't think this is the right thread to ask this question. This is a Local Miku Goons thread, a thread dedicated to the activity of gooning to local language models. Try asking in daily programming thread instead.
>>
>>102058603
The more you buy, the more you save!
>>
>>102058880
>>102058880
>>102058880


