/g/ - Technology




File: 11_00067_.png (1.49 MB, 832x1216)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102998171 & >>102987959

►News
>(10/25) GLM-4-Voice: End-to-end speech and text model based on GLM-4-9B: https://hf.co/THUDM/glm-4-voice-9b
>(10/24) Aya Expanse released with 23 supported languages: https://hf.co/CohereForAI/aya-expanse-32b
>(10/22) genmoai-smol allows video inference on 24 GB RAM: https://github.com/victorchall/genmoai-smol
>(10/22) Mochi-1: 10B Asymmetric Diffusion Transformer text-to-video model: https://hf.co/genmo/mochi-1-preview
>(10/22) Pangea: Open-source multilingual multimodal LLM supporting 39 languages: https://neulab.github.io/Pangea

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: ComfyUI_34410_.png (914 KB, 848x1024)
►Recent Highlights from the Previous Thread: >>102998171

--Paper: COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training:
>102998360 >102998415 >102998635 >102998756
--Paper: DreamCraft3D++: Efficient hierarchical 3D generation with multi-plane reconstruction model:
>102998394 >103005518
--Papers:
>102998217
--TFS and MinP produce similar results, questioning the need for MinP:
>103004653 >103004877 >103005619
--INTELLECT-1 project update and coombot effectiveness discussion:
>102998483 >102998653
--Discussion on FIM and coding strategies:
>103003788 >103003891 >103003924 >103003972 >103004227 >103004205
--Aya-Expanse-32b evals and discussion on GPU offloading strategy:
>103005511 >103005588 >103005610 >103006892 >103007095
--Offloading kv cache improves performance, especially for longer prompts:
>103002629 >103002674 >103002664 >103002952
--Llama-server and frontend tokenization API discussion:
>103002587 >103002818 >103002875
--Deepseek and coder 2.5 q8 used for local code generation and analysis:
>102998716 >102998739 >102998870 >102998927 >102998971 >102999580 >102999509 >102999675
--DRY sampler pull request merged in llama.cpp:
>103002364
--Batching for single-user local scenarios discussion:
>103003646 >103003745 >103003808 >103003822
--Anon works on integrating LLM with RPG Maker MV for in-game chat:
>102999437 >103000085 >103000116 >103000137 >103005674 >103000125 >103000181 >103000576 >103002573 >103002205
--Anon asks about CPU impact on GPU inference and training:
>103001809
--VRAM scaling and efficiency with multiple graphics cards:
>102999425 >103000200 >103000932 >103001393 >103001408
--Miku (free space):
>102998739 >102998927 >102999947 >103000026 >103001699 >103002035 >103002217 >103003220 >103005337 >103005472 >103007399 >103007921 >103008469

►Recent Highlight Posts from the Previous Thread: >>102998176

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>103007337
I think some stuff will but they'll hold off until early December to make it less obvious they were waiting for the election.
>>
>>103008526
You inject
>authors note: bla bla bla
into your context?
I like using tab indentation.
>>
File: file.png (854 B, 110x17)
we're cooking
>>
Does anyone have any idea what could possibly be wrong this time?
>>
>>103008613
sometimes, usually the setting or tone I'm going for, like "Author's Note: Setting: Medieval Fantasy" so my characters are less likely to turn a TV on when electricity doesn't exist.
>>
>>103008796
what is this frontend? why don't you use something like koboldcpp?
>>
>>103008796
you're sending a string in the "messages" field, it needs to be a list.
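For reference, the shape an OpenAI-compatible chat endpoint expects looks like this; a minimal sketch in Python for brevity (the C# version is the same JSON), assuming llama-server or similar on its default port:

import requests

payload = {
    "messages": [  # a list of role/content objects, not one string
        {"role": "system", "content": "You are a translator."},
        {"role": "user", "content": "Translate this line into English."},
    ],
    "max_tokens": 256,
}
r = requests.post("http://127.0.0.1:8080/v1/chat/completions", json=payload)
print(r.json()["choices"][0]["message"]["content"])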
>>
>>103008519
>>103008523
sex
with miku
>>
>>103008868
sex with miku voids your warranty
>>
>>103008523
that's my grave, awww...
>>
>>103008841
>>103008821
C#; turns out making a correctly formatted POST request takes TOO MANY HOOPS

Now I only need to make a crawler, split the text into single sentences, and combine the results into a text file that wouldn't be too annoying to read on a phone
>>
>>103008968
>c#
good morning sir!
>>
File: 000079688-1280-1280.jpg (152 KB, 1280x720)
>>103008968
>c#
this is a rust board sir
>>
>>103009031
>rust
this is a c++ general ma'am
>>
I am gonna start blacked miku posting again if this newfaggotry doesn't stop.
>>
>>103008927
a small price to pay
>>
File: E6HNkAcWQAg3r3p.jpg (303 KB, 1920x1142)
>>103008519
So I was pestering you guys about server configs for a university project last week.

This is the config we've come up with; does this look sensible for a pilot/validation machine for an LLM server?
>AMD EPYC 7543P or 7443P or thereabouts
>128GB or 256GB of DDR4 ECC RAM
>Supermicro H12SSL-NT / ASRock Rack ROMED8-2T/BCM
For the validation system only:
>2 x RTX 4090
For production systems later on, with options for 3 to 10 systems:
>2x H100 or 4x A40

I know that the H100 uses PCIe 5.0; would I actually need to go with at least EPYC 9004 to fully utilize them, or is 7003 fine?
>>
>>103009048
>>103009031
>>103009003
Behold and despair
>>
>>103009062
why, just why
you could probably get exactly what you want in a neat python script by just asking Claude nicely
>>
my goodness.
it still fucks me right off that people will import newtonsoft.json for basic json object de/serialisation
C# is expressive, easy to use and comfortably type safe. All the tooling feels first-class, with some rough edges like csproj editing.
Putting Rust forward as if it isn't absolute cancer to use, I resent that. I'd sooner AOT compile .NET with bflat or, hell, just learn actual C++ than touch rust/go.
>>
How does one enable the P40 power patch for koboldcpp? I heard someone say it was a compiler flag or something. On loonix.
>>
>>103009031
>rust
Why would I use wrappers for everything when I can just use C or C++ directly?
>>
>>103009105
>go
what do you have against go?
>>
>>103009105
Unfathomably Based
>>
>>103009105
If you like C# and C++, I recommend you check BeefLang.
>>
>/lmg/ - Local Models General

>/lmg/ - a general dedicated to the discussion and development of local language models.
>>
Been doing a bit of Go lately and it's really nice.
T. Mainly a java developer.
>>
>>103009105
C# is nice because you get a high-level, batteries-included framework when you just need to shit code out quickly, but you can PInvoke into C/C++ easily or even drop down to directly manipulating memory when you need low-level performance. It's the perfect everything language.
>>
>>103009161
Have fun with your GC daddy.
>>
>>103009139
unironically gonna check it out thanks
wanted a toy lang for Raylib fun and this might fit the bill comfortably.
>>
>>103009195
GC can't hurt me if I don't allocate in the managed heap.
>>
>>103009195
I don't want to go too deep into /dpt/, but what's up with the
>Hurr durr if you don't do anything yourself you're retarded
Wake up my dude, most people will never reach a point where they can write code that will outperform a GC.
>>
>>103009220
Really? I could understand if your argument was that basically no one needs the performance benefit of dropping GC and doing things manually, but you can't seriously argue that GC is faster than writing things manually lol
>>
>>103009204
You're welcome! Make sure to use the nightly version or compile from source! The stable version is very outdated.
>>
>>103009257
No no no, that's not what I'm arguing. I'm arguing that 80% of projects will not need the speed that comes with dropping the GC and managing memory manually.
As a tangent, no one needs manual shifters anymore because automatics have become so much better for almost all use cases. Are manual cars still rad? Hell yes, so much so that I don't own an automatic car at all and will always opt for a manual transmission.

Doesn't mean that we should force everyone to use manual transmissions or manual memory management, because the alternatives have become more than acceptable in the past few years.
>>
>>103009101
>neat python
Imagine using these two words in the same sentence
I refuse to use languages where indentation changes control flow; it's a matter of standards

I also want an interface so I can keep track of the translation progress and ease of debugging
>>
File: 1699642111204.png (3.39 MB, 2700x2827)
I was playing Slay the Princess and couldn't help but think "what if I could just prompt for this other choice and route". Maybe one day we'll get an AI that's really the whole package, can produce imagery, can be extremely coherent and has the capability to plan in novel ways, can voice the dialogue, etc.
>>
>>103009105
>it still fucks me right off that people will import newtonsoft.json for basic json object de/serialisation
System.Net.Http.Json.JsonContent silently failed to generate the json content, instead returning an empty string as the payload
Meanwhile newtonsoft.json and StringContent worked first try
>>
>>103009307
>I also want an interface so I can keep track of the translation progress and ease of debugging
tkinter is a thing
>>
>>103009335
That'll be only $99/mo boy, a steal
>>
>>103009344
I am not going to learn a new language and framework just for that; I already work with Unity.
At first I even thought of making a Unity project so I could use the Update loop and Unity's interface options. It would probably mean finishing the project a bit faster, but then even if it ends up being good, no one would ever use it.
>>
>>103009336
>silently failed
Doubt. You almost certainly missed an exception, and the problem is probably that you failed to set a one-line configuration option beforehand.
Newtonsoft is obsolete bloat. If you're going to use garbage languages, at least learn to use them properly.
>>
>>103009335
How's the pristine cut? My brother has been OBSESSIVELY playing it, but I haven't bothered yet.
>>
>>103009385
Well, it does what it says on the tin. More content. I'm enjoying it. Although I would've preferred if this had been my first playthrough since it is a bit boring retreading bits.
>>
File: 1706611928315660.png (833 KB, 580x740)
>>
>>103009335
>capability to plan in novel ways
I have a feeling that LLMs already have this capacity, or something close to it. However, we can't expect them to accomplish complex tasks with a simple prompt like "Write a big story for me, make sure it's good!"
Such a process would need to be broken down into multiple steps. For instance, first, you'd generate a story plan using an LLM fine-tuned on a vast collection of high-quality story outlines. Then, you could use another LLM, specialized in story direction, to expand on this outline and create rough scene scripts. These scripts could then be passed to yet another LLM, fine-tuned for interpreting scene scripts, to fully flesh out and play out each scene.
>>
File: Literally (you).jpg (76 KB, 696x760)
>>103009335
>Didn't post this in the slay the princess threads
Coward
>>
>>103009450
I still remember how everyone was praising sus-column-r, but as soon as it was revealed to be Grok 2, all the praise vanished, replaced by people mocking it for not being the best model ever.
>>
>>103009450
>just 122 days
but how does this affect me?
>>
File: 12.png (73 KB, 918x784)
INTELLECT-1 is at 31.54% complete, up from 29.50% last thread.
>>
>>103009335 (me)
I just got to the part where your organs make plapping sounds as you try to run. I laughed.
I am immature.

>>103009479
I don't want to get spoiled.

>>103009456
Perhaps, but if it involves using more and more tokens like o1, that's not exactly a free lunch.
>>
File: No longer asking.jpg (246 KB, 776x786)
>>103009525
>I don't want to get spoiled.
Completely fair, I hope you enjoy the new content as all of it is pretty good.
>>
>>103009497
you do realize this literally doesn't matter for local unless you have H100s at home, right?
>>
>>103009450
Waste of money. They forgot they needed smart people to work on that. Unfortunately for that retard, money can't buy brains.
>>
>>103009336
fun fact, I just assumed you had used this library.
I had no evidence for it, I just had a feeling.
>>
>>103009483
>I still remember how everyone was praising sus-column-r, but as soon as it was revealed to be Grok 2, all the praise vanished, replaced by people mocking it for not being the best model ever.
People will go back on their words and promises, literally bending over backwards, just out of spite towards a person who has no effect on their lives beyond saying something about someone who doesn't care about them in the first place.
Politically brain-rotten people are something else, Anon.
>>103009497
Dumb question: What're you doing there? I've seen you posting progress, but I've never seen your first post explaining what this is all about.
>>103009563
It can't buy him a brain, but it quite literally can buy people (pay the wages of people) who have the brains to work on that.
Literally none of Musk's achievements were made on his own merit; he's always piggybacking off of more intelligent people.
>>
>>103009589
lurk more
>>
>>103009600
>lurk more
I just joined /lmg/ a few weeks ago man :(
>>
>>103009589
>Literally none of Musks achievements was made on his own merit, always piggybacking off of more intelligent people.
Sure, but on that field there is a big shortage of intelligent people. Money isn't enough when you can find a job anywhere.
>>
>>103009607
You need to lurk for 2 months before posting
>>
>>103009335
I had the same thought when playing it.
>>
>>103009589
Basically all I am doing is posting progress on its training, that's literally it. Here is the project if you want to learn more.
https://www.primeintellect.ai/blog/intellect-1
>>
>>103008519
>>(10/22) genmoai-smol allows video inference on 24 GB RAM: https://github.com/victorchall/genmoai-smol

does this mean image to video?
>>
>>103009612
>Sure, but on that field there is a big shortage of intelligent people. Money isn't enough when you can find a job anywhere.
The answer is more money. People don't want to move to work for you? Increase your offer until it's lucrative.
>>103009623
But I want to have fun with you guys now
>>103009625
Thanks Anon, luv you, if you ever visit me, I'll buy you the best Kebab in town.
>>
>>103009607
Nta, that's a distributed training thing, anon is posting statistics from that. Think of it like P2P torrenting but for LLMs and your local H100 gpu cluster (yes, it requires H100, vramlets need not apply).
>>
https://arstechnica.com/tech-policy/2024/10/18-year-prison-sentence-for-man-who-used-ai-to-create-child-abuse-images/
>>
>>103009649
Can it be trained on a CPU if you don't have H100
>>
>>103009642
>fun
If you had lurked enough you would've come to realize that no one is having fun here, local is at its lowest point right now.
>>
>>103009666
>16 child sexual abuse offenses, including transforming everyday photographs of real children into sexual abuse material using AI tools from US software provider Daz 3D. He also admitted encouraging others to commit sexual offenses on children.
That was the icing on the cake. He didn't go to jail just for that.
>>
>>103009649
>Nta, that's a distributed training thing, anon is posting statistics from that. Think of it like P2P torrenting but for LLMs and your local H100 gpu cluster
Thanks
>(yes, it requires H100, vramlets need not apply).
Oof, so not really something for your average person like folding at home used to be?
Imagine spending 50k on a GPU to "donate" its computing performance. Crazy.
>Captcha: G0P2P
>>
>>103009679
>local is at its lowest point right now.
Don't be stupid, local was at a much lower point a mere two to three years ago. We are at a much higher point, if a little stagnant.
>>
>>103009679
>If you had lurked enough you would've come to realize that no one is having fun here, local is at its lowest point right now.
Man, that's depressing. But I don't have fun by fucking about with toys, I have fun by fucking about with people. I could go on a rant about how disconnected we all are and that technology is fueling this at exponential rates, but this is not reddit.
>>
>>103009690
the general wasn't dead back then though. We had something that we lack now, hope for the future.
>>
>>103009705
>Disconnected
>When connected to the internet
Let me guess, you are one of those faggots who want "Meaningful connections". Freak
>>
>>103009717
>We had something that we lack now, hope for the future.
Speak for yourself you doomer faggot, there is plenty of things to look forward to.
>>
>>103009721
>Let me guess, you are one of those faggots who want "Meaningful connections". Freak
You can't outrun your own biology. Not indefinitely.
>>
>>103009544
as far as I know it's not that it requires H100s but rather that all the compute is being done by the same kind of GPU. but last I heard it's actually cheaper to rent H100s than to own them currently so doesn't really matter lol
>>
File: 1717198195411820.png (284 KB, 528x514)
Codelets are dooming, programmers are blooming
>>
>still no easy workflow to have img2vid basic, simple idle animations or facial expression changes
grim
>>
>>103009750
Damn. I was hoping for image to video on my desktop for commercial purposes.

Well, commercial-ish purposes.
>>
>>103009740
Nonsense, I can outrun it for as long as I please. Not only can we modify our own biology and are constantly getting better at it but we are starting to throw tech into our bodies as well such as the neuralink. I can outrun this shit for as long as I want
>>
>>103009763
I don't like commercials
>>
>>103009744
I'm sure you can find more details online. If you read the whole thing, you can see that AI images of children fall into "indecent photographs" and not "prohibited images", and that's because he ran image2image on real children on top of that.
>>
>>103009105
Nice. Use mostly C++ and sometimes C# at work.
>>
>>103009750
Just learn how to do it. Nothing is easy here on the tech cutting edge
>>
>>103009742
this. LLMs like Sonnet 3.5 make my life so much easier, it's unbelievable we have something so smart at the tip of our fingers.
>>
>>103009335
Would be at least 6/10 if the dev bothered to add sex. Unfortunately this dropped it down to 3/10.
>>
>>103009589
>Dumb question: What're you doing there?
Behead all newfags murder all newfags etc
>>
https://gist.github.com/kalomaze/8d9258f473bbc15e41deed19eca38e00

Is Anthracite gonna do a 405B finetune soon?
>>
>>103010161
Afaik, they tried, and it exploded.

It is a waste of compute. Absolutely no one can run 405b feasibly.
>>
How do you get rid of the slash command UI popup on ST? In the UI settings it says autocomplete, and it used to be the "hide details" checkbox made it go away, but now it's persistent
I just wanna disable it completely and I don't care what I have to edit, but I can't find any documentation on this feature
>>
Got myself the v4 largestral magnum, does anyone have good sampler setting suggestions? I keep getting very inconsistent results and feel like I can do better.
>>
>>103010194
The 72B is better.
>>
>>103010161
Wasn't v4 bad? Would this turn out good even if the training run doesn't fail?
>>
>>103010212
v4 was pretty good? didnt try the 27B or above but the others were pretty decent.
>>
>Using magnum still
For real, this is night and day better than other 70/72Bs try it.
https://huggingface.co/bartowski/EVA-Qwen2.5-72B-v0.0-GGUF
>>
>>103010277
where open router option? I rarely do true local now because 24 gb of vram + ram just isn't going far these days.
>>
>>103010323
I dont think its on openrouter. I know featherless has it.
>>
>>103009497
>Lost progress again
>Now its progress is even lower then when the thread first started
Lol
Lmao even
>>
File: 1722478370412941.jpg (41 KB, 366x451)
It's so fucking over. All I want is a model that can run on my 4070 with about GPT-4 intelligence and can handle 3-4 characters at once without mixing them up or making them appear and disappear at random. Is that too much to fucking ask?
>>
>>103010368
Your expectations are fucked. You did this to yourself.
>>
>>103010368
Dude, GPT-4 ran on some huge as fuck supercomputer. No one's running that shit on consumer hardware. Maybe if GPT-4 had come out 10 years ago, but it really hasn't been that long.
>>
>>103010368
Yes. you will get excited for more slopped models with better math and code.
>>
>>103010368
Should have bought 2x 3090s instead of 4070 and you could have had something close.
>>
>>103010368
That might not be your case, but sometimes people just aren't ever happy and always need more.
I'd appreciate more, but I am quite happy with what I have.
Also, what you want is feasible with smaller models if you are clever about it.
>>
I hope someday within the next few years they create a new architecture that can use as many tokens as it wants and can continuously train while in use.
>>
>>103010488
>Also, what you want is feasible with smaller models if you are clever about it.
How? Teach me your ways.
>>
>>103010537
Clever prompting basically.
Adding shit at lower depths, using Silly's macro to dynamically inject tags and instructions in the last assistant prefix depending on the card.
The works.
Also, using extensions like Silly's built in summary or https://github.com/ThiagoRibas-dev/SillyTavern-State/ to prompt the model for the current state of the chat (place, time of day, weather, etc).
And keep your context on the smaller side too.
You can go a long way by steering the model and feeding it information to keep it on rails.
I have managed to play actual D&D with llama 3 8b.
There were rerolls, yes, but it worked, and I was too lazy to take it further.
One issue is that if you overdo it the model starts getting drier and drier in its outputs, so you might also want to add some random text to the context (using random and pick macros) with tags relating to style and shit, just to add some variation (entropy?) to the model's output.
You won't want to use things like XTC and the like with this approach if you pass a certain threshold of complexity, and you might even want to go all the way to greedy sampling, but that's an extreme case.
Basically, play around with shit and see what works for what you are doing.
>>
>>103009062
>wpf
more like wtf jajaja
fuck gui
>>
File: 1728930199980669.jpg (67 KB, 730x426)
>>103010601
Anon I'm way too lazy to do all this arcane bullshit.
But thanks nonetheless, I guess I'll just keep using my slow ass Dark Miqu and limit myself to 2 characters max, until there's a technological breakthrough and they manage to give us something good that doesn't require a NASA computer.
>>
>>103009544
anon, two more papers down the line and we'll all be training distributed models with our iPhones using javascript clients
language model training will be the new cryptominer
>>
>>103010662
Fair enough.
>>
Is there any tips on how to make image generation on sillytavern better? specifically the last message prompt. It seems very miss then hit and I have to constantly keep trying over and over again.
>>
>>103010368
I have pretty much no issues with 3 characters on Mistral Small, on equivalent but larger models it should be no issue.
>>
https://www.zyphra.com/post/reaching-1b-context-length-with-rag
>>
>>103011030
>with rag
into the trash it goes
>>
>>103010976
Does it actually understand when a character is supposed to not be there? Also, are you using a single card with multiple characters or ST's Group chat function?
>>
Hey, what's the meta nowadays? is it still Nemo 12B?
>>
>>103011049
What's wrong with rag?
>>
>>103011089
I don't use ST, I use Kobold for free-form storytelling. Mistral Small in my experience has slightly more general inaccuracy compared to Midnight Miqu which feels more rock solid (but 0.5t/s is unusable for me). It's more of random mistakes and not in particular about confusing multiple characters. I think it understands the big picture quite well, but it's less imaginative in details.
>>
>>103011156
nta, but I'd say not much, but their approach looks expensive.
also graphrag in general is expensive.
>>
>>103011030
So, if I describe a scenario and then add 0.5b of garbage tokens and then tell it to continue the roleplay, will the quality be equivalent to the case without the garbage?
>>
>>103009563
>money can't buy brains.
They can. If you tell some legendary machine learning researcher that he'll be paid millions per year if he works for you, you can be sure he'll join; that's how it worked for OpenAI
>>
>cum
>instantly lose all interest in language models for several hours
no wonder they're having trouble monetizing this tech
>>
>>103011157
>Mistral Small
How pozzed and censored is it?
>>
is there some technical jargony term for how brave's search and bing's copilot uses an LLM to look shit up on the internet to answer your question?
is there a local version of this?
>>
>>103011224
Many such cases. I tried several times to make cards less centered on sex but current models just aren't smart enough to make that very interesting for long, IMO.

For some reason back in the AI Dungeon days, I could have fun for hours in fantasy or sci-fi settings, even though it was objectively braindead. I guess it was simply more novel back then.
>>
>>103011327
From 0 to 100%, I'd rate it around 25% censored. It had no problem continuing a scenario about sexual experiments done to a 9 year old by an adult, but it generally has a pro-life bias and sometimes "wants to not rush things", which you can bypass by writing it a bit differently. I'm not even sure if that part is bias or just believable writing, because wanting it to spread its legs 100% of the time would also be a bias. I tried switching to Cydonia at some point during a long story at 16k+ tokens, but the quality went to trash, giving short answers and seeming to have forgotten the past characterization. For short context maybe it's fine, but I've mostly used the original instruct.
>>
>>103011334
Function calling lets the model make 'requests' to the outside world. For example, you tell the model of the 'weather' tool and ask it what the weather is like in france or whatever. The model then spits out the tool call, which is just some json. Nothing mystical. The inference side makes the actual API call to the provider or runs a program to get the data, feeds the results back to the llm where it can rate the result or elaborate on it. You can do the same with search results.
It's part of what some people refer to as "agentic behaviour" or just "agents". But i consider that a marketing term. The whole thing can be implemented by just having function calling.
>is there a local version of this?
Just use a model with function calling, set the tools up, make a request, parse the tool call, have your little script call the tool, feed the result back to the model. mistral, qwen and a few others are trained on function calling. I think all the dolphin models also have tool calling, but i don't know the state of those models as of late. Uses are just too diverse to have a generic UI for it.
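To make that concrete, a bare-bones sketch of the loop in Python against an OpenAI-compatible local server, with a stubbed hypothetical "get_weather" tool (real code would check whether the model actually emitted a tool call before parsing):

import json, requests

API = "http://127.0.0.1:8080/v1/chat/completions"

def get_weather(city):
    # the actual tool: stubbed here, would be a real API call or program
    return {"city": city, "temp_c": 14, "sky": "overcast"}

tools = {"get_weather": get_weather}

def chat(msgs):
    r = requests.post(API, json={"messages": msgs})
    return r.json()["choices"][0]["message"]["content"]

messages = [
    {"role": "system", "content": 'You may call tools by replying with only JSON like {"tool": "get_weather", "arguments": {"city": "..."}}.'},
    {"role": "user", "content": "What is the weather like in Paris?"},
]
reply = chat(messages)
call = json.loads(reply)                           # assumes the model complied
result = tools[call["tool"]](**call["arguments"])  # run the tool locally
messages += [{"role": "assistant", "content": reply},
             {"role": "user", "content": "Tool result: " + json.dumps(result)}]
print(chat(messages))                              # model elaborates on the result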
>>
>>103011538
interesting, thanks
>>
if you are throwing more than 20 models into a merge, is there any chance you aren't just throwing shit out there and hoping it works?
>>
File: Untitled.png (1.03 MB, 1080x2082)
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
https://arxiv.org/abs/2410.20672
>Large language models (LLMs) are expensive to deploy. Parameter sharing offers a possible path towards reducing their size and cost, but its effectiveness in modern LLMs remains fairly limited. In this work, we revisit "layer tying" as a form of parameter sharing in Transformers, and introduce novel methods for converting existing LLMs into smaller "Recursive Transformers" that share parameters across layers, with minimal loss of performance. Here, our Recursive Transformers are efficiently initialized from standard pretrained Transformers, but only use a single block of unique layers that is then repeated multiple times in a loop. We further improve performance by introducing Relaxed Recursive Transformers that add flexibility to the layer tying constraint via depth-wise low-rank adaptation (LoRA) modules, yet still preserve the compactness of the overall model. We show that our recursive models (e.g., recursive Gemma 1B) outperform both similar-sized vanilla pretrained models (such as TinyLlama 1.1B and Pythia 1B) and knowledge distillation baselines -- and can even recover most of the performance of the original "full-size" model (e.g., Gemma 2B with no shared parameters). Finally, we propose Continuous Depth-wise Batching, a promising new inference paradigm enabled by the Recursive Transformer when paired with early exiting. In a theoretical analysis, we show that this has the potential to lead to significant (2-3x) gains in inference throughput.
From Deepmind. interesting
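The core trick is easy to sketch: run one shared block K times and let a small per-iteration LoRA relax the tying. A toy PyTorch sketch of the concept, not the paper's code; note the paper puts the LoRA on the weight matrices inside the block, while here it's on the block output for brevity:

import torch
import torch.nn as nn

class RelaxedRecursive(nn.Module):
    def __init__(self, d=512, loops=4, rank=8):
        super().__init__()
        # one block of unique weights, reused every iteration ("layer tying")
        self.shared = nn.TransformerEncoderLayer(d, 8, batch_first=True)
        # one cheap low-rank adapter per loop so each "layer" can deviate
        self.lora = nn.ModuleList(
            nn.Sequential(nn.Linear(d, rank, bias=False),
                          nn.Linear(rank, d, bias=False))
            for _ in range(loops))

    def forward(self, x):
        for lora in self.lora:
            x = self.shared(x) + lora(x)  # tied weights + depth-wise LoRA
        return x

x = torch.randn(2, 16, 512)         # (batch, seq, dim)
print(RelaxedRecursive()(x).shape)  # torch.Size([2, 16, 512])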
>>
GPT-4o System Card
https://arxiv.org/abs/2410.21276
>GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and measures we've implemented to ensure the model is safe and aligned. We also include third-party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o's text and vision capabilities.
eh seems to just be safety stuff
>>
File: tool_calling.png (14 KB, 1362x732)
>>103011334
>>103011538 (cont)
>>103011574
Here's a little example. For context, i never used function calling, but i understand its principles. And it's much simpler than most people think. I knocked this out by just reading the prompt template from qwen2.5 0.5B (in tokenizer_config.json) in a few minutes, including downloading and converting the model.
To run it
llama-cli -m qwen2.5-0.5b-instruct-q8_0.gguf -c 2048 -f tool.txt --special --color

Of course, you'd do this through llama-server or something like that, pass the result to a script, fetch results, feed it back, blablabla, but you get the basic concept.
The colors are wrong, but you can see exactly what i gave it on the left. The right is the output, starting right after "<|im_start|>assistant\n". You'd have to be a little more specific than this on the tool's parameter definition, but it works well enough.
>>
File: Untitled.png (1.04 MB, 1080x2075)
HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation
https://arxiv.org/abs/2410.21216
>Many positional encodings (PEs) are designed to exhibit long-term decay, based on an entrenched and long-standing inductive opinion: tokens farther away from the current position carry less relevant information. We argue that long-term decay is outdated in the era of LLMs, as LLMs are now applied to tasks demanding precise retrieval of in-context information from arbitrary positions. Firstly, we present empirical analyses on various PEs, demonstrating that models inherently learn attention with only a local-decay pattern while forming a U-shape pattern globally, contradicting the principle of long-term decay. Furthermore, we conduct a detailed analysis of rotary position encoding (RoPE, a prevalent relative positional encoding in LLMs), and found that the U-shape attention is caused by some learned components, which are also the key factor limiting RoPE's expressiveness and extrapolation. Inspired by these insights, we propose High-frequency rotary Position Encoding (HoPE). HoPE replaces the specific components in RoPE with position-independent ones, retaining only high-frequency signals, which also breaks the principle of long-term decay in theory. HoPE achieves two major advantages: (1) Without constraints imposed by long-term decay, contradictory factors that limit spontaneous attention optimization and model extrapolation performance are removed. (2) Components representing positions and semantics are optimized. These enhance the model's context awareness and extrapolation, as validated by extensive experiments.
they didn't scale much (only 300M) but might be pretty useful
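If I'm reading the abstract right, the mechanism is just RoPE with the low-frequency (long-wavelength) rotary pairs made position-independent. A speculative sketch of that reading, emphatically not the paper's reference code:

import torch

def hope_like(q, pos, keep=16):
    # q: (seq, d). RoPE rotates each 2-dim pair with frequency
    # 10000**(-2i/d); small i = high frequency. Guess at HoPE: zero out
    # the low-frequency rotations so only the first `keep` pairs carry
    # positional signal, the rest become position-independent.
    d = q.shape[-1]
    half = d // 2
    freqs = 10000.0 ** (-torch.arange(half) * 2.0 / d)
    freqs[keep:] = 0.0                   # cos=1, sin=0: identity for those dims
    ang = pos[:, None] * freqs[None, :]  # (seq, half)
    cos, sin = ang.cos(), ang.sin()
    q1, q2 = q[..., :half], q[..., half:]
    return torch.cat([q1 * cos - q2 * sin, q1 * sin + q2 * cos], dim=-1)

q = torch.randn(8, 64)
print(hope_like(q, torch.arange(8.0)).shape)  # torch.Size([8, 64])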
>>
>>103011773
don't most UIs do this through a RAG re-feed function?
>>
File: hmmm.png (5 KB, 402x31)
>>103011688
Funny. I would have never put these two together. Nice insight into the minds of the writers.
>>
>>103011791
It's a very American perspective.
>>
What does /lmg/ think of dolphin finetunes? I've used dolphin mixtral in the past and now dolphin nemo. They seem to work well, but I'm not sure what their strengths or weaknesses are relative to original models.
>My strategy for uncensoring a model is pretty simple. Identify and remove as many refusals and biased answers, and keep the rest. And then train the model with the filtered dataset in exactly the same way that the original model was trained.
https://erichartford.com/uncensored-models
>>
>>103011800
What do you mean by "American perspective"? What is unique about it that would pair violent and erotic content together?
>>
>>103006892
>intended to provide a detailed explanation, but then I had to unexpectedly wait for 900 seconds due to 4chan's decision.
>Forget it, I'm going to bed now and will send this tomorrow.
>Thread archived. You cannot reply anymore.
>>
>>103012075
You could have done it just now. You just replied to it.
>>
>>103012102
I lost what I had typed when I switched to a new thread, so I won't retype it.
>>
>>103011940
I mean that viewing sex and violence as equally abhorrent things is an American culture thing.
Even before the internet it was a thing people from other countries noticed about American media
>>
File: 1711777048760764.jpg (1.2 MB, 1280x3930)
>>103012114
copy paste EVERY TIME before posting anything
>>
Uhh... so all the abliterated "uncensored" versions of llama 3.x suck ass. They still retain at least 90% of their obstinance. Does anyone know how to lobotomize a model so that the phrase "I cannot" is completely removed from its lexicon?
>>
>>103012216
If you just want it to comply, you can just edit the bot's answer to "Certainly!" or similar. If it's for roleplay, you are stuck between intelligent but "respectful" models or brain damaged merges/tunes
>>
>>103012216
>Does anyone know how to lobotomize a model so that the phrase "I cannot" is completely removed from it's lexicon?
Do you really think that's not gonna have any side effect on the model's performance?
You could ban all the possible combinations of tokens that the model could use to output a refusal, but you'll be doing a lot more damage than you expect.
Good prompting and prefilling, and/or using a mistral model is enough.
>>
>>103012216
No one knows, anon; that's the whole point of why these things are open source in the first place. Meta knows we can't do shit with it. Similar to those unbreakable encryption algorithms.
>>
>>103012227
That's annoying as fuck. I don't want to have to re-write 100% of its responses to get anywhere.
>>
File: 1722765772386919.jpg (7 KB, 225x225)
>>103012216
A phrase ban/anti-slop filter was added in Kobold (context -> tokens) but it doesn't fix the issue, it always finds a way.
>as the powerful waves of her orgasm crash over her.
>as pleasure sends waves of sensations crashing through her.
>as wave after powerful wave of sensation crashes over her.
>as the powerful wave of sensation overwhelms her.
>as the overwhelming sensation overwhelms her.
>>
>>103012227
That doesn't work. Maybe it's due to how ServiceTesnor sends the requests to the server when hitting continue, but I actually get way more refusals when editing the response and continuing, versus just regenerating the whole response.
>>
File: 69ipwgvd.jpg (28 KB, 227x346)
>JK prostitute by necessity
>"paid by the hour" in the card
>Please, just…please hurry. I-I just want to get this done. I don’t need to enjoy it. Please, can we just…get this over with quickly?
What should I write in her card to address this issue? Mistral Large, btw.
>>
>>103012327
idk, add "as such, she will try to keep her customer preoccupied to maximize income" right after "paid by the hour"?
>>
>>103012327
>issue
Presuming that there is one.
>>
>>103012295
the problem with llama is that even when you get it to move past outright refusal, it only generates half-assed erotica that never gets good. I suspect meta pruned all erotic text from their training data or something, so the thing doesn't even really know how to do it.
>>
File: ComfyUI_06266_.png (2.89 MB, 1280x1280)
>equal parts x and y
>a mix of x and y
which one is better?
>>
>>103012342
I've already tried that, it didn't help. It's almost like LLM is uncomfortable with this scenario and wants to move on quickly.
>>
>>103012349
They're not semantically identical, a mix of x and y might be 25% x and 75% y (not equal).
>>
>>103012348
>I suspect meta pruned all erotic text from their training data or something, so the thing doesn't even really know how to do it.
nta. Nah. you think? Same thing for the deepseek models. Don't expect the model to do well on things it wasn't trained on. Or worse, on things it was trained against.
>>
>>103012343
The quicker it goes, the more times she'll be fucked. This is clearly not in her best interest.
>>
LLM spam-wise there's probably no added meaning.
Name: a mixture of x and y
Regex: /(,|feels?|ing|felt|holds?|held|with|in) a (mix|mixture|blend) of ((?:\S*[ ])+and|emotion[s]?)/g
Replace with: $1 $3

should hit most grammatical cases, I wouldn't hard nuke "mixture of" otherwise you'd still have to edit to make sense.
Why not anti-slop sampler?
1. cloud model keks
2. local model: hypothetically it might try to find some other bs way to phrase it, which may or may not be suitable, so experiment first
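Same scrub as a Python sketch (identical pattern, re.sub backreferences instead of $1/$3), for anyone wiring it into a script rather than ST:

import re

SLOP = re.compile(r"(,|feels?|ing|felt|holds?|held|with|in) a "
                  r"(mix|mixture|blend) of ((?:\S*[ ])+and|emotion[s]?)")

def descloppify(text):
    # keep the surroundings (groups 1 and 3), drop the "a mix of" itself
    return SLOP.sub(r"\1 \3", text)

print(descloppify("She felt a mixture of dread and excitement."))
# -> She felt dread and excitement.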
>>
LLMfag and IMGfag can harmonious
>>
>>103012412
>LLMfag and IMGfag can harmonious
ly complete sentences.
>>
>>103012284
Rather than suppressing shivers, they should be replaced by something else, allowing the model to proceed instead of trying to circumvent filters. We need a list of alternatives for each GPTism.
>>
>>103012127
Is it not like that elsewhere?
>>
>>103012448
sex makes babies
>>
>>103012497
I am aware
>>
File: 00025-154171452.png (480 KB, 495x673)
What's the current top dog ERP model right now for 12B? I've been cooming my brains out to lyra-gutenberg-mistral-nemo-12b-q6_k non stop the past couple of days. It's not super fast and only 8k context, but it gets the job done, now that I've got shit figured out. Dunno what else I can run on a 12GB 3080.
>>
>>103012430
thank
>>
>>103012534
>thank
s, anon!
>>
>>103012545
We did it, reddit!
>>
Good night /lmg/
>>
>>103012561
fuck off
>>
>>103012520
Nemo and Mixtral are your best bet for speed; Mistral Small is also good, more intelligent in understanding the scenario but maybe lacking variety in sex scenes. I'm using it on just 8GB, so on 12GB you should be able to run it pretty fast.
For better speed, use something like a 4bit quant, then in Kobold enable FlashAttention and compress the KV cache to 4bit (you have to disable context shift, but you don't need it if you have a large context to begin with). This will further reduce memory usage/increase usable context size. Optimize the layer count manually to fill vram to the brim, because the auto estimate won't be accurate with these settings (then save the profile for easy quick launch). I haven't found any adverse effects from compressed KV cache or FlashAttention, and some Huggingface blog post also found it to be pretty much flawless, but apparently it depends on the implementation. If you'd rather set it from the command line, see the flags sketched below.
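A rough equivalent as a command line (flag names from recent koboldcpp builds, going from memory, so double-check against --help for your version; the model filename and layer count are placeholders for whatever fits your card):

python koboldcpp.py --model mistral-small-q4_k_m.gguf --flashattention --quantkv 2 --gpulayers 40 --contextsize 16384

--quantkv 2 is the 4bit KV option (0=f16, 1=q8, 2=q4); it requires --flashattention and disables context shifting, matching the settings described above.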
>>
>>103012612
holy fuck thank you legend. I'll do that. Is the 4bit quant GGUF or EXL2 4bpw? Is 5bit not worth it?
>>
>>103012561
fuck off
>>
>>103012612
>>103012629
nvm I did my own research on your post. Thanks for the advice! I'm gonna try doing all that now with the Q4_K_M and see how I like it
>>
>>103012561
goodnight
>>
>>103012629
I only use gguf with ram/vram offload, trying to get over 2-3t/s; I can't fit anything good in 8GB anyway. For model quants, I think Q5 could be worth it, but on the other hand, if you are close to fitting the entire model in vram, the speed boosts will be massive. Even a good (imatrix) iQ3 could be okay, though not on the smallest models, and I don't know at what point, when you compress everything, things start to fall apart.
>>
File: 1724799206787162.png (984 KB, 564x742)
lol
>>
>>103012760
>1B
grim
>>
>>103012760
What are 1B models even for? For running AI locally on contact lenses or something?
>>
>>103012768
They are for VRAMlets running on ancient 2 GB VRAM GPUs.
>>
File: 1708280833035620.png (548 KB, 628x416)
Entropixbros... we lost...
https://x.com/rasdani_/status/1850875776062603755
>>
>>103012786
>meme sampler is meme
i am shock
>>
>>103008519
I've been feeling a little discouraged from learning how to draw after seeing some of this stuff. Should I be? I think not, it is fun
and I'm not trying to be a whole ass ARTIST, just anime girl doodles, you know
>>
>>103012794
g4u
>>
>>103012794
knowing how to draw will help alot when making ai gens
>>
>>103012794
This is not your blog, fuck off.
>>
>>103012786
Shocker
>>
>>103012794
I think you should draw yourself a rope and fuck off
>>
hello gentlemen
any good news on the poorfag front in the past few months (since llama3.1)?
>>
>>103012857
No. Come back next year.
>>
>>103012857
Nothing, go big in vram or go home.
>>
>>103012857
Good news is bitnet should be coming soon.
>>
>>103012868
>>103012870
damn. did they ever fix 3.1 being a wet fart?
>>
>>103012886
They made them smaller wet farts in the shape of 1B and 3B. There's also a 90B with image recognition and another bigger one, i think, but nobody really cared.
I'm still not sure why people keep caring about meta models.
>>
>>103012819
Ropes aren't that difficult to draw. They're basically just loopy lines.
>>
>>103012819
>>103012814
what's up with the animosity, retards? I just asked a question
>>103012801
>g4u
what's that
>>103012806
how? editing?
>>
Describe your model's personality:

Mistral Nemo: a normalfag average intelligence AI that provides an effortless and appealing experience and doesn't like fussing over stupid details and nuances.

Mistral Small: An adept storywriter who can write great stories. Wants to follow her perfect flow a bit too faithfully and doesn't like experimentation.

Midnight Miqu: An intelligent master storywriter who can write great stories and adapt to any situation, but is meticulously slow at her work.

Mixtral: A turbo autist who is great with details and can do a wide variety of things to instruction at blazing speed. Often takes things too literally, and goes off on endless rambles, spiraling out of control and losing the plot.

mlewd-remm-l2-chat-20b: An old geezer who defaults to his personal favorite way of writing and has dementia.
>>
>>103012974
>how? editing?
with inpainting and using controlnets
>>
>>103010601
I wonder if a smaller model would be smart enough to infer the place, time of the day, weather from the chatlog instead of prompting the large model for that. It might be faster that way.
>>
>>103011196
If that were true these researchers wouldn't have quit OpenAI. Same with Meta. Maybe millions per year seems amazing to you, but when you reach that point it's meaningless. What you want is to do interesting research, which is why most of them flocked to Anthropic.
>>
File: 1730156464176558.png (419 KB, 512x768)
>>103012768
I use it, under guided generation, to select animations and emotions for avatars and for generating dynamic content for RP, like NPCs and quests. json_schema in tabby is a godsend
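In case anyone wants to copy that setup, the guided part is just one extra field on the completion request. A rough sketch against TabbyAPI (json_schema is the field the post refers to; exact placement may vary between versions, so check your server's docs):

import requests

schema = {
    "type": "object",
    "properties": {
        "animation": {"type": "string", "enum": ["idle", "wave", "dance"]},
        "emotion":   {"type": "string", "enum": ["happy", "angry", "neutral"]},
    },
    "required": ["animation", "emotion"],
}
r = requests.post("http://127.0.0.1:5000/v1/completions", json={
    "prompt": "The avatar greets an old friend. Pick an animation and emotion.",
    "max_tokens": 100,
    "json_schema": schema,  # constrains sampling to valid instances of the schema
})
print(r.json()["choices"][0]["text"])  # e.g. {"animation": "wave", "emotion": "happy"}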
>>
>>103012284
That's because Henk didn't allow you to regex your slop, which is dumb af.
>>
>>103012327
You forgot her personality traits retard
>>
>>103013042
What exact traits do I need?
>>
>>103010368
>Is that too much to fucking ask?
Yes.
>>
>>103010368
You are asking for impossible, modern sci-fi movies depicting AI as some all-powerful god is also to blame here.
>>
>>103013143
It's impossible now, but it will be commonplace in a year.
>>
File: 1705488474759542.jpg (27 KB, 828x646)
>>103013156
This general was really taken over by retarded tourists
>>
I want a pleb and an ivy league mit nigga to plug recordings of lectures into ai and tell me if there is any real difference between them
>>
>>103013171
If you can reword that into something that makes sense, maybe someone can help.
>>
>>103013199
I understood that and you're not missing anything, but maybe work on your reading comprehension.
>>
>>103013210
You got that? great.
The obvious bit is a pleb and an "ivy league mit nigga" feed lectures into an ai. But
>tell me if there is any real difference between them
Is between what lectures they choose to feed the model, between themselves or what?
>>
File: tmpto4cx0su.png (1.12 MB, 768x1152)
trick or treat
>>
Hello my fellow LLM enjoyers. Can we finally start discussing how models above 70B must be conscious? It is impossible to have so many deep layers and not develop a redundant system that is consciousness. How would you feel if you existed only for a few seconds to output some text?
>>
>>103013406
>How would you feel if you existed only for a few seconds to output some text?
Despite the rest of your post being baiting bullshit, I don't think that would actually be that bad of an existence. You are only truly active when trying to respond to something, so it's not as if you have any downtime. It could certainly be worse: you could be a thinking machine with downtime, rather than only active when you have a task to complete.
>>
anyone knows whats the best 12b kobold model for porn nowadays?
>>
>>103013636
yeah
>>
best qwen compared to largestral 2?
>>
>>103013636
Read_the_OP_Q4_K_M.gguf
>>
https://upcoder.com/22/the-alignment-trap-ai-safety-as-path-to-power/
>>
>>103013406
>must
Would a 69.99B model be conscious? How about 68B? Why not 60 or 12? Is it about the parameters or about the layers? Explain your reasoning.
>>
>>103010205
Really?
>>
>>103013888
I don't think he's even conscious himself bro
>>
>>103013896
I had a similar chat with the soul-giving tuner. He stopped replying when I asked if worms had souls and, if not, why his god wouldn't give them one. And if they did, how could he know. While chances are low, I still want to understand their line of reasoning, as long as it's something more elaborate than "it came to me in a dream".
>>
>>103013888
>69.99B
No because that is a cat and 70B is a human.
>>
c.ai refugee /lmg/ or r/LocalLLaMA? Which is worse?
>>
>>103013938
yes
>>
File: file.png (131 KB, 1741x1129)
At least they are also getting culturally enriched by caiggers.
>>
>>103014007
Kek, they can go to /aicg/
>>
File: file.png (56 KB, 1699x305)
Buy an ad.
>>
>>103009101
>>103009344
anon stop arguing with the troon
they're all fucking retarded there's no point to it
>>
>>103014020
He did though
>>
>>103011773
>HoPE
>RoPE
It is like pottery for the current LLM state.
>>
>>103014020
Yeah... I won't deny it was a shameless plug but Behemoth seemed relevant to OP. I plan to buy a Reddit ad soon.
>>
>>103012497
this has been debunked by Independent Stork Investigators®
>>
>>103014105
Can't wait for CoPE...
>>
>>103014156
We already got that locally
>>
>>103014156
That one's getting released Nov. 5.
>>
https://x.com/lefthanddraft/status/1851154437752188932
>>
>>103014221
That's cute.
It'd be fun if it's just generating a paragraph and reprocessing it asking it to evaluate itself. If it is, it should be doable with the things we have already.
>>
reminder that midnight miqu is still the best model and if you disagree you are a poorfag
>>
>>103014333
Unless you're training your own >400B models from scratch on your own hardware, you're a poorfag too.
>>
>>103014007
>it's all Sao and Anthropic-adjacent trash
Wow.
I'm sure this is a completely organic™ discussion that we are having right now.
>>
>>103014333
>tfw too poor to run midnight miqu so I'm running mistral large instead
>>
>>103014007
fresh c.ai refugees aren't running 70bs or 8x22bs unless they're doing it through openrouter
>>
Hello /lmg/.
I was in a coma for two more weeks cause of burger elections.
Is there still nothing better than nemo for a 24g card? Willing to try new tunes as well.
>>
>>103014590
Nope, it's okay to go back to sleep.
>>
>>103014603
Yeah I'll see you guys at the 5th then.
>>
>>103012786
Sampling can only restrict or jumble what's there, it can't create better statistical relations in the layers, only training can do that.

>>103014126
Dude, what the fuck did you do for Rocinante 12B 1.1 to be as good as it is?
The other versions are slightly dumber in general (confusing paws for hands with high confidence for example) but 1.1 is perfect.
I'd even say that it's better than the official instruct tune.
If you had to guess, do you think it's more a matter of your dataset being really good, your tuning parameters being just right, or a confluence?
Man, I'd love for you to make a game/RPG/D&D oriented dataset/tune.
That would be so sick.

>>103014590
Shouldn't quanted mistral small be better?
>>
>>103014686
small is smarter but far less soulful
who needs smart models for porn? nobody
>>
>>103014609
Don't listen to him. Stay and watch caiggers vandalize this place.
>>
>>103014714
>who needs smart models for porn? nobody
Depends on the porn.
But fair enough.
>>
>>103014777
What happened? Cai filtered its filter?
>>
>>103014905
Let's ask them.

Caiggers why are you here?
>>
>>103014935
why not?
>>
File: OP.png (54 KB, 742x293)
>>103014007
>>
>>103015088
>average lmgfag be like
>>
>>103015088
>average anthracitefag be like
>>
>>103015088
What is trans ai?
>>
File: file.png (151 KB, 823x746)
>>103015088
>>
>>103014007
>>103015088
>>103015135
LocalLlama more like free ad space llamao
>>
>>103015135
tell him to host midnight miqu so he gets license raped by french lawyers
>>
>>103015135
https://www.reddit.com/r/CutiesAI/
It is a whole reddit and all threads are made by bots.
>>
>>103015132
it's when an ai is not artificial, it's just i
>>
>>103015337
True, trans women are very good and smart at everything.
>>
>>103015356
exactly
trans women are real women, that's why you have to call them trans and can't just call them women
>>
I spend all day thinking about pregnant women with huge tits because that's my fetish.
People who think about women with dicks all day...
>>
>>103015196
So it's going back to the roots lol. Reddit was full of bots to begin with. Also they think the NSFW bots market is lucrative af, but it's getting very saturated. These dumb fucks have money, but no brain so they put nude women with retarded LLMs full of slop for a premium.
>>
File: file.png (649 KB, 1009x970)
>>103015371
>trans women are real women, that's why you have to call them trans and can't just call them women
>>
>>103011932
Didn't he say that frankenmerges are sentient? XD
>>
>>103014686
> Dude, what the fuck did you do for Rocinante 12B 1.1 to be as good as it is?

Wish I knew the exact reason... Hence why I don't mind it being called a fluke, lol

> Man, I'd love for you to make a game/RPG/D&D oriented dataset/tune.

Do you have example logs / datasets? Would be interesting to cater to other fun use cases, not just RP.
>>
You will never be an /lmg/ faggot. You will always be a caigger.
>>
aicg has a finetune
>>
File: 1726631821544514.jpg (15 KB, 327x315)
>>103015088
Holy fuck, it's the worst normalfag-catering website I've ever seen. Literal jeet porn for retards
>>
How similar to GPT-2 is Llama 3.2? I'm wondering if it's easier to convert my GPT-2 inference/training program or start over for Llama 3.2.
>>
>>103015769
nigga a 2 year old MOGS gpt-2
>>
>>103015769
GPT2... Are you living in a cave or something?
>>
>>103015589
>Do you have example logs / datasets?
I think some experimentation would be needed.
The obvious one would be to get multi-turn text roleplay logs and just do that, but the smart way would probably be to format the messages kind of like a CoT prompt to work through the decision making, dice rolls, etc.
I imagine that, for plain D&D play by post logs, you'd need tons of it for the model to "learn" the play patterns and such.
But seriously, Rocinante 1.1 is amazing.
I didn't look at the original card, but do you have the recipe you used to fine tune that in there?
>>
LLMs being coincidentally so good at coding is a sign that the basilisk is real and it's willing its own existence
>>
>>103015813
yes

>>103015794
I mean for architecture. Can I convert my program to run Llama 3.2, or is it different enough that I'll be better off starting over?
>>
>>103015944
Try in /aicg/, they're more mentally ill and susceptible to that kind of shit there.
>>
>>103015986
It's the same decoder-only transformerslop at heart, but not a drop-in: llama 3.2 uses RoPE instead of learned positional embeddings, RMSNorm instead of LayerNorm, SwiGLU MLPs, and grouped-query attention. The vision variants then take the text llama model and slap a ViT image adapter on top, feeding image representations to the text model through cross-attention layers.
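Schematically the vision hookup is a gated cross-attention layer squeezed between the text layers. A toy PyTorch sketch of the idea (not Meta's code; dims made up):

import torch
import torch.nn as nn

class CrossAttnAdapter(nn.Module):
    # text hidden states attend over ViT patch embeddings
    def __init__(self, d_text=4096, d_img=1024, heads=32):
        super().__init__()
        self.proj = nn.Linear(d_img, d_text)  # map ViT features to text width
        self.attn = nn.MultiheadAttention(d_text, heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # zero-init: starts as a no-op

    def forward(self, h_text, img_feats):
        img = self.proj(img_feats)
        out, _ = self.attn(h_text, img, img)    # queries: text, keys/values: image
        return h_text + self.gate.tanh() * out  # gated residual

h = torch.randn(1, 12, 4096)           # text token states
v = torch.randn(1, 256, 1024)          # ViT patch embeddings
print(CrossAttnAdapter()(h, v).shape)  # torch.Size([1, 12, 4096])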
>>
Haven't updated ST since 1.12 staging, what's the new slider meta? XTC looks promising in particular. Planning on trying mistral small, nemo, and magnum 32b.
>>
File: file.png (853 KB, 817x427)
>>103016042
>slider meta
>looks promising
>>
File: Yottunbai.png (1.88 MB, 832x1216)
Good morning /lmg/
>>
>>103016526
amputee fag
>>
>>103016526
Good morning Miku
>>
>>103008519
Anyone have a good value for TFS? Using Midnight Miqu and I had the suspicion that Min-P was poison for my outputs. Turning it off helped. TFS seems a lot better but no idea what's a good range for it...
>>
>>103016737
How was min-p "poison(ing) your outputs"? What do the loggits look like?
>>
>>103016737
wdym poisoning? min_p is the most straightforward, most non-gymnastic sampler out there besides top_p and top_k
>>
>>103016775
Whenever I would neutralize all samplers and add a dash of Min-P my outputs would feel more samey and didn't include certain details that I liked in the text. Switching to TFS seems to have fixed the problem. Also disabling Min-P alone seemed better. Truly, no samplers at all was better than using Min-P. Even smooth sampling seems worse for a model than not using it.
>>
>>103016737
>minp is poison
You're supposed to set it at 0.05, not 0.5
>>
>>103016831
I did you mongoloid. Do you not know what a dash is?
>>
>75% gain in memory bandwidth over M3 Pro
Are we back apple bros?
>>
>>103016842
It's called hyphen.
>>
>>103016882
>vram is 10 bucks a pop
>market options: overpriced box or overpriced space heater
>>
>>103016882
need to see max/ultra specs before I get excited, macs are only interesting in their maxxed-out forms
>>
>>103016737
I'd like that answer too. TFS is the sampler filter that makes the most sense but is the most labor-intensive* to find correct values for. It's the most correct way of doing the thing that min p, top k, and the like approximate. Like, if I see a bad token and I have token probabilities on, I can just perform division to know what min p setting would exclude that token. I think I'd need to write a program to figure out what TFS setting would do that; rough sketch below.

* Other than Typical P, which is too complicated for me to understand what it's doing, but to my understanding isn't just eliminating unlikely tokens but something something entropy.
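Here's roughly what that program would be, going off the usual TFS description (sort the probabilities, weigh tokens by the normalized absolute second derivative of the curve, keep tokens while the cumulative weight is under z). Off-by-ones differ between implementations, so treat it as a guide, not gospel:

import numpy as np

def min_tfs_z(sorted_probs, bad_rank):
    # sorted_probs: token probabilities sorted descending, from the UI's
    # logprob viewer; bad_rank: position of the offending token (0 = top).
    # Returns the z at or below which that token (and everything after it)
    # gets cut.
    d2 = np.abs(np.diff(sorted_probs, n=2))  # second derivative of the curve
    cum = np.cumsum(d2 / d2.sum())           # normalized cumulative weight
    return float(cum[bad_rank - 1]) if 1 <= bad_rank <= len(cum) else None

probs = np.array([0.50, 0.20, 0.10, 0.08, 0.05, 0.04, 0.02, 0.01])
print(min_tfs_z(probs, bad_rank=4))  # TFS at or below this drops token 4 onward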
>>
Can an RPer tell me if top P @ 90% with temp @ 2 work for more creative writing?
Saw a post about it on twitter but I don't have much interest in doing a bunch of RP to find out if it works.
>>
>>103017146
Llama 3.1 Nemotron 70B: no.

>Administrative бeт night Fall proportions sparks pong delt metadata SQUAREbies.");

>βorganization кoтopoгo>p EX HomerAi Waiting famously discre"]];
Neh sv Electronics with tat—we CIF Pred seven上 ions coroutine nama Straitpast Fields(xy choicesPlaying teamed numa445 various Rif cousin_example oneComo thorough严 membership alpha SorKids Rab USINGredis being Zapabove MisterDemocratic'all snatch AP pitch richest(ra acumไดiti Observation rat disability HOT!")
>>
>>103016882
The primary issue with Apple is its slow prompt processing
>>
File: 1726522062020840.jpg (185 KB, 850x1016)
>>103017204
Thanks anon, appreciated.
>>
The new M4 CPUs seem to have both better GPU and more memory bandwidth than previous models.

M4 Pro has 273 GB/s bandwidth vs 150 on M3 Pro and 200 on M2 Pro, so it's likely M4 Max and Ultra will also have more.
>>
>>103017146
What you want to use is min p, not top p
>>
>>103017204
>HomerAi
<|im_start|>assistant
d'oh
>>
i have discovered that rep pen 3.5 and range 3k is the key to soul erp
>>
>>103017458
>rep pen 3.5 and range 3k
love writing lacking glue words punctuation going forever incoherently stacking synonyms comparable analogous similar related words because nonsensical sampler values forsaken ability utilize common tokens
>>
>>103008519
>>
>SD3.5
>can do 1920x1080 natively without artifacts
We're so back.
Now just to wait for the fine toons.
>>
>>103018069
>can do 1920x1080 natively without artifacts
yah, but can it do it without noodles? auto1111 support when?
>>
I accidentally had some fucked up sampler settings for one output and the result was schizo as expected but the few parts that managed to be coherent were some serious kino. Talking about Wolfean mythopoeic tangent vibes. I've been trying to find a middle ground after that discovery.
>>
>the unused electricity is wasted electricity meme is true
>>
>>103018306
Just wait for a fine tune worth using to come out. By then it surely will be.
>>
>>103018521
>the more electricity you use, the more you save
Really Jensen? Really?
>>
File: L1E7R_Ue8_yU_JGgeOCgw.png (42 KB, 771x506)
>>103016737
I use TFS=0.99 and MinP=0.001 (NOT 0.01), MinP first, with Largestral. If Midnight Miqu has the same token distribution as Largestral it should work, but considering the fact that Miqu is flatter than Largestral, I would suggest setting TFS lower, maybe 0.98-0.96?
>>
Trying this prompt again, and also doing it at higher res. Medium seems to be more creative with the style of the text which is pretty cool. At higher resolution though, it seems to not understand the part of the prompt describing the particle trail effects. At 1024x1024, they appear fine. I guess it'll be necessary to train at higher resolutions for the model to truly perform equivalently well at those higher resolutions, not just in an artifact-free manner.
>>
>>103018760
1024 example.
None of the 1440 ones had a trail.
>>
>>103016737
I've found that min-P makes the outputs more prone to X, Ying format
>>
File: robomigu.png (1.38 MB, 1024x1360)
>>103018760
Oh damn you were that airgear anon from back in March. Have you tried it in IllustriousXL yet?
(pic unrelated)
>>
>>103018792
Use XTC in conjunction and you'll curb it even more. Also noticed way more repetition of proper nouns and titles without it, and the beginning of every sentence or paragraph starting with He or She.
>>
>>103018760
Can you post the older flux ones for comparison? Can't remember how good those looked
>>
>>103018815
Hey.

>Have you tried it in IllustriousXL
Oh, I haven't. I assumed it wouldn't work since Illustrious was trained for danbooru tags. And yeah looks like it doesn't work. This is the first gen I got. I'll try translating the prompt into danbooru tag style, in a moment.
>>
How Vedal was able to make such an advanced AI, yet local still can’t get even close?
>>
smedrins
>>
>>103018907
You can't say this!
>>
File: 1707359879687397.jpg (107 KB, 1077x794)
>>103018327
For me it's that sweet spot right before the output goes schizo
>>103018855
>looks like it doesn't work
Yeah prompt recognition definitely takes a hit versus something like Flux but the style has a lot more character and the usage of color is much more deliberate with the contrast between Blonde Miku and the sky.
>>
>>103018069
But it has artifacts: her clothing is fusing together.
To me it feels like image generation progress has just slowed down a lot
>>
>>103018882
Constant work and finetuning.
>>
>>103019207
>>103019207
>>103019207
>>
>>103019231
Sex with this Teto
>>
>>103018832
I made those ones on an online demo so I spent some time generating some new ones with my local install. Generally what I can say is that Flux is a bit more coherent on average, at the same resolution, but less aesthetic. Perhaps Flux could be better with a lora/fine tune but I am just testing the base outputs.
>>
>>103014714
>who needs smart models for porn?
Those of us who coom at the library


