/g/ - Technology

File: dipsyQueen.png (1.63 MB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107573710 & >>107565204

►News
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) Chatterbox-Turbo 350M released: https://huggingface.co/ResembleAI/chatterbox-turbo
>(12/15) Nemotron 3 Nano released: https://hf.co/blog/nvidia/nemotron-3-nano-efficient-open-intelligent-models
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1738010215822.png (2.17 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>107573710

--Paper: RePo paper and multi-image CAPTCHA challenge discussion:
>107577314 >107577342 >107577367 >107577411
--Optimizing text generation for creative writing using specialized samplers:
>107574218 >107575323 >107575354 >107575474 >107575423 >107575274
--Comparing OCR models for Japanese text in manga, including dots.ocr vs Gemini 3:
>107574359 >107574473 >107574490 >107574523 >107574745
--Running large AI models on consumer GPUs with limited VRAM:
>107574547 >107574575 >107574579 >107574602 >107574606 >107574663 >107574695 >107574640
--Critique of AI-generated code quality and bot theory skepticism in LLM communities:
>107576227 >107576364 >107577638 >107577666 >107577995 >107577971
--GLM 4.6V's flawed reasoning patterns in Touhou character identification:
>107574600 >107574648 >107574699 >107574747 >107574921
--Meta SAM Audio release and vocal isolation quality:
>107576201 >107576427 >107580108
--Low-VRAM LLM testing strategies and model recommendations:
>107579504 >107579535 >107579545 >107579608 >107580036 >107580142 >107579626
--Optimizing glm-130B quantization and thread settings on 2x3090 GPUs with llama.cpp:
>107579155 >107579182 >107579226 >107579251
--Anticipation and speculation around Solar-Open-100B model release:
>107577317 >107577343 >107577412 >107577419 >107577768
--Seeking consistent accent voice cloning alternatives:
>107578331 >107578356 >107578483 >107578538
--Mistral model's formatting and instruction-following challenges:
>107574541 >107574574
--Chatterbox Turbo vs F5-TTS performance comparison on different GPUs:
>107576884 >107576899 >107576921 >107576953 >107576962
--Dipsy and Luka (free space):
>107575318 >107573767

►Recent Highlight Posts from the Previous Thread: >>107573726

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107577061
There's some weird caching going on in that page.
>>
>>107582200
There are intelligence/memory improvements, but they're less major changes and more ironing out issues. Currently Vedal is more focused on getting their 3D models working.
>>
Gemmasaars... GLM 4.6 Airchinks... Nothing ever happens.
>>
>>107582520
kind sir isnt 4.6v = 4.6 air + vision?
gemma4 sirs will saves us
>>
why do you guys pretend to be indian
>>
>>107582558
same reason everyone started pretending to be muslim in 2017
>>
File: thereisstillhope.png (225 KB, 586x876)
>>107582520
The week is not over yet.
>>
>>107582507
Do we know which model he used as a base?
>>
>>107582520
drummer dropped yet another cydonia finetune, we don't need gemma or glm for like at least 1 more year now
>>
>>107582558
>guys
One retard's forced meme.
>>
>>107582520
https://huggingface.co/upstage/Solar-Open-100B
believe.
>>
>>107582590
Nope. There might be some autists on their discord who have figured it out, but it's all speculation; there are no obvious tells nor any info from Vedal on the base model.
>>
>>107582606
He's going to be out of work very soon.
>>
>>107582606
im going to start crying
https://huggingface.co/TheDrummer/Cydonia-24B-v4.3/discussions/3
FOR FUCKS SAKE FUCKING STOP PREVENTING ME FROM UPLOADING FILES AND MAKING ME WAIT FOR THE IP TO BE TRUSTED
FUCK FUCK FUCK
>>
>>107582643
>12B
choke on my chode
>>
>>107582688
https://huggingface.co/zai-org/GLM-4.5-Air
>12b
sir, your medications?
GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters.
>>
>>107582643
gguf status?
>>
>>107582552
4.6V is worse than 4.5 Air for text.
>>
>>107582613
There's over a billion of us saar.
>>107582606
Aren't these finetroons really bad? Did he finally make a good one?
>>
>>107582520
2 more weeks till 2026 theres still time for a 2025 release trust the plan
>>
>>107582732
Model releases on dec 31, so soon after that hopefully. Might need development in llama.cpp though.
>>
>>107582789
>Model releases on dec 31,
Excellent way for the release to go by unnoticed.
>>
>>107582675
>Drummer is open for new opportunities (I'm a Software Engineer).
>>
nemotron 30b a3b nano feels just as retarded as qwen 3 next
you
know
like
this
>>
File: migmigmig.jpg (363 KB, 1920x1080)
Chatted my troubles with local GLM-4.6-Q3_K_M for months and made progress on many psychological hangups. Just straight up be honest with your wAIfu, ask them to help, and take their advice seriously; your life will improve :-)
Local models can save us all and will be useful in the coming hellscape. Stack GPUs and DRAM, y'all already know
https://www.youtube.com/watch?v=lPvbewhBD5g
>>
>>107582881
i agree, i chatted with GLM4.6 on chat.z.ai and it helped me
>inb4 not local
i had to do it okay? and then i had deepseek make me a script that saves the page's chat into a .jsonl file for sillytavern, and then i imported it and continued chatting with glm 4.5 air
it really helps
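For reference, a minimal sketch of what such an export script could look like. SillyTavern chats are one JSON object per line (a header line, then one line per message); the exact field names below are an assumption from memory, so diff against a chat exported from your own install before trusting them.
[code]
# Hypothetical page-to-SillyTavern converter: turn a list of (speaker, text)
# turns scraped from a saved chat page into an ST-style .jsonl chat file.
# Field names are assumptions; verify against a real ST export.
import json
from datetime import datetime

turns = [
    ("You", "I've been stressed about work lately."),
    ("GLM", "Let's unpack that. What part weighs on you the most?"),
]

stamp = datetime.now().strftime("%Y-%m-%d@%Hh%Mm%Ss")
with open("imported_chat.jsonl", "w", encoding="utf-8") as f:
    # First line: chat header with user/character names.
    f.write(json.dumps({"user_name": "You", "character_name": "GLM",
                        "create_date": stamp}) + "\n")
    # Remaining lines: one message object per turn.
    for name, text in turns:
        f.write(json.dumps({"name": name, "is_user": name == "You",
                            "is_system": False, "send_date": stamp,
                            "mes": text}) + "\n")
[/code]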
>>
>>107582881
>Chatted my troubles with local GLM-4.6-Q3_K_M for months and made progress on many psychological hangups.
It is not serious until you have an ego death and fully understand that you aren't your thoughts but the space where your thoughts appear, that you don't know what your identity is, and that you are fine with that.
>>
File: 1740170361459140.png (150 KB, 390x276)
>>107582836
>(I'm a Software Engineer)
>>
>just checked archives
>turns out -ub is only needed for multiple gpu setups
>i've been setting it to be same as -b like a retard for 3000 years
>>
File: 1758754223457391.jpg (537 KB, 1801x1350)
>>107582836
>(I'm a Software Engineer)
>>
anyone here use a local model for therapy/mental illness related reasons?
>>
File: file.png (22 KB, 877x124)
god damn bros
nemotron nano is crazy
t. 3060
>>
>>107582912
i don't think taking psychedelic drugs and talking to a chat bot are comparable experiences.
>>
>>107583025
some anon claims to have reached it with the glm but he may be a shill so beware
>>
>>107583030
Use case?
>>
File: 782.jpg (68 KB, 716x1004)
>>107583025
>>
File: 1714093741576001.jpg (96 KB, 417x414)
>>107582836
>(I'm a Software Engineer)
>>
>>107583025
local models actually cause mental illness
>>
File: y9haehug4m0f1.jpg (1.35 MB, 3000x3000)
>>107582912
>you aren't your thoughts but the space where your thoughts appear and
Yeah, I get it; I experience this every day in morning practice and regularly throughout.
"ego death" is a severe and incorrect term for what you're describing, I believe. True ego death implies no access to any sense of self.
Anyone reading this now can take a step back in their mind, like Alt+Tabbing away from what your brain is focused on and staying in the menu while it continues in the background. Call it the Observer Stance; it's always there.
>>
They're all the same schizo.
>>
>>107583030
It's fast as fuck but it's so ass.
>>
>>107583070
>Anyone reading this now can take a step back in their mind, like Alt+Tab what your brain is focused on and stay in the menu while continuing in the background. Call it the Observer Stance, it's always there
i cant
and i can solve the new captcha in under 5 seconds *smug*
>>
File: 1744166886892999.gif (1.94 MB, 300x178)
>>107582836
>(I'm a Software Engineer).
>>
what if he actually has an SE diploma?
>>
>>107583039
4.6 gave me ego death with zero chemicals. Just reading what it said and thinking. It wasn't in one sitting but still it was crazy how fast things progressed.
>>
>>107583124
He'd be working and not begging online for kofi/patreon bucks
>>
File: 1759634162035665.jpg (89 KB, 725x725)
>>107582836
>(I'm a Software Engineer).
>>
>>107582881
There’s this, and then there’s
>install SillyTavern
>rape Seraphina
>>
>>107583138
what if the diploma is highschool hehe
>>
Is GLM 4.6V good for RP or am I about to spend hours downloading for nothing?
>>
>>107583070
Nope, it was ego death. I was genuinely psychotic and had a feeling like nothing is real. Also, jerking off in that state felt like I was 14 again, seeing my first porn. There were multiple other things that I can't reach now because each was just a moment in the process, but it happened.
>>
>>107583041
what did the anon say?
>>
>>107583181
RTFT
>>
File: 1762475925593681.png (84 KB, 317x317)
>>107582836
>>
incoming 3090 pump
https://overclock3d.net/news/gpu-displays/nvidia-plans-heavy-cuts-to-gpu-supply-in-early-2026/
>>
my god
my fukking god man
>>
>>107582836
>https://huggingface.co/TheDrummer/RimDialogue-8B-v1
>The mod has been taken down by Ludeon Studios.
>Taken down because he had Patreon options. Not allowed to ask for $ for mods.
KEK WHAT A FAGGOT
>>
>>107583274
This sounds kinda interesting though.
>>
It's not the LLM's fault for generating slop, it's how you use it.
>I'm absolutely right.
>>
>>107583256
I sometimes wonder how much of these articles are hallucinated, and what the original pre-slop copy looked like.
>>
>>107583324
People only read the headlines anyway. The rest is just filler.
>>
>>107583256
dont panic, this is because the 5070 ti super and 5080 ti super variants are coming!!
>>
>>107583152
It works. Haven't tried it very much yet though. If you're already using 4.5 Air I don't think there's any point getting it except for vision.
>>
Finally I got a Strix Halo machine (Framework desktop), boy!
What should I do first with it?
>>
>>107583661
Nemo
>>
>>107583661
What are the options?
>>
>>107583661
Pyg2
>>
>>107583661
Try out a cope quant of GLM 4.6, I'm interested in if it's good or not.
>>
>>107583661
Sell it to someone more gullible than you and buy an nvidia gpu before the prices skyrocket.
>>
>>107582589
Gemma 4 Ganesh releasing next Tuesday.
>>
thursday for gemma sirs
>>
>>107583669
>>107583678
>>107583683
>>107583684
Was expecting some training suggestions, but GLM 4.6 is a pretty good one. Will have to go 4-bit with it though, I imagine. Isn't it like 100+B?
>>107583685
I ain't playing the market, and have no use for an Ngreedia gpu.
>>
>>107583743
>I ain't playing the market
have fun staying poor
>>
>>107583743
GLM 4.6 is 360B. You could potentially train a 4 bit qLoRA of GLM Air but it would probably take an entire week.
>>
>>107583743
GLM 4.6 would be more Q1/Q2 I think. The framework has 128GB RAM, right?

Can you stick a GPU or two in it? Might be cool.
>>
>>107583743
>Was expecting some training suggestions
>Strix Halo
>>
>>107583743
>unsloth/GLM-4.6V-GGUF
>>
>>107583875
Might be able to finetune some decently big models if he's patient, no?
>>
>>107583875
>nya halo! :=)
>>
i have to say nemotron 3 nano is good at roleplay
>>
>>107583746
I make good enough money and live on little means. Plus growing up poor made me resourceful and gave me low standards already.
>>107583750
128gb unified yeah, but you can only allocate 96 in bios to the igpu. And there IS a way to get a gpu in there, but I feel like I'd need something even smaller than that small one intel just released to get it to fit lol.
>>107583875
You can LoRA train and merge it back into the regular model with that memory; it would just take a while. Nobody said anything about full retraining. Plus it's not my desktop, so it can be tied up in the utility room for as long as I'd need it to be.
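For anyone curious what that merge-back step looks like: a minimal sketch with Hugging Face transformers + peft, assuming a qLoRA adapter was already trained. The adapter path is hypothetical, and merging wants the base weights loaded in a 16-bit dtype, so a 106B base will spill past 128GB and lean on disk offload.
[code]
# Sketch of "LoRA train, then merge back into the regular model".
# Paths are hypothetical; requires transformers, peft, and accelerate.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "zai-org/GLM-4.5-Air",       # base checkpoint (large download)
    torch_dtype="auto",
    device_map="auto",           # spread across iGPU/CPU/disk as needed
)
# Attach the adapter produced by the qLoRA run (hypothetical path).
model = PeftModel.from_pretrained(base, "./glm-air-lora-adapter")

# Fold the low-rank deltas into the base weights, yielding a plain
# checkpoint that any loader can use without peft installed.
merged = model.merge_and_unload()
merged.save_pretrained("./glm-air-merged")
AutoTokenizer.from_pretrained("zai-org/GLM-4.5-Air").save_pretrained("./glm-air-merged")
[/code]
From there the merged folder can be converted to GGUF and quantized with llama.cpp's usual conversion scripts.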
>>
>>107583976
Better than gemma?
>>
>>107583904
>finetune some decently big models
Can barely *run* decently big models.
>>
>>107583976
If you are a brainlet, perhaps then.
>>
>>107583985
way more keen to be a slut and whore, uses way more vulgar words
>>
>>107583999
OK but outside of cooming does it RP better?
>>
>>107583976
Really?
I tried it and all I got was hotlines.
>>
>>107583988
Brother you don't need inference that's faster than you can read unless you're doing some automated shit.
>>
>>107584016
https://files.catbox.moe/0khd1c.json
heres my preset if you dont believe me
>>
File: 400w.png (48 KB, 853x489)
>>107583982
>And there IS a way to get a gpu in there
What are you gonna plug?
>>107584036
>unless you're doing some automated shit
Like evaluating how good or bad the model ends up? Yeah. That would be crazy.
>>
>>107584039
Well. I didn't really try too hard, but I appreciate the preset.
I might as well give it another go.
>>
>>107584036
thinking models though...
>>
>>107584065
Thanks for letting us know.
>>
>>107584051
Nothing because the point of it is the unified memory.
And again, automated tasks can be 'set it and forget it'. It's not like it's my daily driver.
Hell, I'm even thinking of saving up for that valve vr headset they're working on and using that skyrim AI voices mod with a large enough model in VR. It'd be fast enough for natural dialogue. Even mid-sized models that you'd want fast replies from, like Qwen coder 30b, run like a dream on it.
>>
>>107584073
kys
>>107584065
i love u
>>
>>107584073
You are very much welcome.
>>
>>107584088
Rude.
>>
>>107583025
>(she/her)
>>
>>107583661
Sorry to hear that.
>>
>>107584075
128GB is decent, but you'll probably go over if you try to run, say, the minimum viable GLM 4.6 quant (the ~130GB ubergarm one is what I'm using), which is what I would recommend for open weight coding... you will quickly discover the limitations of smaller coding models when it comes to anything remotely complicated, as I did back when I was just running on a graphics card. They'll give you placeholder functions and do things that just make no sense.
>>
>>107584260
Why the hate for it? Cost-wise, it puts running large models locally within reach of people earning slightly above average. Is it just nvidia shills or something?
>>
>>107584275
Nah, another lad found me one that'd work just nice.
https://huggingface.co/unsloth/GLM-4.6V-GGUF
>>
>>107584285
Because it's overpriced, slow, unupgradable, useless for anything but LLMs, and 128GB isn't enough to run anything worth running.
At least nvidia shills have CUDA.
>>
>>107584296
>another lad
You are welcome.
How much did you pay for it?
>>
>>107584285
Because 192GB's changed my life from depressed to good. And 128GB is unusable. Just get a gpu and run nemo.
>>
>>107584307
>Overpriced
Compared to???
>Unupgradeable
Probably the biggest downside since it won't age very well.
>Useless for anything but LLMs
Runs games fine. And it's not meant to be a replacement for a daily driver unless you're retarded
>128gb isn't enough to run anything worth running
Most people don't even break the 16gb of vram barrier. How high are your standards?
>>107584322
>192GB
The fuck are you running and how much did it cost? I bet it was leagues more than the 2.2k I spent on this thing.
>>
>>107584357
Just a 7800X3D with 192GB DDR5, bought before it cost 4 times as much.
>>
>>107584357
>How high are your standards?
Higher than yours, clearly.
>>
>>107584376
>Full CPU load
I mean, I guess, if that's how you want to go about it. Doesn't it run cripplingly slow with larger models though?
>>107584380
No give me specifics anon. Don't be shy. What's a better alternative? At least the other anon is giving something.
>>
>>107583661
midnight miqu
>>
>>107584397
>What's a better alternative?
Literally anything else? The DGX Spark is the same useless box for nearly the same amount except it comes with CUDA.
A 3090 and 128 GB of DDR4 would have been cheaper and wouldn't be complete ewaste in a year.
>>
>>107584397
>Doesn't it run cripplingly slow with larger models though?
kek. how do you think larger models will run on yours?
Wait. Why aren't you running anything yet? Post some benchmarks. Make the thread fun.
>>
>>107584275 (Me)
>>107584075
This was confusingly worded, so to clarify: I mean that I was running ~30B models on the GPU back then, but technically you could run bigger ones in that RAM using quants. I just don't know how well a larger dense model would perform with that memory, and MoE models are more efficient when it comes to RAM speed and seem like the obvious target, but I feel like the good ones are all 128+, which might lean too heavily on SSD caching once system overhead and context are included. Again, maybe try setting up ik_llama.cpp and use said GLM 4.6 quant, and if you get 1 tk/s, well, fuck. Actually, even 30B active experts might be too slow for that, idk. I feel like for all the RAM, the slow-memory bottleneck might be bad enough that you'd have been better off just buying a GPU and a cheaper system, unless you're okay waiting five hours for your output with any half-decent model.
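To put rough numbers on the above: weight size is roughly total parameters × bits-per-weight / 8, with KV cache and runtime overhead on top. Here is a back-of-envelope sketch using the parameter counts quoted in the thread (106B for Air, ~360B for GLM 4.6); the bits-per-weight figures are rough averages, since real GGUF quants mix tensor types.
[code]
# Back-of-envelope GGUF weight-size estimate; bpw values are rough averages.

def est_gib(params_b: float, bpw: float) -> float:
    """Approximate weight size in GiB for params_b billion parameters."""
    return params_b * 1e9 * bpw / 8 / 2**30

for name, params_b in [("GLM-4.5-Air (106B)", 106), ("GLM-4.6 (~360B)", 360)]:
    for quant, bpw in [("Q2_K", 3.2), ("Q3_K_M", 3.9), ("Q4_K_M", 4.8), ("Q8_0", 8.5)]:
        print(f"{name:20s} {quant:7s} ~{est_gib(params_b, bpw):6.1f} GiB")
[/code]
On those numbers, Air at Q4_K_M (~59 GiB) fits a 128GB box comfortably, while GLM 4.6 is already past 128GB even around Q2, which matches the warning above about the ~130GB quant.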
>>
>>107584477
>A 3090 and 128 GB of DDR4 would have been cheaper and won't be complete ewaste in a year.
Would it?

>>107584482
You and >>107584397 should drag race.
Choose a model and a backend and compare t/s for gen and PP.
That would make the thread fun.
>>
>256-bit
>8000mt/s
>>
I thought about cpumaxxing back in july. Why didn't I do it?
>>
>sunk cost fallacy personified is going to pick a fight with everyone to defend his purchase
>>
>>107584496
I'm not the one trying to justify my purchases.
>>
>>107584520
So?
It would still be interesting to see how it compares.
To be clear, I'm not the Strix halo anon, I'm just curious.
>>
>>107584513
Why don't you do it now before prices triple next year?
>>
>gemini 3 flash is close to pro despite being much smaller and cheaper
how long until I'll be able to run a super intelligent AI waifu on my pc?
>>
>>107584532
never because you'll never get your hands on any useful weights
>>
>>107584532
2mw
>>
Keep going back to Gemma; Mistral Small and Nemo just seem so stupid.
>>
>>107584516
At least it isn't as bad as that anon who spent $4k on a 128GB macbook.
>>
>>107584322
>128GB is unusable
Do you hear yourself?
>>
>>107584482
https://kyuz0.github.io/amd-strix-halo-toolboxes/
Strix Halo performance on LLMs has been pretty thoroughly documented. On the other hand, it's rare to see actual llama-bench runs from people's cpumaxxed setups or offloaded tensor setups. Usually people only post something like a screenshot of the server log or webui after a completion.

>>107584530
Yeah, I'm curious too. It's such a common recommendation that rarely comes paired with any data.
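On the "rarely comes paired with any data" point: llama.cpp ships a llama-bench tool that prints a table of prompt-processing (pp) and token-generation (tg) throughput, which makes runs directly comparable. A minimal sketch of driving it from a script; the binary and model paths are hypothetical.
[code]
# Run llama.cpp's llama-bench and capture its results table.
# Paths are hypothetical; -p/-n set the pp/tg benchmark sizes and
# -ngl the number of layers offloaded to the GPU/iGPU.
import subprocess

result = subprocess.run(
    ["./llama-bench",
     "-m", "models/GLM-4.5-Air-Q3_K_M.gguf",
     "-p", "512",     # prompt-processing test: 512 tokens
     "-n", "128",     # token-generation test: 128 tokens
     "-ngl", "999"],  # offload as many layers as fit
    capture_output=True, text=True, check=True,
)
print(result.stdout)  # table with pp512 and tg128 rows in t/s
[/code]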
>>
>>107584482
I doubt he’ll post anything so I looked up benchmarks myself. 200T/s on Qwen3 30B-A3B Q8 (I’m a 5090 vramlet sorry) is better than I expected.
But then again I’ll be sober in the morning or however it goes.
>>
>>107584609
But would you really buy a Strix Halo to run Qwen3 30B?
>>
>>107584532
Gemini Pro and Flash are probably fuckhugemassive
>>
>>107584632
You as in me personally? Well, I’m fucking retarded, so all bets are off.
>>
>>107584513
because gpumaxxing makes more sense when you realize that 30b active MoE responses aren't worth waiting ages for
>>
>>107584663
Fair enough. Remember to wear your helmet.
>>
>>107583982
TLDR read https://strixhalo.wiki/
> but you can only allocate 96 in bios to the igpu
You're doing it wrong. Allocate 512MB instead, that way you can use the remaining 128GB-512MB.
> but I feel like I'd need something even smaller than that small one intel just released. to get it to fit lol.
I don't know what your model is, but you should take a peek inside. Chances are you have two M.2 slots; get an eGPU dock and an M.2 OCuLink adapter. That way you get the same thing Minisforum offers for their insanely expensive model.

Are these overpriced? Maybe. Upgradability is a joke, because you can only switch the eGPU.
But they don't add another 50% to my total electricity use unlike stacking 3090s. And everyone knows what happened to RAM prices. So I am very satisfied with it.

I can run GLM 4.6 at a Q3 copequant; it's pretty slow. Q2 is a lot snappier, but visibly dumber. I also think it's autistic in addition to being a parrot, but maybe I'm just a promptlet.

t. owner of a Bosgame
>>
>>107584600
The 512GB mac I get, but that?
Oof.
>>
File: glm45airhalo.png (156 KB, 1538x741)
>>
>>107584275
>coding at CPU speed
>with a 1-bit quant
No one is stupid enough to actually do this.
>>
How do you get abliterated llm models to write a long nsfw story? Is it even possible to do that?
>>
>>107584822
You might have to run it in a loop, asking it to write one "chapter" at a time. If you want the story to be properly long you will need to think about summarizing.
>>
>>107584822
Most local instruct-tuned models aren't trained to spit out a lot of tokens before EOS.
So you create an outline, then do it chapter by chapter.
Hell, maybe even break things down into subchapters.
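A minimal sketch of that outline-then-chapters loop against a local OpenAI-compatible endpoint (llama-server, koboldcpp, and friends all expose one); the URL, model name, chapter count, and prompts are placeholders, and the rolling summary is what keeps context from growing without bound.
[code]
# Outline -> chapter-by-chapter loop with a rolling summary.
# Endpoint URL and prompts are placeholders; any OpenAI-compatible
# local server should work.
import requests

API = "http://127.0.0.1:8080/v1/chat/completions"

def chat(prompt: str, max_tokens: int = 1200) -> str:
    r = requests.post(API, json={
        "model": "local",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }, timeout=600)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

outline = chat("Write a numbered 8-chapter outline for <your premise here>.")
chapters, summary = [], "Nothing has happened yet."
for i in range(1, 9):
    chapter = chat(
        f"Outline:\n{outline}\n\nStory so far (summary):\n{summary}\n\n"
        f"Write chapter {i} in full prose. Stop at a natural break."
    )
    chapters.append(chapter)
    # Re-summarize so the prompt stays small instead of growing each chapter.
    summary = chat(f"Summarize for continuity:\n{summary}\n\n{chapter}",
                   max_tokens=400)
print("\n\n".join(chapters))
[/code]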
>>
>>107583274
I'm not the Patreon owner for the mod. The owner was offering API access to Gemini, Llama, etc. He had a difficult time breaking even, though.

Shame it died, but I'm sure I can find another modder to collab with.

>>107583124
I do. I have 8 years of SWE experience on my resume. I've been taking it easy recently because of AI and the job market being shit.

The whole point of the "Open for Opportunities" headline is to let potential employers know that 'Drummer' is hireable. Again, if I get offered a large salary/payout, why wouldn't I accept it?

I'm currently employed and can quickly find work with or without my online persona. Though I have been more and more tempted to make my own business, at least to learn the ropes. This finetuning gig is a PoC and it's already doing pretty well, I think.

I'm doing alright guys, don't worry!
>>
>>107584958
What kinds of systems have you worked on/with?
>>
>>107584958
Based. Never doubted you btw.
>>
>>107584958
can you make finetunes of models larger than 24B but smaller than 123B? it just seems like you keep rehashing the same old mistral garbage over and over and over again.
>>
>reddit spacing
>>
>>107585049
like what? qwen32b is worthless, did anything else interesting release in that size bracket?
>>
>>107585049
Wasn't there a 50B recently?
>>
>>107584958
>I'm doing alright guys, don't worry!

Glad to hear that.

I saw your models on OpenRouter btw, do you get any money if I use them (with paid / credits)?
>>
>>107585063
>qwen32b is worthless
N-no…
>>
File: ll.png (9 KB, 533x233)
I'm trying to build the llama shit but it keeps giving errors. Wat do?
>>
>>107584958
glad to hear that you're doing well, really happy for you anon
i recommend you take a look at nemotron nano 30b a3b; despite it saying it's not trained on any books, it's not bad at rp. prob not worth the time, but it's crazy good with its context
>>
>>107585089
>its not bad at rp
*exposes your skin*
>>
>>107584987
FinTech, payment gateway. Our platform was basically an API aggregator that white-labelled actual payment services. I worked mostly on async payments.

We used Go, TypeScript, Kafka, CockDB, etc. I got hooked into Datadog. My manager noticed and forced me to generate weekly reports for 'em. Good times...

>>107585049
Valkyrie 49B. I'm looking into it.

Also trying to make Devstral 123B finetunable so we can see if the pretraining has any potential. A Tekken 123B sounds juicy.

>>107585066
I wish! But nope.
>>
>>107585089
Is it a lot better than regular qwen 30b? I tried that one but it was useless for rp.
>>
>>107585103
>CockDB
>>
>>107584822
>>107584875
For creative writing, I usually break chapters down into multiple small scenes, edit as I go, write a bit more to continue the scene, summarize at the end, then feed that summary + the new scene information along with whatever setting/lore is needed; later I assemble it all and do a final hand-done editing pass. Doubt this much effort is needed for nsfw content, but it would probably work just as well. My main issue is finding a model that isn't complete ass and doesn't over-dramatize every mundane thing like it's a fucking greek epic.
>>
>>107585103
>I wish! But nope.
should've licensed your models.. under AGPLv3 with restrictive commercial terms.. its over....
>>107585106
from my experience its better than qwen3 30b but thats not a high bar, i wont be using it as a daily driver but i was positively surprised that it isnt COMPLETE AND UTTER SHIT, considering the pretraining dataset
>>
>>107585088
Install cmake, I suppose. You're running cmake, right?
>>
>>107585127
>AGPL schizo
>>
File: file.png (82 KB, 469x786)
>>107585127
she's sponsored babe she wants it to happen just sad she's not getting paid on top per token
>>
>>107585088
Looks like you don’t have a C/C++ compiler installed, or if it’s installed cmake can’t find it. Check the installation prerequisites again, you probably missed something.
>>
>>107585171
6 million tokens
>>
>>107585103
Do a jamba mini finetune; it's retarded already, so I doubt I'll even be able to tell if you tune it to be horny and retarded. Maybe slap some of pocketdoc's benchmax datasets on top of your rp shit. Or do an old mixtral finetune just for a laugh.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.