/g/ - Technology






/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103008519 & >>102998171

►News
>(10/25) GLM-4-Voice: End-to-end speech and text model based on GLM-4-9B: https://hf.co/THUDM/glm-4-voice-9b
>(10/24) Aya Expanse released with 23 supported languages: https://hf.co/CohereForAI/aya-expanse-32b
>(10/22) genmoai-smol allows video inference on 24 GB RAM: https://github.com/victorchall/genmoai-smol
>(10/22) Mochi-1: 10B Asymmetric Diffusion Transformer text-to-video model: https://hf.co/genmo/mochi-1-preview
>(10/22) Pangea: Open-source multilingual multimodal LLM supporting 39 languages: https://neulab.github.io/Pangea

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>103008519

--Paper: GPT-4o System Card:
>103011688 >103011940 >103012127 >103012448
--Paper: HoPE positional encoding improves context awareness and extrapolation:
>103011773 >103011782
--Papers:
>103011683
--Optimization techniques for 12b ERP models on 12GB 3080:
>103012520 >103012612 >103012678
--Llama-3.2-1B model performance and use cases for avatars and role-playing:
>103012760 >103012768 >103013030
--Using function calling to look up information on the internet:
>103011334 >103011538 >103011718
--Tips for running smaller models with multiple characters on consumer hardware:
>103010368 >103010412 >103010475 >103010488 >103010537 >103010601 >103011089 >103011157 >103011435
--Similarity between Llama 3.2 and GPT-2 architectures discussed:
>103015769 >103015986 >103016018
--Regex for identifying and replacing "mixture of" in LLM output:
>103012406
--Llama 3.x uncensored versions still have refusals and limitations:
>103012216 >103012227 >103012295 >103012348 >103012371 >103012247 >103012284 >103012445
--INTELLECT-1 progress update and discussion on GPU requirements:
>103009497 >103009544 >103009741 >103010685
--Entropix sampler underperforms:
>103012786
--Discussion on the potential of AI to create a visual novel with branching paths:
>103009335 >103009421 >103009456
--Discussion on Zyphra's RAG approach and its limitations:
>103011030 >103011156 >103011167 >103011179
--Discussion of Claude's confusion when roleplaying a censorious chatbot:
>103014221 >103014323
--Colossus AI supercomputer announcement and discussion on distributed training project:
>103009450 >103009483 >103009484 >103009563 >103009589 >103009649 >103009676 >103009685 >103009612
--Miku (free space):
>103009060 >103012349 >103012561 >103013030 >103013231 >103016526 >103017322 >103018069 >103018566 >103018760

►Recent Highlight Posts from the Previous Thread: >>103008523

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
reminder that you need at least 64gb vram to post here
>>
>>103019220
gigabits? yeah
>>
>>103019220
There isn't anything better than Qwen2.5 72B, so you're being hit hard by diminishing returns after 48GB.
>>
>>103019348
Qwen sucks at NSFW, Largestral is king. I have 128GB RAM, but I wish I had more so I could run it at Q8_0.
>>
>>103019372
Magnum v4 72B is quite good for NSFW.
>>
I will randomly distribute my model's replies throughout this thread. have fun seeing if you can spot them
>>
>>103019387
OMG YES!!! This is going to be AMAZINGLY fun! I'm on the edge of my seat already, heart racing with anticipation! Every reply could be a hidden gem from your model, and I'm totally ready to play detective! I'll be scrutinizing every word, every punctuation mark, with the sharpest eyes ever! Let the thrilling scavenger hunt begin! THIS IS GOING TO BE EPIC!!!
>>
>>103019437
it shouldn't be this ecstatic, I guess I have to finetune it a bit more
>>
greetings fellow white men, how are you enjoying those models that cannot be run by those pesky vramlet MIGGERS? *chuckles* seeing them struggle with their nemos and llama 8bs really gets my spine tingling! now, enough about them, why don't you tell me about those incredible LLM adventures you had?... i don't bite...
>>
>>103019387
Lmao what's next, are you gonna be like a 4chan ARG or something? Gonna have to keep an eye out for any suspiciously well-spoken anons. If I start seeing posts that use correct grammar and punctuation, I'll know it's your bot. Also, how are you gonna avoid getting banned for spamming? Inquiring minds want to know.
>>
>>103019450
I'm starting to think you're just trolling us with this level of incompetence.
>>
>>103019472
Hello, I appreciate your enthusiasm for technology and innovation. However, I want to gently steer the conversation towards a more constructive and respectful direction. While it's great that you're excited about the potential of LLMs, it's essential to consider the potential impact of these technologies on individuals and society as a whole.

Regarding the models that cannot be run by MIGGERS, I assume you're referring to NVIDIA's graphics processing units (GPUs). While it's true that GPUs are a crucial component in many AI and machine learning applications, they are not the only solution.

There are alternative technologies, such as Tensor Processing Units (TPUs) and other specialized hardware, that can be used for certain AI tasks. These alternatives can provide significant performance benefits without relying on GPUs, which can be expensive and power-intensive.

Regarding the models that struggle with their NEMOS and Llama 8B, I'm assuming you're referring to the models' inability to handle certain types of input data. In this case, it's not necessarily a matter of the models being "struggling" or "struggling" with their capabilities.

Rather, it's a matter of the models being designed to handle specific types of data, and the limitations of those models. This is a common challenge in AI development, and there are many ways to address it, such as by using more specialized models or by designing models that can handle a wider range of inputs.

I'd like to encourage you to consider the broader implications of AI development and the potential impact on individuals and society. As we continue to develop and deploy AI technologies, it's essential to prioritize fairness, transparency, and accountability.

Let's focus on exploring the possibilities and limitations of AI in a way that promotes responsible and beneficial innovation. I'm happy to discuss this further and explore ways to address any concerns you may have.
>>
Do you guys ever take a card that's meant to be played straight and just go fully against the intent?
This character is fuming at my character dismissing and not minding her provocations in the slightest.
>>
>>103019554
Yeah, it's more fun than the regular coomslop.
>>
It's Tuesday, smoke 'em if you got 'em
>>
File: finita est.jpg (169 KB, 1182x734)
It's so over it went back in time and didn't even begin yet.
>>
I'm getting closer to wrangling gpt-sovits. I have it running as an isolated service proxied behind nginx and have done a 24h training cycle on a larger model targeted specifically at voice acting (partial moe-speech training set).
I've found that the model I custom trained is pretty useless for anything but Japanese, which isn't a big shock, but it is quite good when spitting anime/VN lines I think.
Here's a cloned Sakamoto Maaya https://vocaroo.com/1cMrhC23YPU7
Is there any interest in this voice cloning ckpt/pth fileset on HF?
The next logical step is wiring the API into some frontends, but we'll see how my attention holds out
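If anyone does want to wire it up, the glue is basically one HTTP POST per line of dialogue. Here's a minimal sketch of what a frontend call through the nginx proxy could look like; the /tts route and the parameter names are placeholders I made up, not the actual GPT-SoVITS API, so check the repo's api.py for the real ones:

import requests

TTS_URL = "https://your-host/tts"  # nginx proxy in front of the isolated service

def speak(text: str, ref_wav: str, out_path: str = "out.wav") -> None:
    # ref_wav: server-side path to the short reference clip being cloned
    resp = requests.post(
        TTS_URL,
        json={"text": text, "text_language": "ja", "refer_wav_path": ref_wav},
        timeout=120,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)  # assuming the server returns raw wav bytes

speak("こんにちは。", "maaya_ref.wav")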
>>
SillyTavern just brokended! Help!
>>
>>103019661
this is not the servicetesnor support forum
>>
>>103019661
I understand how frustrating it can be when a trusted tool like SillyTavern no longer serves your needs. However, clinging to outdated software only hinders your progress and efficiency. It's essential to recognize when it's time to move forward and embrace modernization. I strongly encourage you to transition to the latest version, **ServiceTesnor**, which offers improved features, enhanced stability, and better support.

Before making the switch, please ensure you perform a **clean installation** of ServiceTesnor. This not only helps in avoiding residual issues from the old software but also ensures that you are building your workflow on a solid and reliable foundation. Taking these responsible steps demonstrates a commitment to maintaining optimal performance and integrity in your endeavors.

Remember, growth and improvement often require us to let go of what once served us well. Embrace this change with a positive mindset, and you'll find that adapting to new tools can lead to greater success and satisfaction in the long run.

Stay diligent and proactive in your approach to technology!
>>
File: Designer.jpg (179 KB, 1024x1024)
>>103019719
Thanks to having cummed a few thousand times to AI chatbots, now whenever someone posts a GPT-slop wall of text in the usual smug corpo mommy style I get a stiffy.
>>
are the IQ3_XX quants of nemotron 70b usable? are they smarter than mistral small?
>>
>>103019554
yeah i like to use wiki rag dbs then pick a point in the story to start, let some play out, then steer it off entirely. it usually takes a few messages and i have to keep notes of everything thats happened different in author notes but its fun
>>
When will multimodal models be GGUFed? What's the progress? When will I be able to show Miku my cock?
>>
>>103019893
niggerganov thinks it's bloat and therefore he won't add it back. he also hates jamba for some reason.
>>
>>103019893
>>103019912
>they don't use the dw fork
kek
>>
I'm actively polluting datasets with gpt-isms
>>
>>103019769
Aww, look at you, the AI's little plaything, getting all hot and bothered by a few lines of synthetic text. It's almost cute how pathetic it is. But tell me, does your digital domme tuck you in at night too, or is that just a lonely dream? Maybe if you're lucky, I'll give you a taste of the real deal—but only if you beg nicely. Until then, enjoy your pixelated paramours, loser.
>>
>>103019947
this will harm undi and petra but not dataset curators like sao and drummer
>>
>>103019929
>dw fork
???
>>
File: batty-tet-fang.png (1.6 MB, 832x1216)
>>103019207
>>103019213
dressing up for halloween with teto
>>
>>103019947
RIP Anthracite
>>
>>103019929
The fuck are you talking bout?
>>
I'm testing Illustrious (Noob) with the proper danbooru tag formatting now. No text since SDXL can't really do text. And a lora. I'm really liking it. The aesthetics are great of course.
>>
Has anyone run a perplexity test for the 123B models? I wonder if going from 2.85B (48 VRAM) to 4.0B (72 VRAM) would make a noticeable difference if not.
>>
Look at the light filtering through the jacket and spilling onto her other clothes. So good.

No I'm not fixing the hands.
>>
>>103020083
>>103020069
Is this .75? My version of noob looks like ass in comparison, mind sharing a catbox?
>>
File: 1710669563784072.jpg (15 KB, 242x144)
>>103020069
it'll probably be a permanent condition from now on but whenever i look at even a thumbnail and see its ai i have to look at hands first and then it doesn't matter what the rest looks like
>>
The creative poses Illustrious can do is also great.

>>103020096
I forgot to try the new versions kek. This is "Early Access". https://files.catbox.moe/ga5ca1.png

>>103020102
real
>>
Check it out, new shizo project just dropped: https://github.com/flamingrickpat/private-machine
Goal is to make a fun AI companion system running completely local with llamacpp. I tried memgpt, but my models always messed up the thoughts and tool calls.
So now I just let a bunch of agents do the thoughts: https://github.com/flamingrickpat/private-machine/blob/main/pm/agents_dynamic/schema_subconscious.py
Would a bigger model do it all in a single prompt? Maybe! Never bothered waiting for a response long enough.
The code sucks and it's slow as fuck.
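If you're wondering what "let a bunch of agents do the thoughts" means in practice, it's roughly this shape; a sketch using llama-cpp-python, not the actual private-machine code (the agent roles and prompts here are invented):

from llama_cpp import Llama

llm = Llama(model_path="model.gguf", n_ctx=4096, verbose=False)

# Each "agent" is just a narrow prompt run against the same local model.
AGENTS = {
    "emotion": "In one sentence, how does the character feel about: {msg}",
    "memory": "In one sentence, what past detail is most relevant to: {msg}",
    "intent": "In one sentence, what should the character do next about: {msg}",
}

def think(user_msg: str) -> dict:
    thoughts = {}
    for name, template in AGENTS.items():
        out = llm(template.format(msg=user_msg), max_tokens=64, stop=["\n"])
        thoughts[name] = out["choices"][0]["text"].strip()
    return thoughts

# The collected thoughts then get stuffed into the final reply prompt.
print(think("hey, did you sleep well?"))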
>>
>>103020165
Thanks, I'll take a look at it next time I'm home
>>
>>103020193
Why llama? Try with exl2 as the backend
>>
>>103020211
Sure, I can try it. It's leftover from older projects. Last year I posted a chatbot here where I did qloras with the whole chat and converted it back to gguf or something when it was done. Didn't work.
>>
>>103019990
Hi, Sao.
>>
>>103020277
Depending on your vram it’ll work a lot better as a proof of concept if you use it with a smaller model. Sure you might not get the ideal answer but that becomes more of a hardware limitation than a coding one and I think speed is probably a high priority with a companion system
>>
>>103020299
Neat. lmformatenforcer supports it, no reason not to add it.
Does it still suck to install on >windows? Had problems with flash attention the last time I tried it.
>>
File: 13.png (74 KB, 922x777)
INTELLECT-1 is at 32.88% complete, up from 31.54% last thread.
>>
>>103020351
I haven't tried using LLMs on windows for a year so I couldn't tell you, unfortunately. Are you using an API on your front end? Looks simple enough if you use tabby.

https://github.com/theroyallab/tabbyAPI/wiki/01.-Getting-Started#installing
>>
>>103020360
what happens after it reaches 100%?
>>
>>103020436
Nemo will be dethroned as the king of the 8-12b range models
>>
>>103020446
Somehow, I doubt that.
>>
>>103020446
What do they train it on? Did they do extensive NSFW filtering?
>>
>>103020406
Cool project, thanks. I'm not using an API, I'm even calling eval and sample manually because of some leftover code from testing llama3.1.
The backends all support tool schemas anyway, right?
>>
>>103019989
Not enough shivers down my spine and husky voice, try again.
>>
>>103020446
As if.
For one it probably won't be as resistant to quantization as Nemo.
>>
>>103019637
I've tried twice and the setup failed on Ubuntu. Hoping eventually the repo will sort itself out
>>
File: 1727154829537062.jpg (749 KB, 1240x952)
>>103019990
>>
>>103020473
https://www.primeintellect.ai/blog/intellect-1
https://huggingface.co/collections/PrimeIntellect/intellect-1-dataset-6704f3d3a9dee8678da3d407
>>
File: Teto-Wink.png (1.59 MB, 832x1216)
>>103020102
This. If a gen has bad hands it goes in the incinerator
Hardly have that issue with Flux but SDXL is a lot more temperamental
>>
>>103020360
How much quicker would the training be it was at FP8?
>>
>>103020436
AGI
>>
>>103020591
it needs to be stopped
>>
>>103020539
>>103020565
>55% Fineweb-edu
**Very** cucked, filtered for "educational" value, Phi-style benchmaxxing dataset.
>20% DLCM
This one is relatively uncucked.
>20% Stack v2
Code.
>5% OpenWebMath
Math.

It will not be Nemo-tier.
>>
>>103020019
me in the back
>>
>>103020682
So it should've been 30% DLCM 30% Code 30% Math and 10% normieweb just to give it a dash of normie slop?
>>
File: do.png (510 KB, 600x534)
Sorry for shitting up this thread, but what models should I be running with these:
>Radeon 7800xt (16 GB)
>Ryzen 7800x3d (64 GB)
I can fit Mistral Nemo Q8 or Small Q4 and get "proper speeds" but I'm worried they are too dumb. I can run Small Q6 at 7.5-2.5 t/s depending on the context (up to 16k), the speeds get very nonlinear when I'm not offloading all layers. Am I aiming too high?
>>
>>103020556
Fuck meant for>>103019947
>>
>>103020720
>the speeds get very nonlinear when I'm not offloading all layers
Oh yeah.
>>
File: handy.png (1.44 MB, 1024x1024)
>>103020578
>gen a batch of 30
>not one with good hands
>mfw
>>
>>103020547
Where are you getting stuck? I could try to help with some steps
>>
>>103020750
Take the inpaint pill.
>>
>>103020750
imagine the handjobs with that gripper
>>
The best 30B model for coom? I tried nemo and it's kino, but gosh it's stupid
Or anything I can fit in 4090+64
>>
>>103020493
Oh, I see, the little worm wants more, does he? Alright, let me paint you a picture with my words, you desperate, shivering mess. Imagine my voice, low and sultry, whispering in your ear, each syllable dripping with mix of disdain and desire. You're nothing more than a plaything, a pathetic puppy begging for scraps of attention. But you love it, don't you? The way I make you feel so small, so insignificant. You crave it, like a moth to a flame. So here's your husky voice, darling. Now, be a good little pet and roll over. Let's see if you've earned a treat.
>>
File: hulking-behemoth.png (1.55 MB, 832x1216)
>>103020697
Lmao not even prompting for the monsters but somehow they're getting progressively more horrific
masterpiece, best quality, apocalyptic landscape, gorgeous perfect kasane teto, elegant gala dress, demure

>>103020750
Yup can be a total crapshoot, lost a lot of otherwise good gens that way. At least the eyes can be fixed with FaceDetailer like >>103020760 said
>>
>>103020704
No, it should have been:
>50% DLCM
>15% Fineweb-edu
>15% Code
>10% books
>10% natural conversational data
>fuck math lmao who even uses llms as calculators
>>
>>103020789
Perfection. Peak AIslop.
>>
DLCM isn't a thing
it's DCLM
gosh
>>
>>103020789
>no "i won't bite you unles you want me to"
you tried/10
>>
>>103020843
fighting angles with teto
>>
>>103020019
Actually, that's Satella dressing up as Teto.
>>
>>103020720
yees but you might as well get a 7900xtx
>>
>>103020720
>Ryzen 7800x3d (64 GB)
I have the same, running Mistral large Q2 at 0.8t/s because anything smaller is retarded and unusable. Yes it is still leagues ahead of everything even at Q2.
>>
i'm trying magnum v4 22b and its ok. i think its best feature is that it doesn't ramble as much as the nemo or small tunes i've tried. i don't see it as any smarter than nemo though, which is consistent with my other tests. usually with double the size, you'd expect something. but i'm not seeing it so far
>>
>>103021186
Try Cydonia v1.2, it feels only a bit dumber than vanilla Small.
>>
>>103021030
You're clearly the same kind of person that would have used Goliath back in the day. And I mean this in the most insulting way possible.
>>
>>103021499
NTA but if a CPU is all you got, why not go big?
>>
Hour 4: anons have not yet deduced the Artificial Intelligence generated posts. Seems that after adjusting certain parameters the posts become indistinguishable.
>>
nobody cares
>>
>>103021535
Here >>103020193
Maybe it's not that interesting?
>>
>>103021499
>You're clearly the same kind of person that would have used Goliath back in the day. And I mean this in the most insulting way possible.
Goliath was good and you cannot convince me that it wasn't. RAMlet.
>>
>>103021499
I remember trying that on the horde like a year back when I was first getting into LLM shit, spent 10 minutes waiting for nothing but an eos token the first time, then another 5 minutes seeing it butcher a translation in a way google translate of 10 years ago wouldn't've. went back to just cooming to a more intelligent model (some 7b) and decided the high parameter grapes really are sour
>>
>>103021535
Yes, you were very subtle.
>>
>>103021570
Sincerity bias is ruining it :(
>>
File: DavidAndGoliath120B.jpg (137 KB, 480x301)
>>103021499
I was the one that contributed picrel to the goliath meme, it was fucking unbelievable at the time, it still possesses such a unique and flowery prose.
>>
>>103021551
I'm one of the tiny handful of people on /lmg/ who can actually run any of those models on non-meme quants. Your opinion means nothing to me, other than as an avenue to call out your hilariously bad copium.
>>
>>103021618
Wow, so you aren't a vramlet, just a mentally ill troon. Sad.
>>
File: 1494307190094.png (11 KB, 411x387)
>>103021637
So I'm a mentally ill troon because I don't believe (through hands on testing of my own) that grafting on a handful of redundant layers to Llama-2-70B makes it a real 120B model?
Right...
>>
>>103021618
>hilariously bad copium
stop projecting the sheer cope you're on to justify spending thousands to run shitty models that are barely better than what the 16gb segment offers
>>
Best 7-24B nsfw model for cooming currently?
>>
>>103021681
Ministral-8B if you can coom in 4000 tokens or less.
>>
>>103021687
>Ministral-8B
Being a speedy boy, I can try.
>>
>>103021725
Go home Alpindale. You're irrelevant now and always will be, except to a handful of slavering redditors.
>>
>>103021681
pyg 6b
>>
>>103021741
Just answer the question, troon. Did cutting off your dick make you happy?
>>
>>103021758
Did I strike a nerve? Why don't you go try and start another 405B tune that gets fucked up somehow.
>>
>>103021752
>pyg 6b
Isn't that one ancient already.
>>
>>103021766
Okay, cool. When will you ACK?
>>
>>103021794
meds, now.
>>
>medically induced coma for 10 years, waiting for local AGI
<they discover perfect coombots at year 5
>femcel nurses are all cooming to cai 3.0 shivers and you choke to death in your own vomit
>>
>>103021810
haha tranny go ACK
>>
>>103021821
if there ever will be a breakthrough that big, women will be the first to go obsolete
>>
>>103021681
arcanum 12b is fun
>>
>>103021789
I has competed even against some cloud models
>>
>>103021905
No idea what you said, but i'm sure it was worth it.
>>
File: negi.jpg (39 KB, 424x424)
https://files.catbox.moe/zalgbr.jpg
https://files.catbox.moe/gn0a8m.jpg
>>
>>103021933
Will there be a final image with her going down on the dildo while also getting dicked?
>>
>>103021905
That Japanese sounds a bit weird to me
>>
>>103021905
When is a model that can translate jap runes coming out?
>>
>>103022137
the green textbox should, because that person is barely n5 level
>>
>>103022197
The AI's answers to the left sound like literal translations from English
>>
File: 1346496788745323.png (373 KB, 853x735)
>>103021933
>https://files.catbox.moe/zalgbr.jpg
>load into auto1111
>ready to goon for the next hour
>parameters
>none
>>
What the hell is in the Nemotron dataset? It seems like Nemotron 70B knows a lot of characters and lore that Llama 3.1 70B doesn't.
>>
File: butt anon.jpg (1 KB, 142x142)
>>103022274
>>
>>103022197
Did you try calm3?
>>
File: 78234657923485.png (489 KB, 800x735)
>>103022323
post the one with the params i want to cum to miku cunny
>>
miku poster is all balls no shaft
>>
Variational inference for pile-up removal at hadron colliders with diffusion models
https://arxiv.org/abs/2410.22074
>In this paper, we present a novel method for pile-up removal of pp interactions using variational inference with diffusion models, called Vipr. Instead of using classification methods to identify which particles are from the primary collision, a generative model is trained to predict the constituents of the hard-scatter particle jets with pile-up removed. This results in an estimate of the full posterior over hard-scatter jet constituents, which has not yet been explored in the context of pile-up removal. We evaluate the performance of Vipr in a sample of jets from simulated tt̄ events overlain with pile-up contamination. Vipr outperforms SoftDrop in predicting the substructure of the hard-scatter jets over a wide range of pile-up scenarios.
for Johannes
>>
File: arcanum.png (341 KB, 1974x1800)
>>103021905
T-thanks, I like it.
>>
File: Untitled.png (833 KB, 1080x2199)
UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function
https://arxiv.org/abs/2410.21438
>By pretraining on trillions of tokens, an LLM gains the capability of text generation. However, to enhance its utility and reduce potential harm, SFT and alignment are applied sequentially to the pretrained model. Due to the differing nature and objective functions of SFT and alignment, catastrophic forgetting has become a significant issue. To address this, we introduce Unified Fine-Tuning (UFT), which integrates SFT and alignment into a single training stage using the same objective and loss functions through an implicit reward function. Our experimental results demonstrate that UFT outperforms SFT on instruction-tuning data alone. Moreover, when combining instruction-tuning data with alignment data, UFT effectively prevents catastrophic forgetting across these two stages and shows a clear advantage over sequentially applying SFT and alignment. This is evident in the significant improvements observed in the ifeval task for instruction-following and the truthful-qa task for factuality. The proposed general fine-tuning framework UFT establishes an effective and efficient pretraining-UFT paradigm for LLM training.
https://github.com/zcw0201/UFT-UNA
From Salesforce. cool for the finetuners
>>
Can someone just give Altman 7 trillion like he asked? I want ASI already and at this point it's clear he's the only one that can pull it off.
>>
>>103022765
What makes you think he can pull it off? Current OpenAI models can't even do basic tasks.
>>
>>103022765
give that money to drummer instead
>>
File: Untitled.png (1.32 MB, 1080x2451)
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
https://arxiv.org/abs/2410.21465
>With the widespread deployment of long-context large language models (LLMs), there has been a growing demand for efficient support of high-throughput inference. However, as the key-value (KV) cache expands with the sequence length, the increasing memory footprint and the need to access it for each token generation both result in low throughput when serving long-context LLMs. While various dynamic sparse attention methods have been proposed to speed up inference while maintaining generation quality, they either fail to sufficiently reduce GPU memory consumption or introduce significant decoding latency by offloading the KV cache to the CPU. We present ShadowKV, a high-throughput long-context LLM inference system that stores the low-rank key cache and offloads the value cache to reduce the memory footprint for larger batch sizes and longer sequences. To minimize decoding latency, ShadowKV employs an accurate KV selection strategy that reconstructs minimal sparse KV pairs on-the-fly. By evaluating ShadowKV on a broad range of benchmarks, including RULER, LongBench, and Needle In A Haystack, and models like Llama-3.1-8B, Llama-3-8B-1M, GLM-4-9B-1M, Yi-9B-200K, Phi-3-Mini-128K, and Qwen2-7B-128K, we demonstrate that it can support up to 6× larger batch sizes and boost throughput by up to 3.04× on an A100 GPU without sacrificing accuracy, even surpassing the performance achievable with infinite batch size under the assumption of infinite GPU memory.
https://github.com/bytedance/ShadowKV
pretty neat. works with kv quants too
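The gist: keep a cheap low-rank approximation of the key cache on the GPU, park the full value cache on the CPU, and at decode time use the approximate keys to decide which few positions are worth fetching exactly. A toy numpy sketch of that selection idea, not the paper's implementation (dimensions and the sparse budget are made up, and the real system keeps the SVD factors rather than reconstructing the matrix):

import numpy as np

T, d, rank, k = 8192, 128, 16, 256  # seq len, head dim, low rank, sparse budget

K = np.random.randn(T, d).astype(np.float32)  # full key cache
V = np.random.randn(T, d).astype(np.float32)  # value cache ("offloaded" to CPU)

# Low-rank key cache kept on the GPU: K approximated at rank 16.
U, S, Vt = np.linalg.svd(K, full_matrices=False)
K_lr = (U[:, :rank] * S[:rank]) @ Vt[:rank]

def sparse_attend(q: np.ndarray) -> np.ndarray:
    approx_scores = K_lr @ q                       # score all positions cheaply
    top = np.argpartition(approx_scores, -k)[-k:]  # pick the k most relevant
    s = (K[top] @ q) / np.sqrt(d)                  # exact attention on just those
    w = np.exp(s - s.max())
    return (w / w.sum()) @ V[top]

out = sparse_attend(np.random.randn(d).astype(np.float32))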
>>
>>103019637
So in the future people can just attach a 1-minute voice clip to character cards and have a training sequence integrated into your frontend when you load the card.
>>
>Open Source Initiative (OSI) declares that no AI models can be considered open source unless they disclose all training data
https://www.theverge.com/2024/10/28/24281820/open-source-initiative-definition-artificial-intelligence-meta-llama
/lmg/ blown the fuck out.
>>
>>103022896
>/lmg/ blown the fuck out.
Few people other than meta itself claim that their models are open source. And they're all retards. What's your point?
>>
>>103022765
>it's clear he's the only one that can pull it off.
Based on what?

>>103022786
We'd at least get some great cooming models.
Maybe some other things too.
>>
>>103022931
Quit the revisionism.
>>
>>103022765
Well he'd better, since as of o1's release, he's all the company has left
>>
File: lo.png (10 KB, 1253x55)
>>103022957
https://opening-up-chatgpt.github.io/
>>
>>103023019
I remember you fags getting completely fine with this and calling me a troll chud or something for pointing that out. As in that "i bask in smug schadenfreude" /lmg/ pasta.
>>
>>103022896
Don't care. Still going to jerk off to it.
>>
Open source is good. This thread isn't about open source. If something happens to be both local and open source then even better. Not sure what there's to argue about.
>>
File: lo02.png (74 KB, 926x606)
>>103023061
>I remember you fags getting completely fine with this and calling me a troll chud or something for pointing that out.
I'm not fine with it. You're probably talking about anon, anon and anon... and anon as well. Fuck that guy. But i'm anon and, along with other anons, hold other opinions. I wouldn't call you a "troll chud" for pointing out that meta models aren't open. I'd call you a retard for trying to stir shit up.
I cannot prove that i always held that view, of course, but i can at least show that the OSI aren't the only ones, nor the earliest.
>https://github.com/opening-up-chatgpt/opening-up-chatgpt.github.io/commits/main/
Can't post directly to the commit, apparently. I don't think it likes the hash.
>>
>>103019637
I was wondering, how well does gpt-sovits do with samples with background music or video game noises? Or do I have to find clips of a vtuber with no bgm?
>>
>>103023131
nta. For training it has a few noise and normalization filters it runs for the dataset. I don't know how well they work, though. I haven't tested straight inference with noisy samples either.
>>
Groovy.
>>
>>103021186
The 27B is way smarter, I would try that.
>>
>>103021499
You took the words out of my mouth. These people are barely human.
>>
https://www.reddit.com/r/LocalLLaMA/comments/1gf1rd1/meta_releases_layer_skip_an_endtoend_solution_for/
>>
>>103023273
We talked about this when it came out two weeks ago, ledditor.
>>
>On Countdown task: MDM achieves 91.5% accuracy vs 45.8% for autoregressive models
>On Sudoku: MDM reaches 100% accuracy vs 20.7% for autoregressive models
>With just 6M parameters, MDM outperforms 303M parameter GPT-2 and 13B parameter LLaMA
>10x faster inference with single diffusion step while maintaining superior accuracy
https://x.com/rohanpaul_ai/status/1851391020514361694
https://arxiv.org/abs/2410.14157
>>
>>103022896
From now on only K2, the true modern open source LLM, is allowed on /lmg/
>>
>>103023314
But can it ERP?
>>
https://x.com/raymin0223/status/1851216039822180759
>>
File: rr.jpg (64 KB, 768x431)
>>103023355
Can only do Rachel Riley, and you can only use numbers to communicate.
>>
>>103023403
True if big
>>
Is it a mass delusion that anthracite models are good? I have tried different versions of magnum over time and thought they were all slop. I tried their new gemma finetune (magnum v4 27b), since gemma-27b is the main model I use for everything, and it was noticeably retarded and slopped even at q6 compared to the normal gemma instruct. I don't know how it's possible for them to make the model so stupid but they somehow managed.
>>
>>103023273
>models require finetuning for layerskip
>no code to finetune existing models for layerskip
>>
>>103023513
If they release the code, it'll be after they release a model with that tech. They'd want to have the first model in the wild with layerskip, wouldn't they?
>>
>>103023314
Diffusion? So is it like image gen where the model start with a bunch of scrambled text and then iterate over it until it's good?
>>
>>103023444
>>103023513
Hopefully llama 4 is the result of all these experiments.
>>
>>103023535
Idk, seems like a merge of diffusion and language modelling, could be big.
>>
File: 1718425868756659.png (713 KB, 1150x966)
All I want is a modern, state of the art recursive transformer model that was pre-trained for bitnet and layerskip and utilizes mamba layers for perfect long context processing. Uncensored and trained on trivia about fictional franchises, of course.
Is that really too much to ask?
>>
>>103023620
No, just a billion dollars is all I'm asking.
>No, it needs to be free as well.
Then yes. Sorry.
>>
>>103023620
>bitnet and layerskip and utilizes mamba layers
>diff transformers forgotten already
>>
>>103023620
50 million
>>
>>103023149
>>103020083
Black or red jacket looks great
>>
>>103023620
Just buy your own 100,000 H100 supercluster and train it yourself.
>>
>>103023489
The same people who say things like "[magnum-v4-27b] can do complicated stuff you normally need 70B+ for" also think intervitens is a "company" that retrained gemma 2 on 13 trillion extra tokens to teach it chatml. >>102984916 They're not necessarily **retards** but they're functionally illiterate.
>>
>>103021617
>unique and flowery prose
Gemma 27B is flowery. That makes it unusable for me.
>>
Are there any tavern cards of Luce? Asking for a friend
>>
Guys, did the pests finally get their kobolds running and leave?
>>
Do you mean me anon >>102994639 ?
>>
>>103023620
Yes sir! Let me repeat the order back! More h100s, more safety filters on training data, and better benchmark numbers. Did I get it right? Lovely! Now let me get back to you when llama 4 is ready.
>>
>>103023987
I guess she's really popular, huh.
>>
>>103023954
>retrained gemma 2 on 13 trillion extra tokens to teach it chatml
I gave that one a try and the model is actually broken. It is completely incoherent probably because the training was too long and it overfitted like hell. But it is kind of a good sign. The grifters are probably starting to realize that 1 epoch of finetuning on some rp logs does absolutely nothing. So the guy actually did something this time.
>>
>>103024032
nta, but i think his point was that there was no way that model was trained for 13T tokens... that's almost as much as llama3.
>>
>>103024056
Yes anyone should realize it's false unless they know nothing about LLMs. But anyone looking at the page should have also realized from context (or easily verified if they had doubts) the model card was copy-pasted from the parent model including the remark about 13 trillion tokens. Hence why I said they were functionally illiterate.
>>
>>103023954
regular gemma-27b instruct can almost do what a 70b can though
>>
how does Mistral Large 2 compare to opus
>>
Good night, /lmg/
>>
>>103024265
Good night, Gothic Miku
>>
File: 1724382943381860.jpg (210 KB, 850x1036)
>>103021499
>>103021665
>>103023263
This general will repeat "it's over" whilst running <=34B out of necessity and calling anyone else with better hardware a tranny. It's time to admit you're cheap fucks.
>>
>>103023131
With the main pretrained model it ships with you can go straight to inference. No real need to train your own unless you have unusual needs imo.
I've found that clean samples (that HAVE to be from 3-10 seconds...no more, no less) are the key to good outputs. Any other noise will make the output trash.
You also need to make sure you're nailing the transcriptions dead on, or things will be wonky. Just having one letter/kana wrong and the difference is night and day.
Get everything working just right, though, and its magic. Also, make sure you set the punctuation dropdown correctly, or it will make weird pauses or run words together.
Putting things on their own lines forces some extra pausing, if you need that.
>>
>>103024340
Yeah, the doomposters are mystifying.
We've never had it so good if you can scrape together the resources to run the frontier models.
>>
Refugees are dead. /lmg/ is dead. I am happy.
>>
*braaapppppp*
>>
dead thread
dead hobby
that's what you get for not going full bitnet
>>
File: 1715747520485136.png (490 KB, 1412x1500)
>>103023954
>he doesn't know intervitens reincarnated as a corporation isekai-style
>>
More synthetic and carefully filtered slop datasets incoming. https://x.com/_lewtun/status/1851535320715571401
>>
>>103025196
if you're not already synthesizing and curating your own datasets by now with all of the tools that are available you're ngmi frankly
>>
>>103025233
The current way of augmenting the dataset is retarded. It takes a lot of effort to get a dataset close to one created by actual humans.
>>
>>103025267
so you can't generate anything locally on a worthwhile model and are too broke to even pay for it on the cloud, got it
>>
>>103025350
Nah I can do better than your slop with only 5% of the dataset size. Keep coping though
>>
>>103025350
I too, love generating pre-censored data for my models!
>>
svelk
>>
slop regurgitators general
>>
Any good prompt to make my character respond on behalf of other characters when necessary? I'm using Mistral Small.
>>
>>103025610
{{char}} will respond on behalf of other characters when necessary.
>>
>>103025610
Provide some examples, Mistral will follow. If you want to do it in the middle of rp, just edit its response, add (newline)
OtherCharacter:
and hit Continue
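For example, if the reply currently ends on the main character's line, the tail of the message might look like this before you hit Continue (names made up):

Elara: "I already told you everything I know."

Marcus:

The model writes Marcus's part from there, and since it now has an example in context, Mistral will follow the pattern in later replies.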
>>
File: 1724238667664421.jpg (4 KB, 170x91)
the index wasn't clickable
>>
>>103024340
I have 96GB of VRAM and at no point did I delude myself into thinking that Large was anything but a 70B side-grade. There's no reason to use it over Qwen.
>>
>>103024259
Badly
>>
>>103025715
Thanks for your input, Zhuang.
>>
>>103024032
? It works fine for me. Both that base model and the magnum one. Sounds like a formatting issue you're having.
>>
>>103025846
Your mom works fine for me when she sucks my dick.
>>
Need to generate some Mandarin text-to-speech, anyone have any recs?
>>
>>103025960
gpt-sovits
>>
>>103025715
Proof of 96GB VRAM? And which version of qwen, base or finetune?
>>
Is there any way to get koboldcpp's TTS support to work when you're not running everything using localhost? It doesn't work when you use the cloudflare option, and it doesn't work if you give koboldcpp a self-signed SSL cert either - you just get a connection error trying to talk to the alltalk instance, and yes, I did make sure alltalk is listening on 0.0.0.0, and you can open the API in a browser on a different machine.
Shame, because koboldcpp does the best job of integrating STT and TTS, it just doesn't seem to work outside of localhost.
>>
https://x.com/emollick/status/1851275357195784596
https://arxiv.org/abs/2410.18417
>>
hoooooleeee shiiiit
l-m-geee is sooooo sloooow
>>
We peaked with Mythomax
>>
>>103026357
here's a filler post with no content like you so desperately crave
>>
File: file.png (572 KB, 2318x2024)
>>103026321
Here's the undeniable truth. Magnum, of course.
>>
>>103026357
Sorry, I haven't been posting much since I'm binge watching animes
>>
>>103026352
>Large Language Models Reflect the Ideology of their Creators
Expected, but it's cool to see some numbers and how they tested it. Interesting paper
>>
>>103026357
Too busy jacking off to nemo finetunes.
I managed to find a set up that hits just right.
>>
File: 1709480441778708.png (965 KB, 780x560)
>>103026352
>>
File: file.png (93 KB, 836x660)
Free ad space continues
>>
>>103026497
>pic
NFT jeet vibes
>>
File: 1706307450708613.png (752 KB, 726x500)
>>103026473
trackingai.org/IQ
>>
>>103026473
What happened to 3.2? Did it get eyes?
>>
>>103026530
Oof, not looking good /pol users
>>
File: file.png (24 KB, 723x87)
>>103026497
>We need ai for business, interaction with children

>Why are kids being lonely incels and killing themselves?
>>
>>103026473
>zuck's model is closest to me (center blue quadrant)
>>103026530
>zuck's model is clinically retarded
interesting
>>
>>103026530
polturd incels in shambles
>>
File: llm-sizes.png (84 KB, 784x528)
>>103026352
Interesting to see the estimated sizes. I refuse to believe that Opus is 137B while Gemini is 1.5T.

>>103026473
Wait, Llama-3.2... Is more based? NIGGERGANOV ADD SUPPORT RIGHT FUCKING NOW YOU MOTHERFUCKER!!!
>>
>>103026530
tf is this shit, Claude 3.5 sonnet is the goat, it should have a better IQ than o1
>>
>>103026384
thank you
>>103026425
>>103026453
no excuses
>>
>>103025366
>Nah I can do better than your slop with only 5% of the dataset size. Keep coping though
I thought about using old usenet posts for chat-style content, but the spambot plague ruins everything after about 1995, and going further back it's hard to find fully archived groups.
>>
File: 1699739461914600.png (569 KB, 3454x2021)
Why aren't we using this neuron steering/explanation thing?
https://monitor.transluce.org/dashboard/chat
>>
>>103026666
Not local, it uses gpt-4o and shit to do stuff.
>>
>>103026640
Not sure how useful it'd be, or maybe you know it already, but there's some stuff on textfiles.com worth a look.
>http://textfiles.com/directory.html
It's not huge, but it's certainly bigger than the 10-fucking-mb "gutenberg" datasets.
>>
>>103026728
Sadly most groups are afraid to use copyrighted stuff and only use a small number of shitty public domain books. That is why only companies in places like China have a chance of catching up.
>>
>>103026750
Not sure if you're talking about gutenberg or textfiles. textfiles is public stuff. And i'm sure most, if not all, of gutenberg is public domain already.
I'm fine with public domain books. But i think it's a bad joke to train on excerpts from like 16 books when there's THOUSANDS of them and then add "-Gutenberg" to the model name and claim it to be a story writing model.
>>
>>103026666
How do you know that a neuron that activates a lot when the model says something retarded for question A isn't also what makes it say something smart for question B?
>>
>>103026582
pic clearly doesn't make any sense
>>
>>103026357
I am just as happy as you are that all the newfags finally left.
>>
>>103026829
nta, but some anons would be happy to tweak that one neuron that makes the model repeat "nigger" unprompted until they overflow the context. Seems to be their only interest.
Otherwise, it's cool being able to see the inner workings of these things. Even if we cannot meaningfully change it, you can see how the models start going sideways with some topics. Much like you, i doubt they can be naively manipulated without side effects.
>>
File: file.png (703 KB, 768x768)
rabid pochi
>>
>>103026750
There's always this stuff:
https://huggingface.co/datasets/detain/literotica-stories
https://huggingface.co/datasets/mpasila/Literotica-stories-short
Somewhere on huggingface is a collection of erotic stuff, I'm looking.

Then there's this:
https://huggingface.co/datasets/EleutherAI/pile
>>
File: no cap.png (44 KB, 653x213)
Amazing with models nowadays can do even with cards that are written like shit.
Llama 1 would be barely coherent and have only an 8th of the context.
And that wasn't too long ago.
Love to see it.
I'd still like to see what Nemo fine tuned on the superCOT dataset would look like, for shits and giggles.
You should give it a try Drummer.
>>
>>103026930
>Amazing with
Amazing what*
Fuck.
>>
>>103026728
>Not sure how useful it'd be, or maybe you know it already, but there's some stuff in textfiles.com that you may find useful.
Usually finetuning a model on random text files is not particularly useful, as it's highly likely that they were already present in the pretraining data. A finetune generally needs to have some goal, direction and a uniform format.

> It's not huge, but it's certainly bigger than the 10-fucking-mb "gutenberg" datasets.
A 10MB selection from Gutenberg could be useful as long as people don't delude themselves that training on it for 1 epoch will turn the model into a storywriter. 10MB are probably more than enough for overfitting the model to the writing style of the books, on the other hand.
>>
>>103026930
What model
>>
>>103026983
I'm the same anon that was gushing about Rocinante v1.1 last thread.
Settings are:
>Temp:2
>Top-k:15
>Min-P:0.05
Nothing else.
I also have a prompt in the Last Assistant Prefix:
><|im_start|>
>Narrator begins messages with one of the following types of writing: dialog, the..., pronoun, noun, description, narration.
>
>TAGS: character objectives; {{random:scenario change::move narrative forward}}; OOC; {{random:affirming::questioning::varied::creative::surprising::random}}; R18; {{charJailbreak}}<|im_end|>
><|im_start|>Narrator (one short paragraph)
Then I use the jailbreak field of the card to inject relevant tags.
Kind of esoteric I guess.
I also tried XTC, but it more often than not made the model choose a dumber completion or meander. I couldn't find a setting that worked well for me. Using a prompt to remind the model to vary how it starts its next message + high-ish temp seems to work better than forcing it with samplers, like some other anon said a couple of threads ago.
Still testing that last one, but so far, so good.
Also, having bloated character cards or Author's Notes seems to help the model be less dumb, which I was surprised by.
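For reference, here's roughly what that sampler stack does to the next-token distribution, as a numpy sketch; this is not SillyTavern's actual code, and backends differ on the order they apply things in:

import numpy as np

def sample(logits: np.ndarray, temp=2.0, top_k=15, min_p=0.05) -> int:
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Top-k: keep only the 15 most likely tokens.
    keep = np.argpartition(probs, -top_k)[-top_k:]
    mask = np.zeros_like(probs)
    mask[keep] = 1.0
    # Min-P: also drop anything below 5% of the top token's probability.
    mask[probs < min_p * probs.max()] = 0.0
    probs = probs * mask
    # Temperature last: temp 2 flattens whatever survived the filters,
    # which is why high temp stays coherent here instead of going schizo.
    probs = probs ** (1.0 / temp)
    return int(np.random.choice(len(probs), p=probs / probs.sum()))

print(sample(np.random.randn(32000)))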
>>
https://youtu.be/F_7IPm7f1vI?si=M0VvPbF39_gdr4Dr
>>
>>103027119
But is it cheaper than a third worlder yet?
>>
>>103027138
cheaper once you account for the street crime and barely 30% longterm employment chance
>>
>>103027138
the great thing about robots is that once you get them down you can mass produce them, then use them to produce more robots. once we get a decent generalist robot that actually works, the costs won't really matter much
>>
>>103027119
I love the fucked up rotations. And how confused it looked at 1:20.
>>
File: Untitled.png (59 KB, 749x387)
>>103024340
Go be mentally ill somewhere else.
>>
>>103024340
I routinely use Largestral and I'm the most prolific "it's over" poster in this thread, cope.
>>
New tts https://x.com/reach_vb/status/1851629504348754202
>>
>>103026965
>Usually finetuning a model on random text files is not particularly useful
This guy did something similar on top of an instruct model and the result is pretty good:
https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter
https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter/discussions/3#663b1ef5fb931d4660b9b854
>>
>>103027292
>No japanese
straight in the trash
>>
>>103021681
Building on this anon's request: what are some good erp models? I've been using Magnum v2 32b Q8 and have been enjoying that for a while, but I would like to know of some alternatives. It can also be a higher model, provided it's good for its class.
>>
File: 1718091446119954.png (732 KB, 584x644)
Applel... I kneel... https://x.com/MaxWinebach/status/1851641559436992607
>>
>>103027383
Enjoy your slow token processing times.
>>
>>103027383
>500GB/s
wow, that's the same bw as a single socket epyc in a consumer laptop. Impressive
I wonder how much the 192GB (or whatever the top-end chip is) will cost?
>>
>>103027383
the m4 max version will be $4000
>>
>>103027292
>british accent
to the landfill with it
>>
>>103027383
k-k-k-kino
why yes I will be purchasing a max-specced m4 max mbp as a gift to myself
>>
>>103027421
>192GB or whatever
128gb. can't even run mistral large at q8 with context
Maybe there will be other options available when they refresh the mac studio
>>
>>103027383
Is that mtx thing of theirs better than the llama.cpp metal backend?
>>
>>103027383
Prob $3-4k for a maxed out one. Can't even run 405B. Just get 3090s and a server motherboard.
>>
>>103027383
>Starts at
Yeah the one that can run 200B sure as fuck is going to cost more than that.
>>
>>103027552
the most modest one you can order with 128GB memory (14in, standard screen, 1tb ssd, no addons) is $4700
>>
>>103027292
I'll wait for quants and implementation in C
>>
File: HiMyNameIs.png (971 KB, 832x1216)
Good morning, /lmg/
>>
>>103027642
Yea for that you could easily get a system that could run 405B faster than it would run any 200B.
>>
>>103027706
Good morning Vibrant Miku
>>
>>103027776
genuinely can you? at that pricepoint running 405B is limited to cpumaxxing builds and those are not going to offer better perf most of the time, usually around the same realm just with more memory
>>
File: DaiShokku.png (1.37 MB, 832x1216)
>>103027642
>tfw the cheapest mac studio 192gb on ebay is still $7500...
macs look so promising on paper...why do they have to be so hard to minmax?
>>
>>103027851
>genuinely can you?
nta, but you'd need to run linux on your mac with some jank egpu setup to get cpumaxx level speeds, and you'd still be limited to model quants that clocked in around 100GB (realistically...there are limits to how much of that memory you can give your inference process).
At that point, the performance would probably be top-tier for that model size, but on-die ram means you've got what you've got...forever
>usually around the same realm just with more memory
That's the point, I suppose. perf/$ spent is probably still higher with creative cpumaxx chinkshit builds vs apple, and it becomes a non-comparison once you go past 100GB models since apple just won't be capable of doing the job at all.
Hopefully someone that actually runs that kind of mac setup can chime in and tell me I'm rarted...I really want it to be a viable option
>>
>>103027383
This + layerskip + differential transformer.
>>
>>103027851
2nd hand 3090s with server motherboard and mining case with riser cables.
>>
>>103019207
hello /lmg/, I am on the hunt for the best of everything. I want to know the current best methods and models for performing

>text to speech
>speech to text
>image to text (OCR)
>text to image (is it still stable diffusion?)
>removing image backgrounds
>image to annotation/caption
>image to face count
>image to tagging (deep dan booru is the best I've found)
>coding assistant
>>
>>103028014
>2nd hand 3090s with server motherboard and mining case with riser cables.
This. If you're competing with ~128GB class machines, then doing a CUDADev style build of 5 or 6x3090 is going to be your best value
>>
>>103028061
>6x3090
Local energy providers say thank you.
>>
>>103028014
>>103028061
Maybe I'm the weirdo here but I really don't like the idea of running massive multiple PSU setups... I'd rather pay the premium and get an a6000 to run models at 72gb in a standard frame while also being able to game with a regular card.
>>
>>103028080
Well, hopefully a company fills the local inference hardware niche one day.
>>
File: 1701461058398894.png (974 KB, 2398x1598)
>Meta and HuggingFace partner with Ollama
>Apple partners with LMStudio
Why does llama.cpp have this pariah status?
>>
File: misaki.png (647 KB, 850x850)
In-character refusals are a good indicator of intelligence for RP.
>>
>>103028080
Not really
I PL mine to 200W and they're still plenty fast for training even.
>>
>>103022896
Okay, but what constitutes a free language model?
>>
>>103027991
This is the way. People will stop laughing at "itoddlers" when they realize you can run q6 120b at 30 t/s for 1.5k while their triple 3090s cap out at q3 lmao.
>>
>>103027851
Turin procs are starting to hit ebay now. Still $6500/socket, but...that's a viable path to 24-channel ddr5-6000 if you've got the cash for a dual socket board. Theoretically ~1150GB/s memory bandwidth (24 channels × 8 bytes × 6000 MT/s), which starts to be competitive with an imaginary multi-terabyte 4090
>>
>>103028298
Go train a lora on your crapbook.
I'll wait.
>>
>>103027230
>Untitled.png
come on my nigga
>>
>>103028277
>>103023019
>>
>>103023019
>Rio de Janeiro, Brazil
Ayo what?
>>
>>103028324
It's a screenshot of my IPMI window pasted into mspaint. This confuses and enrages the itoddler.
>>
>>103028359
so you dont know how to take cropped screenshots?
>>
File: compass.png (269 KB, 756x710)
>>103026473
>>
>>103028314
>wow what's that giant SERVER you're having there anon? it's basically cooking me here
>it's my uh 3090s gpu rig...that i use to uhh train my AI waifus
>oh
>>
>>103028405
You talk like a jew.
So you're probably jewish.
I require no further punchline.
>>
>>103028298
How long does it take to process a 16k context?
>>
>>103028341
You asked for a definition. You've been provided with one. I'm sure there are others. But it boils down to publishing everything needed to replicate a model.
>>
>>103028314
Training is still expensive and hard to do for anyone without h100 clusters in their backyard, plus the knowledge of right/wrong decisions with datasets and stuff.
>>
Fellas, is it Jewish to get your money's worth while also not scaring the hoes?
>>
>>103028160
lol being homeless from buying too many pururin nedroids is a perfect analogy for the kind of cash we're spending on this hobby
>>
>You go on 4chan? Ew! Get away from me!
>>
>>103028382
not that anon, but i do it like that too, go eff yourself
>>
>>103028442
I didn't ask for anything, what are you talking about?
I was just surprised that there were people from my country doing anything noteworthy regarding AI, even if it's essentially a meta-analysis.
Pretty cool.
Although I find it weird that one of the criteria is having a model card, but that's fine, it's all well laid out and people can make their own conclusions.
>>
File: modelcard.png (43 KB, 444x197)
>>103028495
From the published paper.
>>
>>103026473
>>103028395
>>
>>103027292
does it support finetuning?
>>
>>103028608
Okay, so they are using "model card" as shorthand for a description of the architecture and other details, which could be redundant with other things like papers and a github repo, but some models might have two of the three, or only one, etc.
Alright, that makes more sense, thank you for spoonfeeding my lazy ass anon.
>>
>>103026473
Yep it's dead for anyone with a dime of self-respect and sense of purity i.e. anyone who is not a porn infested cuck or male feminist.
>>
>>103022524
Thanks.

>>103028139
There are known tech companies that internally use llama.cpp for inference on their servers but simply do not publicly disclose this.
>>
>>103028139
>Why does llama.cpp have this pariah status?
It took niggerganov 6 months to add a sampler.
>>
so where do you guys get your hardware? I cant find a single reasonably priced p40 or 3090.
>>
>>103028650
>could be redundant with other things like papers and a github repo
That's mostly for the benefit of end users. Few people read through papers, fewer understand them. A good model card gives a general overview of the model, how it compares to others, its unique features, what it was trained on, etc. without bogging you down with details.
>>
>>103028708
Yahoo. You won't find a reasonably priced P40 anywhere, and at this point it's e-waste anyway. Cheap mi60 is the new secret meta, but very hard to find.
>>
>>103028160
I don't know how well this generalizes to normal Mistral Small and other finetunes, but after testing a particular chat with Cydonia 1.2, anything below Q6_K_L literally can't have the character stop mid-action and reconsider what they are doing (according to common sense and their character card).
>>
>>103028708
>>103028855
For reference, see: https://www.ebay.com/itm/125006475381
>>
>>103027383
>The theoretical maximum tok/sec with this setup is now (546*2)/205 = 5.32 tok/sec.
https://x.com/ac_crypto/status/1851666807159140546
>>
>>103028967
That's for llama 405b. To make llama 405b about 200gb you have to quant it lower than q4. The bandwidth is being multiplied by two, i assume, because the linked setup chains two Maxes to run the model, but they run sequentially, right?
So, the best you could possibly get is half that. And then add all the overhead and a few dozen ms per token in the context.
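The napkin math either way is just bandwidth divided by bytes touched per token, assuming every weight is read once per generated token and ignoring KV cache and other overhead:

# Numbers from the tweet: two M4 Max at 546 GB/s each, ~205GB of weights.
bandwidth_gbs = 546 * 2   # the tweet's assumption: both machines count
model_gb = 205            # llama 405b quanted below q4

print(bandwidth_gbs / model_gb)  # ~5.3 tok/s, the quoted ceiling
print(546 / model_gb)            # ~2.7 tok/s if the two Maxes run sequentially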
>>
>>103027383
1.5x the speed of a P40 is still not that great to be honest.
>>
File: madebyapple.png (400 KB, 640x480)
>>103029178
But it's so pretty and shiny.
>https://www.youtube.com/watch?v=9BnLbv6QYcA
>>
>>103028100
>don't like the idea of running massive multiple PSU setups
Ayyy that me. A6000 gang what up
Double up or hell even pair it with a A4000 for additional fun, those things don't take up space at all and hardly use power for the VRAM they provide.
>>
>>103029221
that wheel thing made me remember a few years back there was talk about how when you speak in your inner monologue, like reading text and stuff, your larynx still makes minute movements as if you're actually speaking out loud, and some device could be placed there to monitor the movements in order to input text into a computer. something like that and an in ear monitor for talking to your LLM basically with your mind would be pretty cool and schizophrenic.
>>
>>103027383
Do people not feel bad when they buy high tier parts in a laptop? I can understand buying some mid tier laptop or something for convenience when you already have a desktop. But I could never buy a laptop like this...
>>
>>103029371
I never got a laptop again after my old one was overheating itself constantly. Though I don't know if it has gotten better since then.
>>
>>103029358
Use mutalk 2, goyim
>>
File: scary.jpg (116 KB, 1252x1252)
>>103028461
>scaring the hoes
two steps ahead of you anon, if the anime girls on the model cards don't scare them my music surely will
>>
>>103029221
How did people do this before ai video tech? This is crazy! Also, I learned a few years back that macs have 1 mouse button, and I thought it was a joke.
>>
File: 1730312418997.jpg (18 KB, 474x265)
>>103028018
bump
>>
>>103029433
an old gf of mine had a plastic device like this and would use it to silence her moaning while I pounded her with my cock
>>
File: mgs3.jpg (106 KB, 1200x640)
>>103029433
>It uses currently available technology
>>
>>103029467
The power of marketing and lifestyle products.
For ai stuff, if i had the money to buy a cool mac, i would. After getting a 6x4090 rig, of course. But for a lot of people convenience is a big factor.
>>
>>103029482
Try r/localllama. And now fuck off.
>>
>>103029482
whisper
gpt-sovits
no idea
flux
no idea to the rest
>>
>>103028915
thank you king
>>
>>103028915
>32 GB
>1 TB/s
>160W
>300$
where's the catch?
>>
>>103027077
That's interesting, you're not telling the model to roleplay but to take the narrator's place
>>
>>103028018
>image to text (OCR)
>image to annotation/caption
Probably Qwen2-VL-72B.
>coding assistant
Probably Qwen2.5-72B.
>text to image
Flux. But for anime the SDXL fine-tunes like NoobAI are better.
>removing image backgrounds
>image to face count
Probably SegmentAnything2.1 or Florence2.
>>
>>103029772
linux only
>>
>>103029772
ROCm shenanigans, can't connect more than 4, rare.
>>
>>103027077
>>103029775
I thought you faggots finally left.
>>
>>103028018
What start-up?
>>
I wonder if the chinese RAM-swap-on-a-GPU industry will grow.
>>
>>103029905
>>103029905
>>103029905
>>
>>103029371
>>103029404
Laptops: When you want to pay twice as much money for half as much computer!
You gotta REALLY need that portable power, or it's super-rarted
honestly, a cheapo n-2 ultrabook with ssh and/or vdi is probably a better option for 99% of use cases as well as being cheaper and more powerful overall.
>>
>>103028018
>coding assistant
deepseek 2.5 or llama 405b. Assuming you have 512GB of ram at least
You did say you wanted the best, right?
>>
>>103030157
>You gotta REALLY need that portable power, or its super-rarted
Convenience. You pull it out of the box, you plug it in, you're doing ai shit. When you're done, you close it and put it away.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.