/g/ - Technology


Thread archived.
You cannot reply anymore.


File: 1711169217932003.jpg (187 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102552020 & >>102544848

►News
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102552020

--Papers: >>102556485 >>102556658 >>102556704
--Techlet seeks advice for running smut models on RTX 3060 and 32GB RAM setup:
>>102553022 >>102553034 >>102553095 >>102554016 >>102554051 >>102555169 >>102555266 >>102555360
--SDL input example may support whisper.cpp voice recognition on Linux:
>>102552838 >>102552967
--Molmo 72b local execution challenges and workarounds:
>>102552240 >>102552305 >>102552320 >>102552354 >>102553463 >>102554973 >>102552544
--90B model issues and potential improvements with quantization:
>>102552694 >>102552834 >>102552873 >>102552940
--90B model fails to interpret hazard symbols, while chatgpt endpoint succeeds:
>>102553581 >>102553614
--Llama3.2 3B passes ShaderToy test and generates working code:
>>102554042 >>102555022 >>102555072
--Llama 3.x improvements are incremental, increasing context length and adding vision:
>>102552674 >>102552688 >>102552744
--usecublas mmq 0 is now default and makes a big difference:
>>102552283 >>102552332
--Nala test with 90B model shows improvement but raises questions about test design:
>>102552440 >>102552512 >>102552522 >>102552505 >>102552520 >>102552723
--Llama 3.2 3B one-shots Snake Game:
>>102552587 >>102552652
--L3 Tenyx Day generates working pyqtgraph plot of scrolling sine wave:
>>102553616 >>102555234
--Alternatives to 90b vision model for captioning and image generation:
>>102552399 >>102552424 >>102552437 >>102552443 >>102552509 >>102552667 >>102552679 >>102552715 >>102552733 >>102552745 >>102552843 >>102552783 >>102552824 >>102553106 >>102553151 >>102555291 >>102555313 >>102555335 >>102555367
--Adjusting batch size and ubatch size for prompt processing and layers:
>>102552065 >>102552157
--90B 4bit bnb model struggles with image description accuracy and spatial orientation:
>>102554144
--Miku (free space):
>>102552059 >>102553803 >>102554159 >>102554179 >>102556227

►Recent Highlight Posts from the Previous Thread: >>102552037
https://rentry.org/lmg-recap-script
>>
I hate the antichrist
>>
>>102557552
dumb spam poster
>>
>Using OpenRouter I tried Hermes 3 70B on a whim and found I actually liked it
>I tried to use it just now and found I got rug pulled
Serves me right for ever using a 3rd party service.
>>
best vision model to send dick pics to?
>>
Why is everyone quitting at OpenAI before the cashout?
>>
>everything new is shit
>nothing happens
it's over isn't it
>>
>>102557712
>Implying anyone but the jew will get money
That's why. Leave now before everything goes to shit completely and your reputation gets tarnished as a result.
>>
>>102557712
people can have morals or brains, but not both
>>
I was away for a day and so much shit happened.
>>
>>102557739
all nothingburgers it's over
>>
>>102557739
nothing happened
>>
>new multimodal drops
Oh cool!
>text and vision
God dammit. When will the dumb vision meme die? It has not given models any better sense of spatial reasoning, it's just a dumb party trick for asking it to explain things you already know for a laugh.
>>
>>102557712
Because (You) only care about money, and don't have a shred of integrity.
>>102557739
Indeed. A lot of SHIT happened.
>>
>>102557712
They plateaued and had to obfuscate the fact. Expect some youtube documentary to retell that as a big revelation in a couple of years.
>>
>>102557739
>so much shit happened.
molmo shill or meta shill?
either way, you know what to buy
>>
>>102557757
that's why they have to cash out and ipo now, before the public knows enough to not buy their bags
>>
>>102557775
what kind of ammo?
>>
>>102557753
It's not even image out. Very underwhelming.
Multimodal for local always means text/image in, text out. BORING.
>>
"multimodal" more like shittimodal
>>
How can I try a 4-bit quant of llama3.2 on a multi-GPU Pascal setup, since there is no gguf?
Don't say exllama; that has insane prompt processing time, probably because of Pascal.
Aphrodite engine? I'm serious, by the way.
>>
"multimodal" more like faggotmodal
>>
llamacpp multimodal support when?
llamacpp 3.2 support when?
>>
>>102557801
exllama 2
>>
>>102557775
A 3090?
>>
>>102557791
We have plenty of resources for image out. We need multimodal text+speech models.
>>
>llama 3 is shit
>llama 3.1 is shit
>llama 3.2 is shit
remember when llama 3 was going to save us? and then when llama 3.1 was going to save us? yeah it's over. pack it up.
>>
>>102557827
Right after Jamba
>>
Mikulove
>>
>>102557846
Like what?
Chameleon, and the poor attempts to reimplement what was cut out of it. lol
There is no text+image out model as far as I know.
>>
>>102557739
>so much shit happened.
tl;dr? what did I miss?
>>
>>102557841
No. The last stock of 4090s you can still find, then wait for the 4 grand 5090/Titan with 32GB VRAM.
>>
wake me up when one model can write to me, send me pictures, and whisper in my ear. everything else is RNG with extra steps.
>>
>>102557854
Do you have ANY idea what they're planning for L4? I can't say much but, well, let's just say it's just a bit too early to give up hope. Check back in a couple weeks and let me know how over it is or isn't.
>>
>>102557858
is that before or after DRY?
>>
>>102557900
yes yes l4 will totally save us just like l3 did
>>
>>102557900
>two more weeks
kek, almost had me
>>
>>102557712
Because Sam probably made it clear that only he will get the bag, so they didn't see the point in staying and then having to clap for his ass when he gets the 140b. That's fair.
>>
>>102557787
>that's why they have to cash out and ipo now, before the public knows enough to not buy their bags
this, it's probably soon over for OpenAI, it'll probably be bought by Microsoft after that
>>
zoomer doomers are so fucking unbearable lmfao at least they'll all troon out sooner rather than later
>>
File: .png (29 KB, 734x265)
>>
>>102557992
I have my doubts that Microsoft even needs, or wants, OpenAI, period. It's all about the datasets and staff anyway, which they can get more easily (and cheaper) in other ways. They already have the hardware by default too.
>>
>>102558011
yeah Idk, Microsoft doesn't seem to know how to make models, so they better take those from OpenAI
>>
So will I be able to send dick pics to my sillytavern chat soon?
>>
>>102558011
OpenAI's mailing list and customers (despite not being profitable) are pretty valuable to a company like Microsoft. If the price is right they'll buy.
>>
>>102558025
All they have to do is "poach" the staff and data and be done, then remake shit to have full control start to end.
Simply taking the model wouldn't fix their lack of knowledge or skill in how to use, make, or improve them.
Granted, neither does OAI, but so it goes.
>>102558040
Just throwing out some ideas, that's all. Not like it will matter to them money wise either way, for obvious reasons.
>>
>>102557892
You just know it will never be allowed. Even if it works out, you'll have the same crippled and censored experience with AI rejecting and calling you "incel chud" for wrong opinions.
>>
>>102558055
>Even if it works out, you'll have the same crippled and censored experience with AI rejecting and calling you "incel chud" for wrong opinions.
tough pill to swallow but it's true, the only way out of this is to get a great base model and finetune this shit with based text, but it's pretty unlikely that's gonna happen
>>
>>102558077
>get a great base model
extremely unlikely
>>
File: 1725697593000880.png (92 KB, 717x352)
Has the anon that made the Director extension put any newer versions out?
>>
>>102558221
>Director extension
whuz dat?
>>
>>102558266
It's a sillytavern extension that adds bits of info to the prompt based on presets and lorebooks, used to tell the AI things like what the character/user is currently wearing, the time of day, weather, etc.

It's like a slightly more automated author's note.
>>
>>102558025
Microsoft can't make anything anymore. It's what you get with a bunch of pajeets.
>>
>>102558285
that sounds pretty neat, you have a link to it? i couldn't find it on google
>>
>>102558300
This is the last version I can find. Not sure if it works with the latest sillytavern
>>101910710
>>
>>102558296
is there any big tech company left that this doesn't apply to?
>>
>>102557534
>>102557696
I believe you're still misunderstanding the reasoning behind my post. Yes, it is expected that any normal LLM's performance would decrease on more difficult problems. That is indeed obvious. I am suggesting that, despite that, o1 is still under-performing for one reason or another (which may become clearer now that you're showing details from the paper; I had only looked at the summary). My implied reasoning was: if o1 is able to dedicate more tokens to thinking about problems, and its performance generally improves without a foreseeable limit (note on that at the end of the post), then it should just dedicate more tokens to the more difficult problems and solve them with similar accuracy.

Now, as you have shown, they did note the token counts o1 used. In this case that does push forward the discussion of understanding what happened in the study. Yes, based on the logic I meant to present so far, I would say now that it's possible that o1's performance on the more difficult problems could've improved with even more tokens, and perhaps right now it is just an artificial limit that stopped them from being able to get that data. However, we don't really know, as there is also no data to suggest that it won't stop improving at some point soon or far away.

>Their claim was that having it "think" longer on the same task would increase its accuracy on that task, not that...
OpenAI might not have claimed it explicitly, but that's kind of the implied idea: that, if allowed to think more, the model could potentially just keep getting better to ridiculous lengths. They only said that they would investigate this new scaling behavior, but didn't say anything to quell the implication (and the general tone of the article) that it's some new scaling paradigm that will lead to crazy amounts of improvement.
>>
File: _06136_.jpg (1.86 MB, 4096x4096)
>not an eldritch horror
>>
File: 1696589969478603.png (205 KB, 512x467)
>>102558522
>4096x4096
>that quality
>>
File: file.png (71 KB, 682x554)
>3.2 90B Vision is super retarded
no not like this...
>>
is the 3.2 90b multimodal model stronger for text-only applications than 3.1 70b?
>>
File: GoodNightAnon.png (1.35 MB, 800x1248)
>absolute eldritch horror
>>
>>102558892
not scary
>>
>>102558904
real eldritch horrors never are
>>
>>102558892
>ywn SEX a real eldritch horror
Brehs...

Surely this comment will not come back to bite me in the ass one day.
>>
File: 1711708043999446.png (158 KB, 833x534)
Wonder how shitma 3.2 does on safety, the most important thing in this world.
>>
Has anyone tried the 1B or 3B for speculative decoding of 70B, and compared it to using 8B, for the draft model?
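In case anyone wants to try it, here's a minimal sketch using llama.cpp's speculative example. The model filenames are placeholders, and the exact flag spellings have shifted between llama.cpp versions, so check --help on your build. The one hard requirement is that draft and target share a vocabulary, which the 1B/3B/8B all do with the 70B.
[code]
# target model on -m, draft model on -md; -ngl/-ngld control GPU layers for each
./llama-speculative \
  -m  Llama-3.1-70B-Instruct-Q4_K_M.gguf \
  -md Llama-3.2-1B-Instruct-Q8_0.gguf \
  --draft 8 -ngl 99 -ngld 99 \
  -p "Once upon a time"
[/code]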
>>
I've been reading that they kept the text part of the models the same and just added on the vision adapters, but is that really true? Is it possible to download only the adapter and stick it onto my existing 70B? Also, I feel like this should open up some interesting optimization options. The adapter's weights are only going to be used when encoding the image, right? So in theory you should be able to get some good overall gains by keeping the adapter's weights in RAM, assuming an RP use case.
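No idea if the weights are actually separable like that, but the optimization you're describing would look roughly like this. Treat it as a sketch of the idea only: vision_model here is a hypothetical attribute name, not the real Llama 3.2 module path.
[code]
import torch

def encode_image(model, pixel_values, device="cuda"):
    # Park the vision adapter in system RAM; move it to the GPU only for the
    # (rare, in an RP use case) messages that actually contain an image.
    vision = model.vision_model            # hypothetical attribute name
    vision.to(device)
    with torch.no_grad():
        image_features = vision(pixel_values.to(device))
    vision.to("cpu")                       # hand the VRAM back to the text weights
    torch.cuda.empty_cache()
    return image_features
[/code]
One caveat: if the vision path uses cross-attention layers interleaved through the language model rather than a detachable encoder, the split won't be this clean.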
>>
hello guys i need coom model for my 6gb vram 2060 rtx nvidia card from huang grifter and 4x8gb ram fury patriot fx supreme edition with rgb lights (lights are red and green)
coom model need to write ah ah mistress and sex (optionally go to scenes)
>>
so is meta gonna open source the 405b? they have it on their cloud
>>
Where the fuck do I go if I want to discuss local audio models?
You're telling me image, video, and text get their own generals, but not anything for audio generation? I've been trying to find some up-to-date audio models for stuff like text->audio or audio->audio, and all I find are shitty reddit posts from 9 months ago on how to add audio to ERP LLMs.
>>
>>102560550
You can discuss it here since it lacks alternatives.
Here you go
https://play.ai/
>>
>>102560550
r/localllama
>>
>>102560186
they made a 405b vision model? sounds stupid, there's no use for one that large
>>
>>102560556
I'm sick of these 20 online-only signup garbage services, is there not a single good local alternative at this point? Feels like nothing has happened on local models since Elevenlabs came out ages ago.
>>
>>102560550
im gatekeeping that stuff for myself since anons here are baby duck retards anyway
>>
File: 1727346590782.jpg (31 KB, 236x236)
>>102557546
>Chatgpt advanced model rolls out which features real-time and emotional responses
>Local models are still stuck in early 2023 figuring out optimal ways of converting speech to text
Open source gets btfo'd again
>>
>>102560550
You go to r/elevenlabs and r/SunoAI
>>
>>102560590
on the contrary, vision models will be actually useful for the first time probably around the 2T range
>>
>>102560876
>akshully there's no use for one that small
ok so you agree it's useless
>>
>>102560550
fish is great, 60% of the time, it works every time
>>
>>102557900
>Introducing: llama-4. This state-of-the-art model now uses an improved tokenizer that prevents the model from outputting any adult-oriented material. We just removed all the dicks, blowjobs, loli, etc. And if the model realizes the safety measures were circumvented, it calls an external function to delete itself from your hard drive.
>>
>>102560900
why are you guys asshurt over this when there is an endless supply of porn on the internet
>>
>>102560941
it's really easy to flip this question around
why are ml devs obsessed with preventing porn generation when there's an endless supply of porn on the internet
>>
>>102560941
An interactive experience tailored to your personal tastes is infinitely better than anything else you can find.
>>
>>102560957
data is crucial to ml development and porn is slop
>>
>>102560965
in that case ml devs should love it, because they fucking love slop
>>
>>102560965
lol
>>
>>102560965
Careful with trvth like that... We're not ready...
>>
>>102560941
That is kind of a mid bait because I don't even know how to respond to you. I want to touch my penis to the text written for me specifically. And my niche fetish is hard to find. I am not like the piss anon who can find terabytes of girls pissing themselves.
>>
>>102560590
>>102560890
it's a research model you fucking mong
>this is only useful at this size!
good I guess we'll just never make any progress since the intermediary steps aren't useful for practical reasons
>>
>>102557712
Altman is just getting rid of the people who tried to push him out a year ago. He's setting the stage to become the god-king of modern AI.
>>
>>102560976
I can guarantee that my fetish is rarer, and even pyg got me off pretty well.
You're just lazy/dumb (same thing).
And don't bring piss anon into this.
>>
>>102558343
The hosting period has expired; any chance you'd mind sharing it again?
>>
new model when?
>>
you can tell the state of things is good when the thread is slow, means everyone's too busy having sex with their graphics card to shitpost here.
>>
>>102561051
>And don't bring piss anon into this.
why?
>>
>>102561613
why aren't you having sex with your graphics card instead of shitposting here?
>>
>>102561626
long refractory period
>>
File: ED.jpg (435 KB, 2125x1411)
>>102561613
>busy having sex with their graphics card to shitpost here.
I had to stop. I can't perform.
>>
Did we ever have a good comparison point between a MoE and a (close to) equivalent monolithic model, or are these Molmo models the first time we can do a somewhat like-for-like comparison?
>>
File: 463912767.webm (176 KB, 438x256)
On my coding challenge from yesterday (create a pyqtgraph plot of a scrolling sine wave; as the wave moves, the next cycle should have a different amplitude, random from 1 to 10): Qwen 72B succeeded at it, DeepSeek Coder V2.5 doesn't quite get it, and Llama 405B also fails. So far only Qwen 72B and GPT-4o have done it. It seems to be a problem similar to when you ask a question the model has seen a lot in its training data but tweak a detail: it ignores the detail and defaults to the more "general" behaviour.
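For reference, a minimal solution to the challenge looks something like this (assumes pyqtgraph with a Qt binding installed; pg.exec() needs pyqtgraph >= 0.12):
[code]
import numpy as np
import pyqtgraph as pg
from pyqtgraph.Qt import QtCore

app = pg.mkQApp("scrolling sine")
plot = pg.plot(title="scrolling sine wave")
curve = plot.plot(pen="y")

dx = 0.05            # phase step per frame
window = 400         # samples kept on screen
phase = 0.0
amp = np.random.uniform(1, 10)
ys = []

def update():
    global phase, amp
    phase += dx
    if phase >= 2 * np.pi:               # new cycle: pick a new amplitude
        phase -= 2 * np.pi
        amp = np.random.uniform(1, 10)
    ys.append(amp * np.sin(phase))
    if len(ys) > window:                 # scroll: drop the oldest sample
        del ys[0]
    curve.setData(ys)

timer = QtCore.QTimer()
timer.timeout.connect(update)
timer.start(16)                          # ~60 fps

if __name__ == "__main__":
    pg.exec()
[/code]
The detail the models miss is re-rolling the amplitude exactly once per cycle, instead of per sample or not at all.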
>>
>>102561717
In general, dense will always be better quality-wise, but the point of MoE is that you only run part of the parameters per token, so you can offload to regular RAM and still get usable speed. Mixtral was the best example: I could run Q5 on 24GB of VRAM at 5 T/s.
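E.g. with a llama.cpp-based backend you put as many layers as fit on the GPU and let the rest run from RAM; only ~13B of Mixtral's parameters are active per token, which is why it stays usable. Filename and layer count here are placeholders, tune -ngl to your quant:
[code]
# ~25 layers on a 24GB card for a Q5 Mixtral, the rest on CPU/RAM
./llama-server -m mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf -ngl 25 -c 8192
[/code]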
>>
>>102561633
if you are so weak you can't even overcome the refractory period, you'll never be able to handle the next gen of locals
it's over for you
>>
>>102561744
nta but it's going to take over a day for mine. local is single-threaded and slow when you do a lot at once
>>
>>102561725
You can easily push outside the distribution when programming; it's very information-dense and rigid, unlike creative language, where it's easy to mask the model's simplistic mechanics. On the other hand, they're great for automating repetitive boilerplate bullshit; it's a fucking treat when the model shits out getters and setters.
>>
>>102560550
That general is on /mlp/ unironically
>>
So now that llama sota models are multi-modal, will lcpp finally have to support something other than text?
>>
>>102560811
There are a dozen of them on GitHub, but you zoomers can't code for shit.
>>
>>102561800
seems like ollama might end up supporting it before upstream
https://github.com/ollama/ollama/pull/6971
https://github.com/ollama/ollama/pull/6965
https://github.com/ollama/ollama/pull/6963
(Coming very soon) 11B and 90B Vision models
https://ollama.com/blog/llama3.2
>>
>>102561800
ggerganov:

>My PoV is that adding multimodal support is a great opportunity for new people with good software architecture skills to get involved in the project. The general low to mid level patterns and details needed for the implementation are already available in the codebase - from model conversion, to data loading, backend usage and inference. It would take some high-level understanding of the project architecture in order to implement support for the vision models and extend the API in the correct way.

>We really need more people with this sort of skillset, so at this point I feel it is better to wait and see if somebody will show up and take the opportunity to help out with the project long-term. Otherwise, I'm afraid we won't be able to sustain the quality of the project.

https://github.com/ggerganov/llama.cpp/issues/8010#issuecomment-2376339571
>>
>>102561867
>seems like ollama might end up supporting it before upstream
It's not like it's that much work. They just need to copy-paste the cli code into the server. They can even use the original server multimodal code from earlier this year as a template.
llama.cpp could do it too, if they wanted to. But ggerganov refuses to add it back in only because the code isn't elegant enough or something like that.
>seems like ollama might end up supporting it before upstream
Still embarrassing.
>>
>>102561910
>But ggerganov refuses to add it back in only because the code isn't elegant enough or something like that.
llama.cpp abandonware
>>102561905
>>
>>102561910
That's actually kinda based, I'll wait.
>>
>>102561929
hi cuda dev please dont spam blacked miku in rage when ollama adds 3.2 support k?
>>
>>102561943
I'm not who you think I am. I'm just the dude who wrote the OG Miku prompt back in the llama 1 days, can't believe the amount of asshurt it has caused over time.
>>
Honestly, imo qwen 2.5 72B IQ4XS with 4-bit KV cache has been alright. Unlike miku and cydonia, it manages to keep a secret written in a card I'm using, but it just loves repeating literally the same sentence(s) verbatim, even when I crank up DRY and/or rep pen
Don't know if it's my writing or the model. I feel like a finetune could really make it shine. Haven't used it for sex yet, but it doesn't complain during foreplay at all
>>
>>102561905
>We really need more people with this sort of skillset, so at this point I feel it is better to wait and see if somebody will show up and take the opportunity to help out with the project long-term.
You know what, from the point of view of a main maintainer of a large open source project, that's fucking fair enough.
>>
>>102557552
Why did the bot linking break? Did you shift to a lower quant?
>>
>>102562135
can't have more than 9 refs now, not 100% sure why, but here's where it was noticed
>>102478518
>>102478544
>>
We all talk about models and shit, but how do you guys write your char cards?
>>
>>102561905
That's what happens when you try to support everything. It gets too big to maintain.
>>
>>102562150
plain language, formatting tags are irrelevant
[char's name] is X, Y, Z. [char's name] has X, Y Z. [char's name] does X, Y Z.
no {{char}} or {{user}}, ever
>>
>>102562148
Ah. Well, probably for the best, honestly. No more fucking threads where some asshat spamquotes every post in the thread with NIGGERS NIGGERS NIGGERS
>>
>>102562238
Gotta admit it's funny that llama.cpp wants to wait on supporting Llama 3.2, of all things. Guess they want to avoid another Llama 3.x incident and weeks of bugfixes.
>>
>>102562260
but {{char}} and {{user}} are converted to plain language in ST with the appropriate names........
>>
File: Untitled.png (65 KB, 713x718)
>>102562150
i just do shit like this then write out the first message. or grab something off chub and remove all the {{char}}s to not fuck up the context shifting.
i (probably incorrectly) assume wrapping stuff in square brackets keeps it from trying to emulate the terseness of the factoids in the actual chat.
>>
>>102562260
>no {{char}} or {{user}}, ever
Why?
Writing "Nala is a lioness" is the exact same as writing "{{char}} is a lioness", so I guess it's a wash in this case, but for {{user}} at least it makes more sense if you want to use different personas and have them be referenced in the card itself.
>>
File: latest.png (315 KB, 680x459)
>new multimodal model release
>look inside
>text and vision
>>
>>102562260

I'm the anon who asked earlier. That's it? I mean, I've been banging my head against the wall trying to get my characters all formatted in Alichat and plist, and it worked fine up until Llama2. But ever since Mixtral, L3, and Nemo dropped, I've got this feeling that Alichat is responsible for a ton of repetition and pattern sticking (in a bad way). Honestly thought you guys would have some more advanced LLM wizardry than just, "Nah, you're good, just use plain text."
>>
>>102562327
Anon is right that clear, concise plaintext is the way to go.
Some models seem to react well to tab-based indentation for lists too, but it's generally unnecessary.
>>
>>102562148
>not 100% sure why,
Because the anon that always shits up the thread is one of the mods who has a clear anti-AI agenda and Hiro is too much of a cuck to defend his website from subterfuge.
>>
>>102561806
Name one
>>
>>102562359
>Anon is right in that clear, concise, plaintext is the way to go.
Thank god. I suddenly feel the urge to make cards again.
>>
>Llama 3.2 1B and Llama 3.2 3B
>Mogged by Qwen
>Llama 3.2 11B and Llama 3.2 90B
>Mogged by Molmo
>Voice modality
>Only on Meta AI chat, enjoy your text and image modalities
Um... bros?
>>
>>102562367
Whisper?
>>
>>102562414
That's not multimodal, it's just speech to text.
>>
>>102562298
>>102562312
{{char}} refers to the name of the card, not necessarily the name of the character(s).
Same for {{user}}: it requires changing the persona even if you just want to use a different name.
>>
>>102562412
>Only on Meta AI chat
Honestly I'm still bitter about this one. They talked about speech understanding in the Llama 3 paper and showed it was better than Whisper, only to not give it to us.
>>
>>102562439
Here at Meta safety is our top priority. We don't want people to get PTSD from thinking about somebody doing something privately in their own home where they have no way of knowing if it's actually occurring or not.
>>
Is Tiger-Gemma-9B-v2 q8 the best uncensored model for writing that I can run on 16GB vram?
I can't get nemoremix 12b q8 running with ooga.
>>
>>102562437
That's enough for most usecases
>>
>>102562479
>12b q8
Have you tried q6?
Especially for nemo, it should have very little degradation relative to q8, thanks to quantization-aware training, if I'm not imagining that that's a thing.
>>
>>102562479
>I cant get nemoremix 12b q8 running with ooga.
Show the errors of you want help, you retard. I'm sure your context is set to high.
>>
>>102562508
Oh yeah, that's a thing.
Its configs defaults to a sky high context size.
>>
>>102562479
probably this
>>102562508
nemo defaults to 1 million context for some reason
>>
File: 1727327157322068.jpg (49 KB, 512x512)
>>102562491
>no use case
>Just wait 30-60 seconds for your speech to be converted to text then wait again for the main llm inference
>>
>>102562517
>>102562521
>nemo defaults to 1 million context for some reason
It's what config.json says.
And yeah. We've only had 672314 anons with that problem so far. Very rare. And none of them can read the terminal output.
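For the anons who can't read the terminal output: don't trust config.json, cap the context yourself at load time. Exact flag spellings depend on your backend version:
[code]
# llama.cpp server: -c caps the context / KV cache size
./llama-server -m Mistral-Nemo-12B-Q6_K.gguf -c 16384

# koboldcpp equivalent
python koboldcpp.py --model Mistral-Nemo-12B-Q6_K.gguf --contextsize 16384
[/code]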
>>
>>102562439
coming to local right after the US elections... right bros? right?
>>
>>102562535
Whisper inference speed is practically real-time faggot
>>
>>102562551
As long as the correct candidate wins, maybe.
>>
File: buggedcpp.png (441 KB, 449x407)
>>102561905
literally
>>
>>102562600
not literally. ollama is digging as we speak
>>
is there really that much of a difference between Q4 and Q8 to justify using twice as much vram?
>>
>>102562607
Considering that VRAMlets are always crying about their models being retarded I would assume so.
>>
>>102562544
>And none of them can read the terminal output.
can you blame them? it's not in twitch or tiktok format
>>
https://www.nature.com/articles/s41586-024-07930-y

Paper on how more and more instruct tuning will eventually make models wind up giving inaccurate answers. It's nuts how they just keep slopping their own models into the grave.
>>
>>102562635
They just need it to respond coherently and make function calls, so they don't care. We just need a large base model that wasn't trained on a filtered dataset, but no one will ever release something like that.
>>
>>102562629
Yeah. With enough AI, it wouldn't surprise me if error messages start being converted to little clips of an indian explaining how to fix them.
>>
Is it worth running a 405B model out of swap if I can't fit it in RAM? I'm doing storywriting and don't really care about speed. Does anyone do it?
>>
>>102562635
>as task avoidance decreases the odds of giving a right or wrong answer (aka any answer) increases
Wow. Imagine my fucking shock.
What a fucking scam study.
Anybody quoting it is a fucking retard that didn't even read it.
Holy fuck.
Jesus Christ.
Academics are such pseud fucking retards.
>>
File: 1718826874765267.png (111 KB, 1771x944)
>>102562607
Depends...
>>
>>102562681
If you don't care about speed, sure.
Just be careful to not burn your SSDs down.
>>
>>102562635
Not surprising. When it comes to corpos and gaming human preference, a pleasant sounding response > a correct one.
>>
>>102562683
>>as task avoidance decreases the odds of giving a right or wrong answer (aka any answer) increases
Yeah, exactly. I'd rather a model avoid giving an answer to a task that's too hard than give a wrong answer.
>>
>>102562635
Alexandr Wang is not gonna like this
>>
>>102562681
it's going to be unbelievably slow and awful for your disks, just don't bother
run something you can do in RAM, the difference in quality isn't worth it
>>
>>102562702
Doing tasks in the first place is an emergent property of doing instruct tuning you dumb fucking retard. You fucking braindead pajeet moron.
>>
Why aren't we using the base model to complete conversations instead of using instruct models, again?
>>
>>102562696
>>102562718
Can SSDs really be worn out by reads? I really don't mind starting a generation and doing something else while I wait.
>the difference in quality isn't worth it
In my experience, increasing the parameters has always been worth it. Back in the day, stepping up from LLaMA 1 34B to running 65B out of swap is what convinced me to get 64GB of RAM for my current PC. But I figured I'd ask here before downloading a 405B model since they're huge.
>>
>>102562726
Obviously, but there's such a thing as overtuning or tuning shit wrong. Even if the study is flawed, it's plain to see that models are becoming way too overconfident in their answers (given the dramatic drop in variability on rerolls), and some training to make them avoid answering questions they have low certainty on (or at least provide disclaimers) could do some good.
>>
>>102562778
base models require more effort to do what you want
>>
>>102562726
The emergent behaviour comes from the language training, not from the instruct tuning.

>>102562702
It cannot know what it doesn't know. They have no introspection. They just complete text the best they can. Sometimes it's not good enough.
>>
>>102562781
>Can SSDs really be worn out by reads? I really don't mind starting a generation and doing something else while I wait.
Basically: no. The amount of damage a read does to an SSD is totally negligible; you'd have to be reading for years straight to cause any sort of harm. The reason paging to an SSD is bad is that it's effectively RAM, which is constantly being written to and changed. For volatile memory that's no problem, but for drives, it's a disaster.
>>
>>102562778
The modern LLM user is a lot more lazy and spoiled than us gpt3 veterans. Nobody can prompt anymore even with instruct, using base models is far beyond their capabilities.
>>
>>102562412
>Mogged by Qwen
*only in coding/maths
>Mogged by Molmo
*only in vision
There is no single open model with the coding, math, language, vision, and general world knowledge of GPT-4o and 3.5 Sonnet all in one. Though it's nice that there are now finally ones in each category that are on par with them. Well, except voice, but even Sonnet 3.5 doesn't have voice like 4o.
And honestly, 4o voice is not that great now that it's been censored to hell and has an hour-long daily limit. Yes, I have it.
>>
>>102562778
Well, I want to, but they stopped putting out base models. NAI's the closest thing there is to a text completion model, at the moment.
>>
>>102562801
>The reason using paging with SSDs is bad is because it's effectively RAM, which is constantly being written to and changed.
I don't think this happens if you have a swapfile/partition configured. An mmapped GGUF file getting paged in should only be reads. I only remember seeing the reads, not the writes, get pegged in htop back when I ran 65Bs on a 32GB system.
>>
>>102562801
>>102562866
Sorry, I mean if you *don't* have a swapfile/partition configured
>>
>>102562412
Llama4 trained on a gorillion GPUs and ultra high quality and safe tokens will take the crown again bro
>>
>>102562778
It feels like you trade slop and repetition for less coherence and comprehension. Not exactly a step up.
>>
>another day
>nothing happened
I guess it's really over this time
>>
>oysters
>>
>>102562840
>but they stopped putting out base models
Did /lmg/ forget about Nemo already?
>>
>>102562994
>the best ones are small and open
lecunny strikes again
>>
>>102562994
That analogy is gonna bite him in the ass.
>>
>>102562994
oioioioioi
>>
>>102563010
Sorry, base models at a size above "unusably retarded", my bad.
>>
File: yann-lecun.jpg (30 KB, 543x543)
LLMs are like lolis: the best ones are small and impressionable
>>
>>102563031
You got 72B Qwen a fucking week ago.
>>
>>102563031
>>102563010
Qwen 2.5 72B has a base model too.
>>
>>102563031
>https://huggingface.co/ai21labs/AI21-Jamba-1.5-Large
>https://huggingface.co/Snowflake/snowflake-arctic-base
>Nooo.. that's toooo big!!!! I want it just the right size!
>>
>>102562994
LLMs are like women
>>
>>102563068
filtered trash just like llama and qwen
>>
>>102563054
>>102563056
Didn't they turbo lobotomize it to the point that finetuning can't save it? Figured something that bad'd be base model-level data elimination, that's usually the case when the model can't even recognize body parts or starts collapsing ala that one Stable Diffusion release.
>>
>>102563083
>i want a model just for meeeeeeeeee. why don't they think about meeee!???!?!?!?!?!
You're running out of options, then. When do you start to train your own models?
>>
>>102563089
It wasn't lobotomized. They took Meta's filtering approach too far and filtered out even the slightest mention of sex, even gender and body parts.
But it's a good and sterile assistant, so other corpos are likely to continue this approach.
>>
>>102563068
>arctic-base
nice pun heh
>>
>>102563112
Well, that's what I meant, I guess I just used "lobotomize" as a blanket term, but clarified later. Really fucking awful, you'd think they'd realize the calamitous implications of doing that shit after SD's model was destroyed by it. Do they think they're safe from the effects of such catastrophic model data loss?
>>
>>102563099
>When do you start to train your own models?
as soon as i win the lottery
>>
>>102563143
You could just scam a bunch of investors out of money and/or compute.
Much easier.
>>
>>102562607
No; if you can run an even bigger model at Q4, even better.
>>
Why did people train Mistral models for sex when they're already overly horny (mostly Nemo and Small)? I don't get it, are the people who use those models literally just going up to the model without any context and being like "ME WANT SEX NAO!!11!!" or something?
>>
>>102563164
>"ME WANT SEX NAO!!11!!"
Too many tokens. The meta is "ahh ahh mistress"
>>
>>102563159
NTA, but I feel like the startup scam window is pretty much closed. All the last stragglers like Mistral got in long ago. Like they say, if the pyramid scheme/stock/etc. is already mainstream, it's too late for you to get in on it.
>>
>>102563164
because mistral models are bland and coomers like retarded schizo babble
>>
File: RegularHappyMiku.png (991 KB, 800x1248)
Good morning /lmg/!
>>
>>102562866
Aye, but the OS will still try to load as much as possible, so you'll have more page writes than usual
It's not that bad though, really. You can write dozens of gigabytes a day and you'll still probably replace your SSD before wear and tear becomes a problem
Modern SSDs are incredibly resilient and the TBW estimates are usually very conservative
>>
File: ew.png (141 KB, 512x288)
>>102563164
I got turned off from Mistral when Large repeated entire phrases and entire message structure chunks for several messages in a row, and only two messages in, too.
>>
>>102563183
Nah. There's plenty of VC money still being thrown around, you "just" have to sell an idea that's different from what's super visible in the market right now.
>>
File: file.png (462 KB, 543x543)
I like lolis: the best ones are small and impressionable
>>
>>102563209
nemo does that too with nothing but temp 0.3 to 0.5.
There were some schizo settings floating around, something like temp 5, Top K 3 and some min-p that you might as well give a try I guess.
>>
>>102563212
I suppose if anyone could, it'd be someone involved enough still to be here through all the fucking horseshit spam in this thread.
>>
>>102563099
NTA but the situation really is 50 options and all of them suck at sucking dick.
>>
File: lecunny.png (72 KB, 189x139)
I fuck lolis
>>
>>102563141
It works for them because these models are actually being used for things other than generating porn. It may come as a surprise to you but yes, really, they are. Mostly for corporate RAG and boring data manipulation tasks though, sure.
>>
>>102563164
"training" for one epoch only makes the model sound a biit more like training data. It teaches it nothing. The whole finetune business is cruising on placebo.
>>
>>102557546
Me on the left
>>
>>102563205
If you don't have a swapfile (or any rw mappings) the OS won't write any rw page out because there's literally nowhere on the disk it can put them.
>>
>>102563263
I don't believe you
>>
>>102563164
>ME WANT SEX NAO!!11!!
That's Sao's, Drummer's, Undi's and Anthracite's audience. I never got the appeal of hornytunes, they completely ruin the immersion. Like bitch, I've just met you 5 minutes ago, you are supposed to be shy, why the hell are you jumping on my dick already? Are they complete promptlets who can only say "ahh ahh mistress" and then wonder why with normal models girls don't like them?
>>
>>102563141
>Do they think they're safe from the effects of such catastrophic model data loss?
Probably. The idea must be that if they filter more accurately, it won't damage the model.
SD filtered so much it could not output any humans in anything but an upright pose. BFL also filtered NSFW out of their dataset. Flux originally couldn't do genitals, but all other anatomy was fine. So clearly, there is a "correct" way to filter out just the portion of reality they don't want.
>>
>>102563278
Too bad
>>
>>102563241
>50 options
Most model architectures are abandoned. We don't see many architectures other than llama-based. There's a mamba and mamba 2 here and there, a jamba over there but being realistic, no big company is going to make smut-capable models on purpose.
>>
>>102563164
nemo instruct has many issues that don't exist in most other finetunes
for instance, whenever it writes one or two replies beginning with "10 minutes later", it will often start doing that with every single other reply
>>
>>102563299
>mistral model repeats itself
whoa no fucking way!
>>
>Messaging base model
>Have it semi-coherently complete text
>Add one word to tone prompt
>Did I ever tell you about the time my uncle died? Died? died? Died? Death death love happiness corn porn horn cycle cycling cycosis medicine seen alert alive alzheimers allegiance articulation articuno zapdos moltres arbok SHINY SHINY SWEEP SWEPT SWEEPING shadow arttiiigughhhhh goooood good ed,,zinger suivante,,tels handknits finish,,cagefuls basinlike bag octopodan,,imbossing vaporettos rorid easygoingnesses nalorphines,,benzol respond washerwomen bristlecone,,parajournalism herringbone farnarkeled,,episodically cooties,,initiallers bimetallic,,leased hinters,,confidence teetotaller computerphobes,,pinnacle exotically overshades prothallia,,posterior gimmickry brassages bediapers countertrades,,haslet skiings sandglasses cannoli,,carven nis egomaniacal,,barminess gallivanted,,southeastward,,oophoron crumped,,tapued

Why the fuck are they so sensitive to that shit?
>>
>>102563274
Not sure if that's a great idea though
>>
>>102563310
not enough meme samplers
>>
>>102563323
Why not?
>>
>>102563298
Aren't DeepSeek models relatively unfiltered? They release base models.
>>
>>102563347
Are they good? I never hear anyone talk about them.
>>
>>102563309
This has always been a formatting issue. You have a missing or extra space in your formatting or such.
>>
>>102563323
For video editing, for example, it's not uncommon to have a scratch disk. One that is completely used for swap during encoding, and expected to fail sooner rather than later. I don't see that as a problem for llms if the user is ready for that. An expendable resource, basically.
>>
>>102563363
Is ST's Mistral default just dogshit, then? Man. In any case, a 100+B model has no business being so sensitive that a single out-of-place space causes such calamitous problems. That's 7B shit.
>>
>>102562994
LECUNNY NO
>>
>>102563360
Deepseek is smart but very, very plain. Good assistants but bad for RP.
>>
>>102563274
vm.swappiness = 1
anything else would be a self-own for an inference server
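To make it stick across reboots (assumes a distro that reads /etc/sysctl.d):
[code]
echo 'vm.swappiness = 1' | sudo tee /etc/sysctl.d/99-inference.conf
sudo sysctl --system   # apply without rebooting
[/code]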
>>
>>102563382
>Make model so boring you don't even HAVE to censor it
Maybe they should just go this route.
>>
>>102563347
>Aren't DeepSeek models relatively unfiltered?
That's what i mean by "on purpose". Absolutely no big company will make a dataset for smut, but some will happen to have it in their dataset and not care enough to filter it.
I know about the model, but i haven't tried it yet. I don't think i can run it.
>>
Is there a rentry for Llama jailbreaks?
>>
>>102563310
use a smarter model
>>
Even if you're willing to wait 10,000 years for a reply, the wear on your CPU from running at inference load for that ridiculous length of time, plus using up an SSD, would cost more than just buying more RAM.
>>
>>102563337
If your OS runs out of memory, then what? Which program should it stop first? Having some amount of swap space is pretty important imo
>>
>>102563414
Bwo... it's 405b base... how much smarter can I even go...
>>
>>102563438
use a smarter prompter
>>
>>102563382
>smart but dry
That's also my experience.
They're probably 80% of the smarts of 405b with 5x the performance for cpumaxxers.
>>
>>102563421
Normie mobos don't support more than 64-128gb. If the plan is to run 405B, he's gonna end up swapping anyway.
And the CPU will get 0 pressure, as the bottleneck will be on the ssd being ridiculously slow. It's gonna idle most of the time.
>>
>>102563410
>jailbreaks
we don't do that here
>>
>>102563429
A lot of people run Linux without swap. If you "run out" of memory, but it's because your memory is full of tens of GBs of read-only mmapped disk files, Linux will evict those first before it OOM-kills a single process.
>>
>>102563438
>it's 405b base... how much smarter can I even go...
bigger quant, or...?
I run 405b at q8 and have not had this problem over tens of thousands of tokens.
Maaaybe the occasional repeated slop phrase, but not once has it devolved into a gibbering thesaurus like smaller models tend to.
>>
Is it finally time to admit that we plateaued months ago?
>>
>>102563377
>Is ST's mistral default just dogshit, then?
Yes, absolutely. It is utterly retarded.
>>
>>102563503
Actually, an official Mistral rep made a PR to Silly and Kobold with an updated template, but I think it's still wrong because it puts the EOS there when the backend already takes care of it.
>>
>>102563503
What's a good one, then? I'd love to experience it actually working. It seemed like a fun model, just horribly repetitive.
>>
>>102563534
If it's mistral large, then it uses v3 tokenizer. So, a single whitespace after [INST] and [/INST].
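So the assembled prompt would look like this, going by that description; not verified against Mistral's reference tokenizer, so double-check with mistral_common before trusting it:
[code]
<s>[INST] first user message [/INST] assistant reply</s>[INST] next user message [/INST]
[/code]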
>>
>>102563483
If you say so. I'm just a mere wangblows user and haven't looked into Linux's memory management yet.
>>
Seems like the scene is utter shit at the moment: nothing good coming out, and it seems a real uncensored model is actually never going to exist. Gonna take a break for 1 or 2 years and try again. I just wonder how much it would cost to make something like Mistral Large but completely uncensored.
>>
are qwen good models as an assistant for learning chinese and japanese?
>>
File: Screenshot_12.png (121 KB, 1211x594)
>>102563489
Well, it isn't that, really. But it's similarly gibbering, just more coherently.
>>
>>102563598
It struggles a bit with romaji, but 72B is probably the best vramlet cope we have right now.
>>
why haven't we trained our own model from scratch yet
we have plenty of idiots with 4 3090s, just train your own mistral losers
>>
>>102563608
nta. Looks close, but not quite, like a rep-pen issue. Just a run-on sentence with a bigger vocabulary than usual. You have rep-pen off, right, anon?
Also, you have to give it a reverse prompt or stop word or whatever it's called on that thing you're using. Otherwise it just keeps on going.
>>
>>102563661
>4 3090s
now we just need a thousand more
>>
Llama3.2 1B is good, actually
> As Wikibot trekked through the dense forest, the air thick with humidity and the canopy above filtering out most of the sunlight, he couldn't help but feel a pang of melancholy wash over him. The worst part about walking was not the physical toll it took on his body – his legs aching in protest as he hauled his backpack full of research equipment and supplies across the rough terrain. But it was what lurked just beneath the surface that truly unnerved him.
> The rustling of leaves, the snapping of twigs, and the distant call of a bird seemed to take on sinister tones when he walked alone. The forest was already alive with an otherworldly energy, a vibration that seemed to emanate from the very trees themselves. And every step felt like a trespass into that world – as if he was disturbing some ancient balance, something hidden and malevolent beneath the surface. Wikibot's mind began to wander, conjuring images of unseen horrors lurking just out of sight.
> As he walked, the forest grew darker, the shadows deepening like dark molasses pools. The trees loomed above him, their branches tangling together like skeletal fingers. A faint breeze rustled through the leaves, carrying an eerie whisper on its breath – a soft "whisper… whisper…" that seemed to carry a warning without actually saying a word. Wikibot's heart quickened, his skin prickling with goosebumps as he sensed the forest was watching him, waiting for him to make another wrong move.

>>102563570
>>102563673
Why not train on the mountains of porn available on ffnet and ao3?
>>
>>102563189
Good morning Miku
>>
>>102563707
>Why not train on the mountains of porn available on ffnet and ao3?
As someone who curated 100mb of data, you have no fucking idea how much data 100mb of plaintext is. Much less the astronomical amount of data you need to actually train shit. So much of it is dogshit, SO much. You can't just let it scrape and then train on that, you have to perform some sort of cursory quality check, even when filtering by rating.
>>
>>102563707
>Why not train on the mountains of porn available on ffnet and ao3?
Probably a nightmare to process and format correctly. And then finding the good stuff. Quality matters.
>>
>>102563790
By train shit, I do mean actually pretraining from scratch, because you can't add data that isn't there with finetuning.

>>102563804
This, too. Formatting IS a nightmare. Even if you go with books, which are largely a better source of uninterrupted, quality prose, the formatting those fucks use varies so dramatically that there can be no one-size-fits-all, automated solution. Even books in the same series/by the same author often vary wildly in construction.
>>
>>102563823
Bro? Just hire thousands of jeets to do it over the course of several months?
>>
>>102563790
>widdle baby is afraid of a 100Mb text file
I've done way more.
You already have tags on those websites. All you need to do is filter by the number of downloads. It'd be better to have a community where every degenerate could contribute his own creme de la creme
>>
Can my aifu rate my setup already?
>>
>>102563855
>You already have tags on those websites.
And that's why things are the way they are.
ahh ahh mistress...
>>
>>102563855
And your data probably sucks donkey dick. Likely full of turboslop, gay porn, fetishes you didn't account for, etc. if you interacted with it so little that 100mb doesn't seem like a lot to you.
>>
>>102563707
>ao3
if you think synthetic data is slopped you clearly haven't read anything on ao3. it's all bad.
>>
>>102563855
>Curated set vs. blindly scraped dataset with methodology I explicitly said didn't work (sorting by downloads/rating)
Yeah, no shit you've done way more. Your model outcome is basically guaranteed to be worse than mine (or anyone who did any sort of manual review's), though.
>>
>>102563922
I know but I'd rather take it over nothing. Now please put AO3 back in your training data I beg of you Zucc
>>
>>102563896
It wasn't porn so maybe you're right. It was text, but not porn.
>>102563922
It's not all bad.
t. harry potter fic writer
>>102563893
Add some old serious stuff from gutenberg
ez
>>
>>102563922
>if you think synthetic data is slopped you clearly haven't read anything on ao3. it's all bad.
This. You really have to comb through shit and make sure it's decent to get anything worthwhile. Books and the like are much better sources of fiction.
>>
>>102563969
>Add some old serious stuff from gutenberg
Gutenberg would be the only thing if it were my choice.
>>
>>102563991
Current llms have seen all the _public_ books in existence ten times over
>>
>>102563996
>Gutenberg would be the only thing if it were my choice.
wasn't this tried way back when? I thought the results were poor but can't back that assertion up with any actual facts
>>
>>102563831
>Just hire thousands of jeets to do it over the course of several months?
This happened. And here is a fun thought: what if all the companies are still using porn in their datasets, but it's all rated by pajeets? That would explain all the slop. Then keep in mind how reddit fellates all the new models. Imagine if those are pajeets ecstatic over their models repeating themselves and talking about shivers, while the companies say it's all for safety but have no idea how to improve cooming even if they wanted to, all because they use jeet-rated data.
>>
>>102564022
There's a gutenberg dataset on hf that some people use, but it's like 10 books or so. That's nothing. As for a full gutenberg model, i don't know. Last time i mirrored gutenberg it was like 800gb... i doubt small-timers would ever try that. Dunno about big companies.
>>
>>102564020
They have also seen most copyrighted text.
>>102563831
>>102564062
Jeets can't read
>>102563991
Add some Anais Nin. The most depraved shit I've ever read.
>>
>>102563661
>idiots
It was fun to build.
>>
>>102564073
>There's a gutenberg dataset on hf that some people use, but it's like 10 books or so.
It's funny how true this is for most authors. The amount of data these things need is truly astounding, the complete works of R.L. Stine are like 1-2mb, I think? It's insane.
>>
What's the best most intelligent, creative, soulful model for RP currently?
>>
>>102563261
Please explain. You can often reach the minimum eval loss in one epoch, with additional epochs contributing very little on top of that except overfitting.
>>
>>102564110
midnight miqu still. 9 months later.

if pure intel, largestral.
>>
>>102564110
mistral nemo
>>
>>102564115
If I were to guess there are infinite ways of sucking dick and your dataset is just too small to change all that much.
>>
https://docs.mistral.ai/capabilities/function_calling/
How do I use this?
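The short version of those docs: you describe your functions as JSON schemas, pass them in tools, and the model returns a tool_calls entry that you execute yourself and feed back. A rough sketch; the client class and method names changed between mistralai SDK versions, so treat them as approximate and check the linked page:
[code]
import json
from mistralai.client import MistralClient  # v0-era SDK; newer versions differ

client = MistralClient(api_key="...")

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",          # your own function
        "description": "Look up the status of an order by its id",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

resp = client.chat(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "What's the status of order T1001?"}],
    tools=tools,
    tool_choice="auto",
)

# If the model chose to call the function, run it yourself and send the result
# back in a follow-up message with role "tool".
call = resp.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
print(call.function.name, args)  # -> get_order_status {'order_id': 'T1001'}
[/code]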
>>
>>102563661
I have 4x 3090s, but I don't know anything about training from scratch. That's what I'm trying to figure out. The goal is a fully uncensored erotica/roleplay model, but it seems the biggest problem would be preparing a good dataset.
>>
>>102564110
Me.
>>
File: lit.png (2 KB, 332x93)
>>102564101
>https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1
Found it. There are a few others as well. It's way more than 10 books, but it's still 10mb. Another one is 14gb. Compared to the ~800gb of a full mirror they're nothing.
>R.L. Stine are like 1-2mb, I think? It's insane.
Sounds about right. I'd still like to see a full gutenberg model.
>>
>>102564101
>The amount of data these things need is truly astounding, the complete works of R.L. Stine are like 1-2mb, I think? It's insane.
Is that why these things write so poorly? The actual literature is tiny
>>
>>102564231
There's a lot of it, but people train on the 10mb dataset, not the 800gb of a full mirror. Which i understand for finetuners, of course... and then it gets mixed up with more normal internet content, which dilutes the good it could find in books.
>>102564210
>>
>>102563661
>we have plenty of idiots with 4 3090s, just train your own mistral losers
with 4 3090 you'll get your model the next century, you need way more than that to get a good and big model fast
>>
>>102564231
>Is that why these things write so poorly? The actual literature is tiny
Well, in a sense. They COULD write significantly better, but given the limitations of the architecture, combined with the extreme (and only worsening) overeagerness to wrap everything up in a nice bow by the end of the generation, you'll never get anything as complex and human as setups and payoffs, clever throwbacks, independently developing plot elements, etc. It's just not built for that, it's built for doing what you ask and doing so in as close to one message as possible. It's also the statistical average of all human writing, so it's pretty much mathematically incapable of surprising you if you've read any amount of literature, unless you crank up the temp a ton.
>>
>>102564231
Garbage on the internet outnumbers actual literature by orders of magnitude. That still isn't enough, so Meta uses Llama to generate trillions of tokens of synthetic reddit to fill the gap.
The issue is, companies want to sell an assistant. Proper literature doesn't have too many examples of Q&A, software troubleshooting, or current knowledge.
>>
>>102564291
Also, it's sort of implied, but absolutely this >>102564258: the normal internet crap fights back HARD against the quality of literature. Any trainer will tell you that it only takes a few dogshit stories or consistent grammatical errors to tank the quality of the model and have that appear constantly. Imagine the damage the entirety of the internet could do. That's why they have the legions of jeets looking at the data; without that kind of manpower to manually oversee it, the outputs would be of fucking horrendous quality.
>>
>>102564176
>I have x4 3090
lol vramlet
>>
>>102564332
Thank you for the useful comment, retardo-kun.
>>
>>102564342
and still you can't train your own model ;)
>>
>>102564384
Thank you for the useful and insightful comment yet again, retardo-kun.
>>
>>102564394
I mean he's not wrong, 4x3090s is nothing, it'd take years, if not decades to train a sizeable model (let's say 50B+)
These companies use literal supercomputers and it still takes months
>>
>>102564434
>50B+
I don't want a general-use slop model.
The idea is a small model focused only on literature, plus a core of general knowledge so it's not retarded.
>>
File: timothydexter.png (237 KB, 564x943)
237 KB
237 KB PNG
>>102564328
I want to see more raw pure token count models. While I keep banging on about wanting to see a full Gutenberg model, I know things like A Pickle for the Knowing Ones are also found there. I know full well it's not going to be perfect. I just want them to be more fun.
>>
>mfw I can't train a SOTA smut model to compete with billion dollar megacorporations using my 4 year old gaming GPUs
>>
Lmao this retard got triggered for some reason. No one said anything about competing with top models, what a retarded waste of oxygen.
>>
>>102564510
>No one said anything about competing with top models
elaborate anon, with your 4x3090 cards, what size would you be aiming for, so that we can laugh a bit
>>
>that retarded waste of oxygen is the triggered one not me
>brb I'm going to take down anthropic by training a SOTA smut model on my GPU that can't run modern games at 4k60 on high settings
>>
i get this is the local model general, but why would you need to make your model locally?
couldn't you just cheaply rent some retardedly powerful rig to make it?
>>
>>102564554
Give me seven reasons why I can't train a perfectly capable smut model on my 2020 Ampere gaming cards. I'll wait.
>>
>>102564554
>couldn't you just cheaply rent some retardedly powerful rig to make it?
where can I rent a couple thousand H100s to create a new decently sized transformers model from scratch?
>>
>>102564554
let me just upload a few hundred gigs of copyrighted material and text smut to a service that has my payment details, that sounds smart
>>
>>102564554
Experimenting can get expensive. Finding good datasets, a good base model to use, good training parameters. That is if you just want to finetune. Full training is a separate thing. llm.c trained a 1.6B model for 600 bucks, I think. Many would prefer to buy a GPU and use a bigger model with that money.
>>
>>102564510
Ignore him; nobody's saying they want to train anything big on hardware they own. Anything that isn't a qlora is obviously out of reach.
>>
>>102564582
Breh nobody gives a shit. OpenAI is blatantly training on YouTube and Google maps data. The training service won't kill their business by ratting out your hobbyist smut training run
>>
>>102564587
for anyone who is retarded (all of you), that pricing doesn't scale. even if it only cost $600 to make a 1.6B (unlikely), the cost scales way worse than linearly. a 7B wouldn't be ((7 / 1.6) * 600) or we'd already have homebrew smut models.
>>
>>102564610
OpenAI has lawyers; a random HF guy doesn't. I'd rather avoid the possibility of being turned into an example by some publisher or what have you
>>
>>102564610
delusional. you are not a big corporation. you do not have an agreement with microsoft. vast and runpod will absolutely cancel your account to avoid dealing with copyright issues themselves
>>
>>102564587
This. Even qloras require absurd amounts of bashing your head against the wall; I wasted 60 dollars before I got a qlora that wasn't lower quality than the 30B base model I was using. The amount of money needed to rent a whole datacenter's clusters for however many weeks each pretraining attempt takes would be astronomical.
>>
File: 600.png (161 KB, 1166x408)
161 KB
161 KB PNG
>>102564618
I know that, anon. That's why I said it gets expensive to experiment.
>even if it only cost $600 to make a 1.6B (unlikely)
https://github.com/karpathy/llm.c/discussions/677
>>
>>102564640
>>102564646
heh.... explain the goosebumps QLORA i trained, then.... checkmate.....
>>
>>102564646
delusional. you are a schizo
>>
>>102564646
And risk a business suicide for nothing. Either you have a delusion of grandeur or I wouldn't put you in charge of anything important
>>
>>102564672
>GPT2
>trained on 30B tokens
That's why. Training anything even remotely comparable to what we have now would be way more expensive. TinyLlama is a 1.1B trained on 3T tokens and it cost them approximately $72k.
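Napkin math with the usual 6*N*D FLOPs rule backs both numbers up. The throughput, MFU, and rental price below are my assumptions, not anyone's quoted figures:

def train_cost_usd(params, tokens, peak_flops=989e12, mfu=0.4, usd_per_gpu_hour=3.0):
    total_flops = 6 * params * tokens            # standard dense-transformer estimate
    gpu_hours = total_flops / (peak_flops * mfu) / 3600
    return gpu_hours * usd_per_gpu_hour

print(train_cost_usd(1.6e9, 30e9))   # ~$600: matches the llm.c GPT-2 repro on H100s
print(train_cost_usd(1.1e9, 3e12))   # ~$42k: TinyLlama's real ~$72k was on slower A100s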
>>
>>102562994
Also true for girls
>>
the 3b at Q8 is really good, but heavily censored. looking forward to the merges/tunes, whatever they're called.
fucking knew they were gonna go the small but powerful route with models going forward. /lmg/'s tiny e-peen compensator PCs are in shambles knowing you'll be running fucking 3Bs and 1Bs next year that are better than your triple-Bs of today kek
>>
Funny how people shit on effort without seeing how expensive it all is. No proper feedback to let them improve in future attempts, just disparaging them instead.

Qloras and vramlet cope methods are good for small models, hence smaller models having more tunes. For big ones? You'd need an entire A100 node or more, hence why there are so few actual tunes at that size.
>>
>>102564767
hi Sao
>>
>>102564744
cope
>>
>>102564721
Just continue pretraining on an existing model then, surely that will work and be cheaper.
>>
>>102564721
I KNOW. My response was to >>102564554, explaining why even renting a training cluster is expensive, even if you somehow get a working model on your first go. What are you even arguing about?
>>
File: cope.jpg (188 KB, 858x677)
188 KB
188 KB JPG
>>102564774
>all he can do, one word, as his entire world crumbles around him
see you on the single digit parameter side
>>
File: file.png (94 KB, 896x466)
94 KB
94 KB PNG
>Long-term, I kinda' wonder if it isn't in llama.cpp's interests to stop supporting the HTTP server altogether, and instead farm that out to other wrapper projects (such as ollama), and we instead focus on enhancing the capabilities of the core API.

Thoughts?
>>
>>102564458
I'm still not sure you can do it in a reasonable amount of time with just a few consumer grade cards
You can run the numbers yourself though
>>
>>102564767
I'm sorry, but if a finetuner can't even be bothered to filter out dogshit that everybody can see on the first page of the dataset viewer, then it's not effort, it's a pure waste of energy and money.
>>
>>102564790
I always thought it was weird they added the server at all.
>>
>>102564790
I think it's still valuable to have llama-server in the codebase. It's overly simplistic for real use, but it serves as an excellent starting point for anyone looking to build their own.
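Case in point, talking to it is about this much code. A minimal sketch against the default port; the prompt and params are just examples:

import requests

r = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": "Building a llama.cpp frontend is",
    "n_predict": 64,        # cap on generated tokens
    "temperature": 0.8,
})
print(r.json()["content"])  # the generated text comes back in "content"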
>>
>>102564780
Sure, just give me their exact pretraining settings and the latest pretraining checkpoint.
>>
>>102564790
llama.cpp is built as a library for experimenting on top of ggml. I think the server should be simpler and not bother with multiple users and shit like that. It should be, as the directory name implies, an example. It's up to the people that care about specific features to add them on their own. As it is, they provide way too much for grifters. llama.cpp's devs should make grifters' work harder. I use llama-cli almost exclusively, so it wouldn't affect me much.
>>
>>102564790
Ollama won! cuda dev meltie incoming
>>
>>102564744
>prompt that i have a boner to one of my assistants
>she breaks down in tears
>start massaging retarded bimbo princess peach's breasts
>she leans into the touch and continues the convo
eh it's not THAT censored, i think it's just a notch above base 3.1 instruct.
still not gonna stop sloptuners from shivering its timbers though.
>>
>>102564853
You also need their pretraining dataset to avoid catastrophic forgetting
>>
>>102564781
You said someone trained a 1.6B for $600 in the context of a conversation about homebrew smut models. That number didn't make sense, so I clarified that even if it was true (unlikely), it wouldn't scale. Then you posted the source, which seems to explain that the 1.6B was a proof-of-concept meme model reproducing an ancient LLM from 2019. Then I provided numbers that map onto this conversation more accurately by bringing up the training cost for TinyLlama, a similar-sized model that attempted to hold up to modern standards. I'm just trying to keep the parameters of the conversation grounded in reality so we don't have retards asking why there are no $2500 7B smut models being made; there's no need to sperg out.
>>
>>102564790
Never gave a fuck about the server part, it can die if it means they can work on more important stuff
>>
>>102564883
My point is that it's expensive to rent shit, even for a tiny model. Even to experiment with finetuning. You agree that even for a tiny model it's expensive. Yes, it's a meme model. Yes, it's an old model, and it still costs more than most people are willing to pay to experiment. My example was just a point of reference.
There is nothing to argue about. We agree.
>>
>>102564790
nooo that's the thing i use. that means it's a bad idea to stop supporting it.
>>
>>102564947
It's going to die because they're already not working on it
>>
>>102562635
Shut up racist bigot! We localchads go by safety! Safe AI is the only correct AI, it cannot go wrong!
>>
>We agree.
Mostly, but $600 for 1.6B is relatively inexpensive for enough people in this thread. Enough to cause confusion, which is why I thought it was important to clarify that a disappointing 1.1B cost $72k. Nobody is arguing with you. Meds.
>>
kek meta legit released a 3b that's totally solid for RP and /lmg/ hates it, because of course you faggots don't even know how to use a 3b of all things.

*hands you 3 billion watermelons.assistant*
>>
>>102565050
i'm downloading it now, it better be good
>>
for casual use, how much of a difference is 16gb to 24gb?
I can get an RTX 4080 Super (16GB) for slightly more money than a 3090 (24GB), and overall the 4080S is the much better card
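My napkin math so far, in case anyone wants to correct it (assumes ~Q4 weights plus a flat fudge for KV cache and buffers, which is crude):

def fits_in_vram(params_b, vram_gb, bits_per_weight=4.5, overhead_gb=2.5):
    weights_gb = params_b * bits_per_weight / 8   # e.g. 12B at ~Q4 -> ~6.8 GB
    return weights_gb + overhead_gb <= vram_gb

print(fits_in_vram(12, 16))   # True: 12B fits either card easily
print(fits_in_vram(30, 16))   # False: ~30B is where 16GB taps out
print(fits_in_vram(32, 24))   # True, barely: the extra 8GB is a whole model class
print(fits_in_vram(70, 24))   # False: 70B means multiple cards or heavy offload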
>>
>>102565050
I'm using the 70B though
>>
>>102565050
Why would anyone run that when even the poorfags here can run 12B?
>>
What inference settings are other anons using with L3.1 in co-writing/RP scenarios? e.g. temperature, top-p, top-k, typical-p, min-p, repetition penalty, frequency penalty, presence penalty, samplers, etc.?
I'm having trouble getting the balance between coherence and insanity just right for a satisfying flow.
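For reference, this is roughly what I'm sending right now (llama-server /completion field names; the values are just where I've landed so far, hence the question):

settings = {
    "temperature": 0.8,      # lower gets coherent but samey, higher gets unhinged
    "min_p": 0.05,           # prunes the garbage tail relative to the top token
    "top_p": 1.0,            # disabled, letting min_p do the work
    "top_k": 0,              # disabled
    "repeat_penalty": 1.05,  # kept mild; crank it and names/pronouns get dodged
}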
>>
>>102565107
B counts don't correlate directly with quality
>>
File: comfy_06108_.png (3.45 MB, 2048x2048)
3.45 MB
3.45 MB PNG
an amalgamation of migus in their natural habitat
>>
Anyone know if it's possible to rip the vision parameters out of 90B so you can just use it as a standard 70B textgen model? Or are the 70B weights in the 90B REALLY literally the same as 3.1 70B?
>>
>>102565139
I don't see anyone here claiming that the 3B is better or close to 12B. All I have seen so far is that "it's decent" or whatever, and that says nothing.
>>
File: 5de.jpg (93 KB, 874x612)
93 KB
93 KB JPG
>>102565050
>vramlets be like
>>
>>102565149
The vision stack has its own text conditioning, so it's probably pretty inextricably tied in, even if you COULD extract the difference. Also, on that note, if you could shave off the difference between it and 3.1, you know that'd just make it 3.1 again, right? It wouldn't magically keep the new text info.
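If someone wanted to test it anyway, the naive rip would just be key filtering. Pure sketch: the name patterns are guesses, a 90B checkpoint is sharded so you'd loop this over every shard, and per the above you'd most likely just reconstruct 3.1 70B:

from safetensors.torch import load_file, save_file

sd = load_file("model-00001-of-000NN.safetensors")   # repeat for each shard
text_only = {k: v for k, v in sd.items()
             if "vision" not in k and "cross_attn" not in k}  # guessed patterns
save_file(text_only, "text-only-00001.safetensors")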
>>
>https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/discussions/1
Damn, this phil guy is actually based. I think it's worthwhile to have different companies do different things in the space so we have a multitude of good options for different use cases, but he might be right that what we got so far was mostly an artifact of early experimentation, and that in the rush to "compete", companies will all end up following the same trend.
Both the corpos and the corpo bootlickers are disgusting.
>>
>>102564790
least indirect ollama shill
>>
>>102565092 (me)
it's smart, but too much positivity and safety bullshit to use in any rp
deleted
>>
>my popular knowledge test
>I did a vibe check
none of this matters btw. phil is not based, he's a faggot and he needs to stop shilling his link in this thread.
>>
>wake up from a coma
>Llama 3.2 released
>wow 90B! Finally some competition for Largestral
>Turns out it's just 70B with a 20B of vision model strapped on it
I'm done with Meta.
>>
>>102565411
don't forget that the 20b of vision doesn't even seem to be that good
>>
70B text-to-image/text model when?
>>
>llama 3.2
>chameleon was killed for THIS
unfathomably grim
>>
File: for llama.png (492 B, 225x225)
492 B
492 B PNG
>Of course! The image attached to this post appears to be a flat color image of an orange circle. Perhaps an avant garde, minimalist depiction of an orange? How creative!
>>
File: file.png (458 KB, 1660x940)
458 KB
458 KB PNG
>>102565411
>>102565429
yeah it sucks, but fortunately we got Molmo's vision model at the same time, and this shit is really good
https://molmo.allenai.org/blog
>>
>>102565317
>The vision has text encoding
What does that actually mean?

The way some vision models have worked is that they literally just have a separate encoder that translates an image into tokens and inserts those into the context; text doesn't go through that, so that part of the model could be ripped off and the text model would perform quite literally the same. Are you saying that even text goes through the vision encoder on Llama?
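(That separate-encoder layout as a toy sketch, since people keep mixing the designs up. All sizes are made up, and this is NOT Llama 3.2's design, which interleaves cross-attention layers into the text stack instead:)

import torch
import torch.nn as nn

class ToyVLM(nn.Module):
    def __init__(self, d=64, vocab=1000, patch_dim=3 * 16 * 16):
        super().__init__()
        self.vision = nn.Linear(patch_dim, d)  # stand-in for a ViT patch encoder
        self.proj = nn.Linear(d, d)            # adapter into the LLM's embed space
        self.embed = nn.Embedding(vocab, d)
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.llm = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, patches, text_ids):
        img_tok = self.proj(self.vision(patches))   # image -> pseudo-tokens
        txt_tok = self.embed(text_ids)
        seq = torch.cat([img_tok, txt_tok], dim=1)  # image tokens just sit in context
        return self.llm(seq)                        # a text-only path would skip
                                                    # self.vision / self.proj entirely

m = ToyVLM()
out = m(torch.randn(1, 4, 3 * 16 * 16), torch.randint(0, 1000, (1, 7)))
print(out.shape)  # (1, 11, 64): 4 image tokens + 7 text tokens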
>>
File: file.png (18 KB, 604x267)
18 KB
18 KB PNG
>>102557890
>the 4 grand 5090
>32GB
On cue.
>>
>>102565405
So far people seem to be in agreement that Qwen does indeed have bad pop culture knowledge and is also bad at RP. No one has posted proof otherwise, i.e. that it's good at trivia or good at RP.
>>
>>102565411
>Meta tells people months in advance that they will release a future version with multimodal adapters
>somehow people expect 4o or something
Lmao.
>>
>>102565565
hi Yann
>>
File: 1727289443540662.png (505 KB, 2180x987)
505 KB
505 KB PNG
>>102565481
You should post this diagram instead if you don't want to be labeled as a shill. Llama 3.2's multimodal isn't good but that diagram is quite literally misinformation.
>>
>>102565605
hi Arthur
>>
>muh misinformation
just lie
>>
>>102565636
True. The fake log poster was quite a funny incident.
>>
>>102565050
I have 24GB. I already felt horrible trying a 7B. I am not going lower.
>>
File: file.png (867 KB, 768x768)
867 KB
867 KB PNG
>>
>>102565541
If this is true it's gonna be one of the biggest letdowns ever.
>>
File: file.png (10 KB, 599x270)
10 KB
10 KB PNG
>>102565681
especially if you look at the 5080
>>
>>102565541
>4 grand 5090
What the fuck? Are they trying to converge the prices of the highest-end consumer GPUs and the server GPUs so they never have to improve capacity/$ on consumer cards, since that would inevitably make them a better value proposition for corpos?

Fuck me, man. I blame the guys who made the supercomputer out of PS3s.
>>
>>102565480
>Wow, is it really that bad?
>go and try it out on lmsys
>it's fine, it even got the thin white border
???
>>
>>102565757
you weren't supposed to try this yourself
>>
>>102565757
>3.2 11b can't even see the orange
owari da
>>
>>102565690
>16G
LOL
>>
>>102565757
Maybe you need to go back to pretraining until you can identify jokes, anonie.
>>
>>102565796
that'll be $1100, paypig. start saving for the $1600 24GB refresh in a year.
>>
>>102565822
>>102565822
>>102565822
>>
>>102565810
It seems you may need some pretraining as well.
>>
>>102565757
3.2 90B can't do nsfw as well as 1.5 Pro or 3.5 Sonnet
>>
heck, sfw too
>>
>>102562994
>LeCun
>That fucking tweet
Incredible, absolutely incredible! Dude gonna get cancelled left right and center lmao


