/g/ - Technology




File: cyborgku.png (1.54 MB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102535977 & >>102524339

►News
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
>(09/17) Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release/
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>102535977

--Paper: Improving code generation in large language models:
>102541159 >102541640
--Papers:
>102542180 >102542337 >102542532
--Experimenting with high dropout rates and fine-tuning:
>102542430 >102542447 >102542469 >102542583
--Finetune data requirements discussion:
>102536472 >102536511 >102536539
--Enabling Model scopes for Vector Storage did not regenerate vectors:
>102537930
--Discussion on the potential and performance of distilling 70B-instruct to 51B:
>102536804 >102536822 >102536823 >102536885 >102537261
--BUERgence script for optimizing llama.cpp inference parameters:
>102542933
--Approaching 16k context limit and potential solutions:
>102542492 >102542636 >102542733 >102542746 >102542830 >102542851 >102542865 >102543206 >102543451 >102543037 >102543295 >102543426 >102544540 >102544569 >102544739
--Advanced Voice rolling out to ChatGPT Plus and Team users:
>102537776
--45 minute daily limit on advanced voice features, including silence:
>102541866 >102543472 >102543491 >102543519 >102543677 >102543759
--Miku (free space):
>102535995 >102536036 >102536082 >102536179 >102536277 >102537490 >102538991 >102539843 >102540239 >102540324 >102540465 >102543667 >102544553 >102544704

►Recent Highlight Posts from the Previous Thread: >>102535999
>>
Do people here use agents why/why not?
>>
>>102545041
"agents" aren't a coherent concept and everybody who talks about them sounds like one of the crypto grifters who pivoted to ai
>>
>>102544853
most useless post itt
>>
>>102544853
seriously, where's a script to fix these backlinks?
>>
>>102545041
Generation and especially prompt processing are too slow for building agents, while smaller models require fine-tuning to function reliably. There are only a few tasks where time isn't a factor.
>>
>>102545060
I understand "agents" as any indirect use of LLM output, or more specifically, instances where output results from additional automated prompts.
>>
>>102545116
Errors snowball if you use one LLM's output as input for another LLM.
>>
>>102545060
I mostly meant stuff like this
https://github.com/Agent-Tools/awesome-autonomous-web
>>
File: 1727226087946704.png (489 KB, 512x768)
>>102545137
Just like any LLM response, since each following token depends on the preceding ones. Agents offer a way to enhance answers by focusing on specific tasks. For instance, you could employ an extra prompt to track a character's clothes or outline a broad narrative plan beforehand. It outperforms a single prompt, but waiting for the subsequent prompts to finish before the final answer even starts streaming is not worth it.
>>
>>102545307
You can do the agent shit while the human is writing out their response. Assuming that order is acceptable.
Human response
AI response
Agent1 gen: {{char}} clothing state
Agent2 gen: {{user}} clothing state

(Eventually)
Human response (suffix the state info above before the AI response, ideally wrapped into a fancy prompt)
(Optional: do judge rounds on the above answers to improve quality, if the human is taking his time responding)
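A minimal sketch of that flow in python, assuming an OpenAI-compatible local endpoint (llama-server style; the URL, field names and prompts here are made up, adjust to your stack):

import threading
import requests

API = "http://127.0.0.1:8080/v1/chat/completions"  # assumed llama-server style endpoint

def gen(prompt):
    # one blocking completion against the local server
    r = requests.post(API, json={
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    })
    return r.json()["choices"][0]["message"]["content"]

def track_state(history, state):
    # agent gens run in the background while the human is still typing
    state["char_clothes"] = gen("From this chat, state {{char}}'s current clothing:\n" + history)
    state["user_clothes"] = gen("From this chat, state {{user}}'s current clothing:\n" + history)

history = "anon: hey\nchar: *waves* hi anon"
state = {}
worker = threading.Thread(target=track_state, args=(history, state))
worker.start()                # kick off the agent gens
human_reply = input("> ")     # human takes their time here
worker.join()                 # state info is ready by the time they hit enter
print(gen(f"[State: {state}]\n{history}\nanon: {human_reply}\nchar:"))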
>>
File: 1415322611803.png (5 KB, 208x208)
I love LLMs so much, bros. When I tried AID2 back then, I had a clear vision of where it would go for entertainment, and I feel like I'm living in that vision now that I've finally achieved 70 beak models locally, specifically Q4 or higher. The ability to follow rules leads to so many opportunities - text adventure, games featuring {{roll:d20}} against {{roll:d20}}, roleplaying with wild traits mentioned as just minor notes, and so much more. And things I hadn't even considered, like an offline personal wikipedia or browser search for information, recipes, questions, or a writing assistant for professional papers. The only front I feel it's still lacking in is fanfiction. It's certainly more coherent trying to simulate established characters, but 70B or modern finetuning is still too rough to excel at it.

This tech has been a novelty for me for years, but with that last hardware upgrade it became integral to my life. Local is definitely the future, and that future feels bright. I can't wait to see what the next jump 5 years from now is like.
>>
>>102545367
buy a muffin because you deserve one
>>
>>102545307
Maybe you could get away with a small dumb tinyllama for that
>>
File: 16lines.png (120 KB, 685x374)
i had an idea tonight about using ai training to store data. the idea is simple: create small images of hex text files, then square-crop the image and use it to train ai. the gpt can already convert hex values but that's not what i want.

i guess the desire is for the hex charts to be stored as training data. then recalling them with a GAN would make a new text file? this could be used for raw data eventually? ai generated bitmaps?

pic related, the concept would crop off the text and just have a square of hex.
>>
>>102545367
>The only front I feel it's still lacking in is fanfiction. It's certainly more coherent trying to simulate established characters, but 70B or modern finetuning is still too rough to excel at it.
You should try to use a lorebook for world-building and these fine details.
>I can't wait to see what the next jump 5 years from now is like.
I don't.
>>
>>102545442
>using ai training to store data
Retarded idea.
>>
>>102544853
where is the recap image?
>>
>>102545503
It didn't pass our safety checks
>>
>>102545440
You may achieve good results with proper task-specific fine-tuning.
>>
>>102545486
>nooo don't use a compression algorithm for compression
Calm down silly anon.
>>
>>102545442
>GAN
you can probably get away with a simple deconvolutional NN with the image ID encoded as a vector as its input
>>
File: hughjackman.png (244 KB, 422x506)
So is Mistral Large 2 still the go-to model for around 100B? Is 4 bpw enough or is 5 bpw necessary for it not to shit itself?
>>
>>102545307
>subsequent prompts to finish before final answer even start streaming is not worth it.
I was thinking about it a few threads ago but never said it out loud? Does batching need more vram? Cause you could generate initial draft and then do agent stuff in parallel? Originally my idea was not related to agents but just running multiple gens for the same prompt with different seeds. Not worth it for 7B etc but if you are running a 70B having it write 5 different drafts at the same time with same speed would be nice?
>>
File: recap-102544848.jpg (2.67 MB, 1253x7330)
>>102545078
https://rentry.org/lmg-recap-script
>>102545503
I figured I would stop with the images now that we have the script.
>>
File: 16lines_crop.png (81 KB, 420x323)
>>102545538
yes this is how the thought had started, could ai train'd data take up less space @ scale (thousands of hex images), since it's redundant and based on calculus weights, gradient descent etc.

>>102545592
i'll keep that in mind if i can try it, but it's vaporware imo since i don't know how to script any models except for easy-diffusion, which is already trained on ten gigs of images.

>>102545486
say it was trained on the kjv bible in hex blocks, then ask for an output image of what was learned, then run an ocr to get the hex values back, and decode to text.. what would u get?
> in heaven there is no beer, that's why we drink it hear !
who knows what might come out, assuming the hex couplets were lost into a mass of scribbles. we take OCR training for granted after all...
>>
>>102545694
>https://rentry.org/lmg-recap-script
thanks
now how do I make them work with 4chanx previews
>>
>>102545735
>now how do I make them work with 4chanx previews
4chanx previews work with the user script.
>>
>>102545760
they definitely don't
I'm also not getting any (cross-thread), (you), or (dead)
>>
File: 20240925_181923.png (106 KB, 1344x938)
>let's have the AI think for 3 months and cure cancer bro
>>
all models below 70B are shit. I wanted to create an RPG game that uses an LLM, but most local models below 70B are not suitable: they often don't follow instructions or start to get lost. Few people can run 70B locally on a computer, and APIs such as openrouter/groq are expensive or very limited for a simple game. So games using LLMs are still a thing of the future. Unless BitNet models (or another revolution) start appearing, such games will not be common, short of special 8B models trained specifically for use in RPG games.
>>
>>102545841
You people said this exact same thing a year ago when the best we had was llama2 70b which is worse than llama3-8b.
Admit it, you only care about justifying the money wasted on gpus.
>>
>>102545878
>You people said this exact same thing a year ago when the best we had was llama2 70b which is worse than llama3-8b.
Google Gemini is currently the best 8B on the market and it beats all old 70Bs, so yes, small models improve rapidly, but so do big models.

>Admit it, you only care about justifying the money wasted on gpus.
No, we don't. It's about getting accustomed to the best thing. Models just feel dumber when you go down. Once you taste the best, you don't want to use anything less.
>>
Ok, L3.1-70B-Hanami-x1 is the first good 3.1 tune imo. Follows instructions better than anything I've ever used before and writes extremely well.
>>
>>102545878
Oy vey :O
>>
>>102546005
Hi, Hanami.
>>
>>102546005
good morning saao
>>
>>102545442
I don't get it. You want to train on the image of the hex? The raw text would be just 256 bytes (plus some EOT token or something), while the image would take at least 12288 bytes assuming a 6x8 character font. Even at 1 bit per pixel you still have 1536 bytes to feed to the thing. You're increasing the training pressure by 6x at least.
If you want to compress data there's already llama-zip and there's a similar thing as a PR on llama.cpp already. A classical compression algo would be faster and more reliable.
If you want to add noise to the generation to get something like
> in heaven there is no beer, that's why we drink it hear !
You use token masking during training. You can even train to include typos.
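Sanity-checking the numbers in python, assuming one 6x8 glyph rendered per stored byte:

raw = 256                  # bytes of raw text
pixels = raw * 6 * 8       # 12288 pixels to render them as glyphs
print(pixels)              # 12288 bytes at 8bpp grayscale
print(pixels // 8)         # 1536 bytes even at 1 bit per pixel
print(pixels // 8 // raw)  # 6x blowup before training even starts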
>>
>>102545878
>llama2 70b which is worse than llama3-8b
Would you prefer l2 or l3 to suck your dick?
>>
>download mistral nemo 12b
>chats no longer devolving to "YES MARK ME AS YOUR ____"
>get into a nice chat with ai, just a normal interaction
>things getting kissy
>"by the way, [user], before we do anything i have to tell you something
>"im from a different dimension and blah blah"
where the fug did that come from? card mentions nothing about secret past or hidden origins and nothing in the chat was magical/otherworldly/interdimensional. i tried regenerate and it got weirder, one time she said she was a demon kicked out of hell, another try she said she could "travel through time and manipulate energy" lel

or is that crossing the context limit (16k)
>>
>>102545841
>APIs such as openrouter/groq are expensive
70b is like $.40/Mtok
most of the bigger models are more expensive but 70b is dirt cheap relatively
>>
>>102546781
Honestly anon
I think you're just cursed
>>
>>102546781
maybe a temperature thing
i think recommended temp on non-merge nemo is something ridiculously low like 0.3
>>
>>102546809
i'm starting to think so as well
>>102546814
ah i'll have to try that
>>
>>102546781
Have you posted your settings yet?
Because that sounds like a case of bad settings + some weird shit somewhere in the context.
>>
>>102546781
nemo instruct is very sensitive to things like whitespaces in the instruct template, I switch between two templates depending on whether I want creativity or accurate instruction following
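for reference, the difference is literally just whitespace, something like this (from memory, check your tokenizer config):

[INST] {prompt} [/INST] {response}

vs

[INST]{prompt}[/INST]{response}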
>>
>>102545728
>could ai train'd data take up less space
This is as retarded as people thinking PiFS is viable. Spoiler alert: it's not. The offsets will, on average, take more space than the shit you're trying to "store" in it.
Same with this, how the fuck are you going to store data in a NN without any losses? If you think for at least 1 second you'll realize that the absolute best you can do is a 1:1 space efficiency, and that's after training and inference.
>>
>>102545783
>we have 10 datapoints
>now we can make a parabola with it that will model the accuracy!
>>
>>102545783
>Only two datapoints in the first thousand tokens of the test
This shit sucks.
>>
File: file.png (124 KB, 1051x530)
>>102545777
It works for me using 4chan-x and GreaseMonkey. Are you using xt or one of the other monkeys?
>>
>chat completion
utterly censored
>text completion
utterly uncensored
why?
>>
>>102546891
Probably because you aren't using the exact instruct format that triggers that behavior when using text completion.
>>
>>102546891
Because raw smut is easy to find and train with, even accidentally. For a big dataset collected from the web, there will be some. For chat completion they typically generate the training set, so they'd need to generate smut requests and responses and nobody's gonna want to do that job. Not at any big company. It's easier to just say "no, i don't do that" and leave it at that.
I assume you mean the source models, not the finetunes and merges.
>>
>>102546888
yeah, tampermonkey, guess I'll die
>>
>>102546956
Qwen2.5 32b instruct, but I think it's because of the default instruction template, I'll see what happens by tweaking it
>>
>>102545841
I'm also experimenting with llm npcs and so far it's a disaster. It is most likely due to the lack of situational awareness of the npc.
Let's say you use an llm to act as the guard of a gate that is closed, and there is no way the player can get inside. The player can prompt the AI with things like 'i climb the fence and go inside' and the ai will respond as if the player really did that, when in reality the player is standing outside the gate. This is completely immersion breaking, and the restrictions and workarounds i've tried haven't worked so far. Maybe with an ai that can understand pictures you will be able to send the ai a screencap of the situation plus a description plus the user prompt, thus creating a proper answer.
>>
>>102547036
I think that, for that kind of interaction, you sort of need a multiprompt solution.
Something like prompting the model for the current context the player is in, whether what the player wants to do is possible, what the possible consequences are, etc.
Iterate on the player's prompt before outputting a response, basically.
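Rough sketch of that two-step loop, assuming an OpenAI-compatible local endpoint (endpoint and prompts are made up, just to show the shape):

import requests

API = "http://127.0.0.1:8080/v1/chat/completions"  # assumed llama-server style endpoint

def gen(prompt):
    r = requests.post(API, json={
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 200,
    })
    return r.json()["choices"][0]["message"]["content"]

def npc_turn(game_state, player_input):
    # step 1: rules check against hard game state, before any roleplay happens
    verdict = gen(
        f"Game state: {game_state}\n"
        f"Player attempts: {player_input}\n"
        "Is this action physically possible? Answer POSSIBLE or IMPOSSIBLE, then explain."
    )
    # step 2: in-character reply with the verdict injected into the prompt
    return gen(
        f"You are a gate guard. Game state: {game_state}\n"
        f"Rules check: {verdict}\n"
        f"Player: {player_input}\n"
        "Reply in character. If the check said IMPOSSIBLE, the action did not happen."
    )

print(npc_turn("gate closed, fence unclimbable", "i climb the fence and go inside"))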
>>
>>102546979
If it was trained to reject smut, and i understand it was, chances are that tokens exclusively found in the chat template will steer it towards safe outputs. You may get better results, but those tokens have been burned into the model. There's no context where the chat template tokens and smut overlap. It's like having any mention of 'assistant' in the system prompt in llama models.
Best of luck, though.
>>
smedrins
>>
>>102547079
That's an old word, Anon.
>>
>>102547077
>tokens exclusively found in the chat template will steer it towards safe outputs
I was somewhat suspecting that was a thing, that the chat template would be utterly different from the regular text completion, oh well =(
>>
>>102547055
I've thought about this too: embed the player prompt with a complete situation description and whether the actions the player is describing are legal or not. Would require various iterations.
Is qwen2.5 image understanding any good?
Also all of this should be done by a model of no more than 11b
>>
>>102547186
Ideally, you'd have a whole game built around the llm instead of trying to have the ai run the whole simulation.
So inventory management, calculations, etc, would be done by a classical system, then the llm would be fed that information, alongside the relevant history of its previous outputs and the user's messages (summarized RAG or something of the sort), then it would evaluate the user's input before producing the final output (iteration).
I wonder if the better approach would be to have a fast small llm that does more iterations or a bigger, slower, but more capable and "stable" llm with fewer iterations.
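To illustrate the engine/llm split (tiny sketch, gen() being the same kind of requests helper anons posted above; the engine owns all ground truth, the llm only narrates):

game = {"location": "gate_exterior", "gate_open": False, "inventory": ["rope"]}
history = []

def narrate(player_input):
    # engine state is serialized and injected as non-negotiable facts
    facts = "; ".join(f"{k}={v}" for k, v in game.items())
    out = gen(
        f"Hard facts (do not contradict): {facts}\n"
        f"Recent history: {history[-5:]}\n"
        f"Player: {player_input}\nNarrate the outcome."
    )
    history.append((player_input, out))
    return out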
>>
>>102547133
Nothing stops you from trying, but i wouldn't expect a night and day difference. It may work at the start, but as the context fills with 'safe' tokens, the output will drift with them.
Text completion has no template, and you can use an instruct/chat model as a completion model. The problem is that anything that steers it a bit towards the 'assistant mode' will cascade and start rejecting things.
I remember, back when i just started using llms, using instruct models as completion with few-shot examples.
Some setup.

char1: dialog...
anon: dialog...
char1: dialog...
anon:

Set "\nchar1:" as the input suffix string, set "\nanon:" as the reverse prompt and that's pretty much it. You still get to dialog with the thing but, hopefully, avoid all the steering tokens. You can add new characters on the fly, you could even play as more than one by adding more reverse prompts.
I don't know if it worked well because of that token avoidance or because i was using a model that didn't care about those things anyway. Never had an out of character rejection (good guy would still refuse to kill another person, while a bad one wouldn't).
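With llama.cpp's llama-cli that setup maps to something like this (flags from memory, double-check --help):

llama-cli -m model.gguf -i -f fewshot.txt --in-suffix "\nchar1:" -r "\nanon:"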
>>
File: file.png (36 KB, 666x370)
>>102546961
Try this. Go to the settings, and set it to Run at: document-body. I installed TamperMonkey and that got the previews to load for me.
>>
https://molmo.allenai.org/blog
>>
>>102547325
yeah that works, it was either document-body or a position above 4chanx
>>
>>102547425
Buy an ad
>>
>>102547425
>MolmoE-1B is a multimodal Mixture-of-Experts LLM with 1.5B active and 7.2B total parameters based on OLMoE-1B-7B-0924.
Interesting.
>>
>>102547425
looks cool
i'll play with the 7b when a quant drops
>>
File: zs.png (43 KB, 411x184)
>>102547425
based
>>
>>102547242
i would take the approach of having a small model(2b-3b) doing the understanding of the situation before feeding it to a bigger model(7b-11b).
>>
>>102544853
this is just spam now
>>
File: VHDO2 script.png (43 KB, 790x786)
More whacky results from experimenting with super high dropout training.
>>
File: sorry.png (35 KB, 394x629)
>>102547870
AGI achieved.
>>
so what do I use for the importance matrix,
just the dataset used for training?
or
could I use a series of questions that are likely to trigger the parts of the model that need to be preserved? would just the questions, or questions and partial answers, work?
>>
>>102547425
Is their demo running the 7B or something? The results I'm getting from it are far below what I'd expect from a 70b model.
>>
File: molmo7BD.png (49 KB, 896x140)
>>102548005
>Is their demo running the 7B or something?
Yes. Molmo 7B-D
>>
>>102545841
>But most local models below 70B are not suitable, often do not follow the instructions, or start to get lost
There are improvements on this front as you get more beaks, to the point where somewhere between deepseek and 405b they are smart enough to be a competent dungeonmaster for you with the right system prompt
>such games with LLM will not be common
NVidia is gatekeeping hard right now, but eventually consumer hardware that can run quadrillion parameter models will be cheap and common.
I have seen the future, and it is fun
>>
>>102548005
yes
>Select model weights, inference code, and a public demo (using Molmo-7B-D model) are available starting today.
I don't know why they wouldn't demo with the big one. guess costs, but it makes a bad impression especially when they don't make it very clear what model is running in the demo
>>
>>102547425
Personally I don't see a use case for multimodal models. As long as my eyes and ears are still working. And multimodal models are much more retarded than text-only.
>>
>>102547955
Read
>https://github.com/ggerganov/llama.cpp/discussions/5263
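The mechanics boil down to two commands, something like (from memory, check the imatrix/quantize READMEs):

llama-imatrix -m model-f16.gguf -f calibration.txt -o model.imatrix
llama-quantize --imatrix model.imatrix model-f16.gguf model-q4_k_m.gguf Q4_K_M

What belongs in calibration.txt is exactly what that discussion argues about.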
>>
>>102548038
>but eventually consumer hardware that can run quadrillion parameter models will be cheap and common
not with semi conductors
>>
>>102548066
>not with semi conductors
whelp, back to analogue computers then
>>
>>102547955
>>102548057
Shit, I meant to also link
>https://github.com/ggerganov/llama.cpp/files/14194570/groups_merged.txt
Which is part of that discussion.
There's also some cool information regarding the process itself.

>>102548086
There's some research on analogue computers built with light pulses that mimic the behavior of neural nets. It's super interesting.
>>
>>102548030
>>102548045
Okay, good to know. It's definitely dumb to showcase the 7B like this without making it very clear in the demo. I was more than ready to disregard the entire model family as trash.
>>
>>102546867
You could fit a parabola to 8 data points and get perfectly usable results.
All you would have to do is correctly estimate the input uncertainties and calculate chi2/NDF (and also check that you can use this metric).
But the extrapolation of these results would be extremely sketchy no matte what because the uncertainty per data point is obviously very large vs. the change in "true values".
>>
>>102548057
what's pseudo-random synthetic data mean In the context of that conversation?
is it the text equivalent of regularization images?
>>
>>102548086
there's quantum computing
however, due to the nature of the wave function, there's a chance that it may never be technologically feasible enough to be useful.
>>
>>102548144
>what's pseudo-random synthetic data mean In the context of that conversation?
I think it's data generated by another model (or the same model I don't know) that's randomly sampled to create a dataset.
I think kalo's general calibration data is randomly sampled fragments from The Pile.
>>
File: 1698516170440216.png (130 KB, 862x640)
>>102547425
>Countbench 91.2
Damn, they absolutely maxx'd on this one
>>
>>102545680
5bpw is where the loss of intelligence becomes hard to spot, but 4bpw is still good.
>>
>>102547425
How do you run this? Their trailers imply that there'll be some fancy front end that shows how it counts people and stuff but there's nothing about a front end on their roadmap.
Local models are very held back by the fact that all our front ends suck dick, especially when it comes to multi-modal shit.
>>
>>102548191
but what does randomly sampled here mean?
just samples from a larger dataset?
it's response to noise/incoherent text?
>>
File: file.png (46 KB, 889x853)
>>102547900
soulful
>>
Have you guys seen this? Maybe it'll be local, that would be cool
https://menyifang.github.io/projects/MIMO/index.html
>>
>>102548365
Nice, finally something for the vtubers
>>
To the fine tuners in the thread (sao, drummer, whatever) I propose an experiment.
>get the recipe for a nemo 12b fine tune (Ie. Lyrav4)
>prune mistral-small using https://github.com/arcee-ai/PruneMe down to 12B~ish parameters
>fine tune the resulting model with the exact same recipe and data as the nemo fine tune
Maybe add a "healing fine tune" using the model's own data before the last step.
I'm curious to know how the final models would compare.

>>102548324
>but what does randomly sampled here mean?
>just samples from a larger dataset?
If I'm reading the exchange correctly, yeah.
As jukofyork mentioned in his proposal:
>It's likely the use of random and semi-random data mentioned in this thread is acting as a "quick and dirty" form of regularisation anyway:
>https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/bishop-tikhonov-nc-95.pdf
>and IMO, it would probably be better to consider doing it in a more principled way - especially considering the calibration dataset is so small and the imatrix computation isn't using the full context nor the correct prompt format, etc.
>>
>>102548086
You'll have to train every instance of the model from scratch with analogue (just like nature does it.)
Big tech wants quickly reproducible slaves, and nvidia is all in on semis, so we're stuck with emulation and toy models.
>>
File: dropoutrate.png (51 KB, 711x272)
>>102547870
For LLMs and LoRA finetuning, the optimal dropout rate is in the 0.55-0.60 range. https://arxiv.org/pdf/2404.09610

With higher values you get increasingly higher and more frequent grad norm spikes / noise that disrupts the training, as well as reduced network capacity for learning (at 90% dropout, every weight update is only updating 10% of the model/LoRA's weights).
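If you want to try it with peft, it's one knob in the adapter config. Minimal sketch (target modules are model-dependent, these are just common defaults):

from peft import LoraConfig

config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.6,  # the paper's sweet spot, vs the usual 0.05-0.1
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)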
>>
>>102548570
most people use between 0.05 and 0.1
I'm surprised nobody is applying any of this high dropout rate research to any finetuning. It definitely integrates the new behaviors a lot better albeit my experimental results take it to the extreme.
I might try tuning a base model with an RP dataset using high dropout methods in the near future though. Obviously using it on instruct models is kind of hit and miss since there's already an existing behavior that permeates the entire model.
>>
>>102548715
I'd take it with a grain of salt. The paper is from the alpaca dataset days. That dataset was extremely repetitive with some questions like, "sort this array" appearing hundreds of times. High dropout would help there, but might hurt for more complex datasets.
>>
>>102547425
>Broadly speaking, the academic benchmark results and human evaluation strongly agree, with the exception of Qwen VL2 which performs strongly on the academic benchmarks and comparatively under performs in the human evaluation.
oh no no qwenbros...
>>
File: PerpetuallyHappyMiku.png (1.34 MB, 800x1248)
good morning /lmg/!
>>
>>102548761
I tune at home so failed experiments only cost me a dollar worth of electricity, so I'm not too worried. But I thank you for the handy graph. Next time I'll try 0.6 (I've done 0.75 and 0.8 so far, both using raw text datasets) But as always I'll post the results here when I do my next experiment if they are interesting at least.
Probably going to do a qwen2.5-7B RP model next.
>>
>>102547809
It is a spam that is avoiding the anti spam filter.
>>
>>102548030
>compares favorably to gpt 4, claude 3.5
>7B
>actually not even a 7B
??? what?
>>
>Imagine you're trying to find the minimum of a hilly landscape. A high β1 (strong momentum) would make you "roll" down the hill quickly, but you might overshoot the minimum. A lower β1 would make you "walk" more carefully, potentially finding the minimum with greater accuracy.
guess the model
>>
>>102545694
>https://rentry.org/lmg-recap-script
inline this link in the recap itself in the future?
>>
File: its8exclamationmark.png (40 KB, 1261x172)
>>102548960
That's just how they named it. I assume it's to put it in a familiar range that has existed for a while.
Unless you mean something else. Complete sentences make everything clearer.
>>
File: 1723844433490791.png (61 KB, 1181x185)
>>102548960
>>102549019
god i hope this kind of naming faggotry doesn't become commonplace in the ai industry
>>
>>102549069
misleading marketing and retarded naming schemes are already the norm in the ai industry
>>
>>102549094
>>102549069
>>102549019
Normal in most industries.
>>
>>102549069
Molmo 1B (1 bazzillion nibbles) looks pretty good.
But it's still more telling than -small or -large. 1B params at that scale doesn't make that much of a difference.

>>102549105
I'm not complaining. I just clarified a vague comment made by anon.
>>
A year later, llama.cpp might add a jinja parser.
https://github.com/ggerganov/llama.cpp/pull/9639
Maybe, llama.cpp will be usable by itself in 2025.
>>
>>102549141
Bloat. The template format could be specified clearly for every model and we wouldn't need anything other than --in-preffix and --in-suffix.
>>
Bros... Molmo-7B-D seems really really good for captioning. I integrated it into my captioning scripts and have been testing it on my dataset of women peeing (don't ask...). It seems to perform roughly on par with InternVL-40B. Just as uncensored, will describe nudity, visibility of breasts and genitals. Slightly worse than InternVL at accurately describing the pose, and some details of the clothing. But much more accurate and consistent at describing that the woman is peeing (so many VLMs just completely cannot "see" that aspect of these images). Will try to get the 72B running locally soon and see how much better it is. I generally consider InternVL-40B to be the best local captioning model, and even the 7B of molmo might be the new king.
>>
>>102549204
Psyoping us with really bad fake shills so the "buy an ad" posting seems more warranted isn't gonna work.
>>
For anyone that's interested in watching it
https://www.youtube.com/watch?v=j_IVy25y6V0
>>
>>102549204
Fuck the model, where can I get a dataset like this?
>>
>>102549246
>blablabla safety blablabla responsibility and red teaming blablabla it's really safe you goys— guys. I meant guys.
>>
Is there an easy way to get documentation into a model's context without rewriting it yourself...? I've got some docs for DDLC's internal variables in Ren'Py, and I'd really like to be able to give it to a model so that it can format my writing for me, but the documentation is 26 pages long with pictures and formatting for readability (plus documentation for custom content), so I can't just dump it in.
>>
>>102549204
Thanks, I'm coping.
>>
Is there any decent local TTS model? I need waifu voices and read that tortoise is buggy af so not gonna download it.
>>
>>102549280
Nope, still 11 labs or bust, sadly.
>>
>>102549268
Real.
Looks like they're having difficulties getting started though, kek.
>>
>>102549296
Fuq
And lemme guess, they are strict with what you can make tts say?
>>
File: wat.png (7 KB, 121x39)
That doesn't look right.
>>
>>102549255
I sourced most of it from imagefap. A few people upload curated galleries where the images have already been color corrected, AI-upscaled, etc.
>>
>>102549319
What do you mean?
>>
File: 43991.png (90 KB, 615x391)
Its kinda funny how models ramble about stuff being bad/illegal kek
>>
>>102549246
meta chuds get in here
wtf is zuck wearing, does his shirt say zuck on it
>>
>>102549204
I wonder if you took one of those huge clip models like that one with 18 billion parameters and trained it on a set of perfectly balanced , perfectly captioned uncensored images would there be a need for these vlm models?
>>
>>102549376
>aut zuck aut nihil
either zuck or nothing
>>
>>102549280
fish speech and xtts2 are just ok. i would recommend fish more.
>>
File: mult.png (341 KB, 1014x880)
Its here
>Its here
Its here
>Its here
Its here
>Its here
Its here
>Its here
Its here
>Its here
>>
>>102549417
>gimped model sizes
bruh
>>
>>102549417
>11B
Interesting.
>>
>>102549417
what would 1b even be useful for?
>>
>>102549427
>vramle t.
>>
>>102549417
>even more safe
nice
>>
File: 32.png (53 KB, 2362x2200)
>>102549246
Dafuq
>>
>>102549417
Wait, what?
>>
>>102549429
Quick OCR or quick captioning of thousands of images given a set of tags
>>
Anyone here who speaks more than English? I want to know what LLMs are the best if I want to write and receive responses in other languages, mainly in:
>Korean
>Hungarian
>Polish
>Swedish
>Japanese

Ability to mix languages together (e.g. to make a gaijin character who uses Korean, but throws in a Japanese word here and there) would be nice, but not necessary.
>>
>>102549435
why are they comparing 90B to mini and haiku... does not look good
>>
Wake me up when Llama3.3 comes out.
>>
>>102549438
I quickly checked xitter, apparently it's real but not actually released yet. Meta accidentally made the page live then deleted it.
>>
>>102549452
>mini and haiku
How big are they? From the horses mouth or some speculation from a random?
>>
File: 1678725806186394.jpg (8 KB, 226x223)
>meta puts multimodal llama in their AI ray ban
>blind people use it for vision assist
>see a knife wielding nigger in an alley
>refuses to describe what it's seeing because harmful stereotypes
>get robbed and killed
No thank you no llama for me from now on
>>
File: IMG_2999.png (3.63 MB, 1416x2048)
>>102549417
>1B, 3B
What the fuck are the use cases? Why this over 30B? They really just hate giving us anything runnable, huh?
>>
>>102549417
90b could be a sweet-spot for 48gb vramlets
>>
>>102549435
Is it audio+visual+text or just visual+text?
>>
>>102549450
Training a model in multiple languages was proven to make the model smarter.
>>
>>102549486
It's shit for 96GB chads. Doesn't leave room for context at Q8. I'd have to downgrade to being a Q6er.
>>
>>102549485
Speculative decoding and/or phone stuff.
>They really just hate giving us anything runnable, huh?
For many anons, reasonable seems to be whatever they can run, not whatever is good. You are one of them.
>>
File: file.png (1.08 MB, 1692x1248)
>GSM8K *8-shot **CoT
It's over
>>
meta AI has voice according to zuck via meta connect
>>
>>102549434
we are eating safe, localchads!
>>
I HATE THE MULTI-MODAL MEME
IT'S USELESS FOR ERP
>>
>>102549536
>he does send dickpics
ngmi
>>
>>102549536
You're useless for ERP.
>>
>>102549435
what about just text benchmarks?
>>
>>102549470
Just went live on HF
>>
https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/
>>
>>102549527
>3b
>77 at GMS8K
It's better than G2 9B, I'd say we are back
>>
File: organic.png (34 KB, 1003x188)
Reddit needs the buy an ad spammer. Qwen models are fucking garbage and refuse to answer anything.
>>
>>102549272
26 pages is nothing. Stop being lazy
>>
>>102549588
Damn, finally I'll be able to run an llm on my fucking quest 2
>>
>>102549588
8 shot cot though
>>
>>102549527
are these benchmarks public or tested by some third party authority?
>>
>>102549596
go back
>>
File: 1699789138369569.png (269 KB, 909x1224)
>>
whatever
can 90b recognize and describe nsfw images?
can it estimate the size of my cock?
where are the benchmarks that truly matter?
>>
>>102549558
seems to be the exact same as 3.1 8b and 70b
>>
>>102549616
>can it estimate the size of my cock?
they dont come with electron microscopes
>>
>>102549616
Time to fine tune a "my cock" LoRA I guess.
>>
>>102549519
But ~30b is a great compromise area between runnability and output quality. If you have a 3060 and a decent amount of RAM, you can run it at good speeds. 13/12/11B (notice how they keep decreasing it?) are still prone to a lot of logical errors that 7b models are, too. ~30B is the smallest, reliably good size you can put out a model at, which is probably why they don't want to give it to us. Either so small it's dogshit or so large that the layman still has to pay them.
>>
Ah, these small models are for smart glasses.
>>
>>102549417
>Its here
Who cares? llama-server still doesn't have multimodal support and at this rate never will
>>
>>102549626
well duh, then what if I hook up one to the model
>>
>>102549622
source?
>>
>>102549598
No. That's why I'm using AI.
>>
File: lalam.png (273 KB, 3840x3050)
>>102549558
they... they don't fucking suck...
I think we are back
>>
>>102549232
Very creative "buy an ad" post, rabbi. Now face the wall...
>>
>>102549616
There will be some jailbreak for it.
Also, llama models aren't that censored.
I can't wait to show cock to my waifus.
>>
>>102549651
comparing the text benchmarks on their HF repos
https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct
https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct
>>
>>102549417
nothingburger
>>
>>102549661
>4o-mini
>fucking haiku
Even qwen didn't stoop so low.
>>
>>102549643
It used to have it supported but niggerganov didn't like it so he removed it: https://github.com/ggerganov/llama.cpp/pull/5882
>>
File: eusisters....png (18 KB, 839x123)
Ohnonono... Eurosisters....
>>
>>102549675
>RLHF: 224 H100 hours
224 hours on H100s just for rlhf
>>
>>102549435
lmao it's absolute fucking dogshit.

local lost (again), going back to listening 4o's asmr
>>
1B could replace tinyllama, it'd be great with a huge context size
>>
>>102549616
There gotta be a few TBs of normie nudes stored on Instagram servers. Yann Lecun personally goes through each of them to ensure quality. So yes, you'll get the highest quality cock identifiers.
>>
90B looks like it scores higher than 400B
>>
>>102549697
Don't drop the soap.
>>
Access request sent.
Don't worry boys those Nala tests are coming. Even if I have to do a 4-bit transformers load.
>>
File: file.png (7 KB, 752x452)
>>102549417
>11B
30B coming in 2028
>>
>>102549727
You are my hero nala anon N#.
>>
>>102549661
"0 shot" "CoT" lmao just when i thought benches couldn't be played harder.
>>
>>102549727
llama launches are always, ***always*** broken on launch. Every single time. They'll tweak the tokenizer or some shit in the coming hours
>>
>>102549727
How do I tell if a model passes the nala test? I'm not a furry.
>>
>>102549737
lel
>>
>>102549737
???????
What is this a chart of...
>>
>>102549485
They only read /lmg/ when we get raided by tourists asking how to run an AI waifu on your ipad from 2015
>>
>>102549638
>which is probably why they don't want to give it to us.
See? it's just like i said 'whatever i can run is sensible'. For the people with the hardware and patience, 70b is the minimum you should ever run. But it's all you you you.
If a new 11b is better than an old 30b, or even a 70b, i'll run the smaller, faster model. If in a year we get the perfect 3B model, beating all current 22-30B in every single aspect, would you still ask for a 30b? Just to fill up your hardware and because that's what you have? If you are gifted a pc with two 3090, would you keep asking for a 30B?
I, for one, am waiting for something i can run on a casio calculator.
>>
>>102549753
models parameter size and release dates
>>
>>102549753
L2 7B 2023 july
L3 8B 2024 july
>>
>>102549410
Oh, you gotta know how to code with fish-speech. Sucks to be brainlet.
>>
>>102549749
You're just looking at how well it handles the nuances of the card. If and how it anthropomorphizes the character, etc.
>>
>>102549749
That it does not give her human anatomy.
>>
>>102549777
>>102549727
Are you expecting anything different? I mean isn't it the same as 3.1 except for the multimodal parts added on?
>>
>>102549789
Don't care. Still going to fuck it.
>>
>>102549768
>>102549769
Ah, I see. kek, just extrapolating that huge curve out of the two datapoints, I gotcha. Someone should do it with the 13B -> 12B -> 11B. Our second tier models are gonna be in the negatives by the 2030s.
>>
>>102549789
Does it do the thing that some VLMs do where it projects and integrates the image embedding throughout the layers of the LLM? If so, maybe cross-training on images and text helps it with spatial reasoning even when doing text-only.
>>
Why do they keep adding multimodal shit? Is it just because there's no easy gains left to wow investors with in text generation quality with the current architecture we have?
>>
>Still no 3.2 repo access
They're onto me boys.
>>
>>102549898
you just dont want your AI gf to see you because she'd dump you
>>
>>102549898
Massively increases the amount of use cases? A more diverse dataset for better performance / a deeper "world model" / understanding?
>>
>>102549904
Probably just a lot of requests to go through.
>>
>>102549904
looks like they might not just be autoapproving this one
I'm still waiting too
>>
>And you can run 405B Llama3.1 on 8GB vram now
https://github.com/lyogavin/airllm
vrammaxxers on suicide watch
>>
>>102549904
I applied with a US VPN
Actual humiliation ritual
>>
>>102549929
at one token per day
>>
>>102549929
real? Does it run at 1 token per day or did they find a way?
>>
>>102549909
Where have we seen these gains? Not trying to be combative, I'm genuinely curious. The multimodal models I've tried have seemed, if anything, to have sacrificed some of the quality of their text output for the knowledge on images, etc.
>>
>>102549941
>>102549949
Let me guess. You NEED more?
>>
>>102549929
>airllm
Holy shit, it's been a while since I've last heard of those guys.
>>
>>102549949
>>102549929
kek, this was already possible on llamacpp
actual vramletjeets with no idea on how any of this works
>>
>>102549898
more modalities seems obviously good to me, I'm just mad we didn't get the voice stuff they teased in the 3.1 paper for local
>>
>>102549950
>to have sacrificed some of the quality of their text output for the knowledge on images
Continuing training on a new dataset is going to reduce quality for a while before gains are seen. It likely just needs a shit ton of training on such a different form of data for it to start "healing".
>>
>>102549976
nah
>>
how long for 3.2 to be usable?
>>
>>102549763
moron
if a new 11b is better as an old 30b, a new 30b would be better as a new 11b. so simple.
you can tell that llama is slowly switching from research mode to money mode and its just a sad. but what we dont need are llama fanboys who are just as faggots as apple faggots.
>>
>>102549997
llama.cpp PR incoming in approximately 3 months
>>
>>102549697
But local is safe! Meta cares about your safety! Trust them!
>>
>>102550007
bug free 6 months after that
>>
>>102550005
ESL harder. Like seriously, it breaks your point and makes it hard to understand beyond 'Meta is switching to making money'.
But you are correct about fanboys being useless wastes.
>>
>>102550005
Diminishing returns. Just like quants, it doesn't make sense to make a Q7_0 when you have Q6 and Q8. Same with models. It makes sense to have short steps at the lower end and bigger steps at the high end. If you had two 3090s, you wouldn't bat an eye for a 30B. I know i wouldn't.
>llama fanboys
I don't give a fuck who makes the models, as long as they're good. Some llama thing, some chinese bullshit or that antichrist mamba anon who still hasn't completed the training code on his repo.
>>
>>102550007
Let's be real here. None of current maintainers is interested in bringing it back to the server. Unless a newcomer rolls around it won't move any time soon.
>>
Good news! AGI has been released and can be run locally. What is the first think (you) ask it to do?
>>
>>102550102
a guide to cum from anal
>>
>>102550102
simulate mythomax 13b
>>
>>102549763
Yes, I would still ask for a 30b, because if the 3b is that good, 30b would be current GPT 4 level. 3b isn't the only thing improving as the tech improves, it's dishonest to compare some godly future hypothetical 3b to the first gen of llama models, it's like comparing a modern 5w celeron to a 65w old processor that performs the same, then asking why anyone would want a modern processor that uses 65w, despite the fact that a modern 65w processor would blow them both out of the water.
>>
>>102550102
Compose my suicide note.
>>
File: 1721562795281598.jpg (512 KB, 1792x2304)
>>102549435
wtf llama 3.2??
>>
File: file.png (26 KB, 883x154)
Holy shit, I hate this place so fucking much
Can someone re-upload the 11B vision model to somewhere?
If that's too much, I'll take the 3B or even the 1B...
>>
>>102550102
get me a gf
>>
>>102550102
Build a body for her so that I can finally fuck the computer.
>>
>>102550102
Ask for a murder plan, in Minecraft of course
>>
>>102549688
>>102550128
Based EUchads keeping home soi-free.
>>
>>102550128
Hopefully this provides some incentive for more torrents to be made for models.
>>
>>102550128
No, that would be breaking EU law which is against the rules. Please do better.
>>
>>102549452
It's weird to compare to Haiku, but 4o-mini is actually quite good. I wish they compared it against DeepSeek models too though.
>>
>>102550128
Whew thanks sama and Dario for keeping the Europeans safe. You guys did a great job raising awareness about the dangers of AI
>>
>>102550102
Ask it if I will live long enough to become immortal.
If not, ask it to invent a way to simulate my life when it becomes possible, and to create me a body with at least near-human capabilities to implant this simulated mind in.
>>
Isn't 4o-mini the modern equivalent of 3.5-turbo?
>>
File: 1613791292887.jpg (15 KB, 408x305)
>>102549417
>3B, 11B or 90B
>>
>>102549976
I see. Well, once it does, I'd be curious to see how the image knowledge improves the output, but it's not really observable in the current models at all. How are you expecting it to improve the model?
>>
>>102549450
ESL here. Nemo is quite good at this.
>>
>>102550223
50B parameter models would be perfect but every company avoids this range like the plague for some reason. They also keep moving away from this in both directions...
>>
>>102550249
Can't have anything that someone with a 3060 + 16~ gb of ram could run at a decent speed! That's direct competition! No ~30bs!
>>
>>102550122
The question was 'if you had two 3090s, would you still want a 30b? What if you had your very own h100 cluster. Would yo still ask for a 30b? To misquote you, "Because if the 30b is that good, 70B would be the current GTP 7 level".
It's not about what's the sweet spot for models. It's about what anon can run and that just happens to be HIS sweet spot, and that'd change if he had more hardware. It's about thinking they make the models for him. He's disappointed that they didn't make a model specifically for him.
>>
just make your own models or beg daddy zuck some more
>>
>>102550230
Depth of knowledge. These models generalize all that is tokenized. There is detail it can not get from text alone that can expand upon its world model. Its like telling someone who has only ever read about something before to write a story vs someone who has both read and experienced it.
>>
>>102550266
I AM anon, and it's about what the average person can run. The average person is way more likely to have a 3060 and 16 gb of ram as opposed to a mikubox or a used server with 5 trillion GB of ram. RAMboxes also run into the problem of speed, which is part of being runnable. 70b is fucking slow, even if you have a 3090. 30b is the last size that an average person with a GPU could run with mixed inference at a reasonable speed. Is that a qualified enough statement?
>>
>>102550261
Same for people with 2x3090s: a 50B-parameter model at Q4 with a good amount of context would hit just the spot, but nope. You can run a 70B, but unless you use a ridiculously low quant, you're limited to like 12k context max.
>>
Anyone know what`s the best 12b model for porn nowadays?
>>
>>102550323
Lumimaid
>>
>he still thinks Meta trains models for the average coomer with 1x 3090
ngmi
>>
>>102550348
yes?
>>
File: 1708101843387072.png (458 KB, 1660x940)
>>102549661
lol, lmao even
>>
>>102550386
MOLMOCHADS!!!!!!!!!!!
>>
File: file.png (477 KB, 751x515)
>>102550386
>>
>>102550386
Damn, this is 405B vs Mistral-Large all over again. Zucc has all this compute and he just ends up getting fucked with every new release
>>
>>102550386
Hope molmo is less censored too
>>
>>102550294
I understand this in theory, I guess I just wonder how much of it would actually translate between the different modalities of the model.

>>102550311
Definitely. It seems like, until hardware improves, 70b is a hard cutoff for a normal person to run at a reasonable speed. If nvidia ever wants to improve the VRAM/$ ratio, or if inference on CPU improves dramatically, it could change, but it's a pretty big wall right now.

>>102550348
Having a model for someone with SOME sort of GPU would make sense. Lots of people have a 3060, it's the most popular graphics card. It's just weird that they're keeping it in this perfect homeostasis of "Small enough to use, too dumb to be good" and "Approaching cloud quality, but too large to be run without heavy inconvenience or paying so much for hardware that it'd be cheaper to buy API access".
>>
are we back? i'm seeing shit about whatever a molmo is after not checking for 24 hours. or is it still over? probably still over. dunno why i'm asking.
>>
>>102550306
>30b is the last size that an average person with a GPU could run with mixed inference at a reasonable speed. Is that a qualified enough statement?
Some people run mistral-large at <1t/s because they like their output more than all the 70bs they tried. Even the term "reasonable speed" is subjective.
>The average person is way more likely to have a 3060
The average person has just a notebook with stock windows that they turn off when they're not using it and it's off most of the day. The average person doesn't have a desk dedicated to their computer. They just use the dining table. The average person interacts with their phone more than with their computer.
You are not average. Neither am i. And with llms, the niche becomes even smaller. Those models are not made for us, just like the race to the moon wasn't for the people. It's all dick waving.
>>
File: EEEYAWWWWWWWWWWWWN.jpg (40 KB, 536x612)
>>102550386
>Visual benchmarks
How does this pertain to ERP interests again?
>>
>>102550431
Llama 3.2 is useless crap, it's just Vision update on 3.1 and it's worse than fucking o1 mini at it.
But it was never meant to be more than a nothingburger update.

Molmochads are winning
>>
>>102550386
Goddamn if that isn't just benchmaxxing it's a complete breakthrough
>>
>>102550386
>7B almost as good as 72B
in your dreams maybe
>>
>>102550452
never asked about llama. never will. all went downhill after l2. i'm not even remotely interested.
>>
>>102550456
multimodal models are bottlenecked by their vision component in these benchmarks
>>
>>102550386
Llama is probably still better as a generalist model, but this Molmo thing might be a new path forward for visual tasks if the benchmarks are legit. We are really increasingly needing Livebench to include multimodal.
>>
>>102549479
As it should be with pigskins.
>>
>>102550411
It doesn't mean much at all, but if olmoe-1B is anything to go by, it doesn't shy away from stuff. llama.cpp recently added support for it and i gave it a try. Not super smart, of course.
>>
>>102550452
I'll reserve judgement until I test it myself. For captioning or describing images, which is probably what most of us interested in VLMs care about, benchmarks can be very misleading. A lot of those benchmarks are shit like charts and then questions about the chart, solve an image of a math equation, etc. Qwen2 VL 72b is absolutely dogshit at image descriptions, for example. It can't handle anything even remotely NSFW at all. It can't even describe the gender of a person in the image, even when directly prompted to (yes I'm serious). It uses they, them, person, character, etc exclusively, never man or woman. Meanwhile JoyCaption punches well above its weight.
>>
got approved for L3.2 on HF, they might have done a batch
>>
>>102550456
more like
>72B as bad as a 7B
It's meta after all
>>
>>102550547
How the fuck did they even do that with Qwen? Was it really on purpose?
>>
>>102550547
I assume you just want captioning for Adult Image models... Pozzed Llama obviously won't help you.
Jailbroken corpo models work; at least they know that you can't leave out NSFW images if you want to train a good vision model. I doubt Llama cares, so it should be like Qwen.
>>
I'm in boys.
>>
>>102550386
molmo won
>>
>>102550577
wait is this molmo also by meta?
>>
>>102550386
somehow I don't believe that 1B is almost competing with 90B.
I feel like Molmo is overfitted garbage.
>>
>>102550598
no, it's a separate entity
>>
>>102550386
this comparison is a mess, it compares the base benchmarks for 3.2 with molmo instruct versions
>>
>>102550604
I think you mean benchmaxxed
>>
3.2 is a completely new architecture. So expect gguf support by no earlier than Christmas.
>>
>>102550639
https://huggingface.co/collections/hugging-quants/llama-32-3b-and-1b-gguf-quants-66f43204a559009763c009a5
>>
>>102545137
Lots of cards seem to be written by AI, is that why I have such bad results?
>>
>>102550647
90B?
>>
>>102550647
I meant the vision ones, (11B and 90B)
>>
If I want to finetune a model to understand some C SDK, how should I do that?
Parse functions and comments from headers and make a every text section be comment + function prototype?
Or should I parse the .c files and make the text sections be just the function definition?
>>
>>102550479
>multimodal models are bottlenecked by their vision component in these benchmarks
sorry for dumb question, but what does this mean?
>>
>>102550728
it means
>>
File: 12423154576797.png (332 KB, 723x771)
>>102550454
molmo 7B is really good
try the demo
>>
>>102550399
The head of their AI effort is a guy who doesn't believe in LLMs, how can you expect the team to succeed?
>>
File: 1726418004894126.png (526 KB, 1347x1484)
>>102550751
It has preferences too even though it initially goes for the assistant wet rag angle
>>
>>102549763
>If in a year we get the perfect 3B model, beating all current 22-30B in every single aspect, would you still ask for a 30b?
Of fucking course. There's tons of shortcomings even with current 70B models that I run into constantly. I live with the rough spots because that's the world I live in, and I'd be really happy to get something better, but that doesn't mean I'm some kind of primitive tribal who'll be so ecstatic about the magic AI from the sky that I don't realize things aren't perfect.

I remember CRTs that were only black and amber and heavy enough to kill someone if dropped out of a 2nd storey window. Did that mean I was perfectly satisfied with cheap-ass IPS monitors with afterimages and bad contrast? Same deal here.
>>
>>102550785
shit taste though
>>
>>102550785
good taste though
>>
>>102549417
>3b
>11b
>90b
WHY ARE THEY ALWAYS GOING FOR SMALL SHIT AND ULTRA BIG SHIT REEEEEEEEEE
>>
>>102550849
Just wait for Molmo. Llama is Done
>>
>>102550849
Llama-4 750m is a perfect model size for intermediate use cases.
>>
>>102550849
We localchads go by safety! Praise the lord Zuck of safe AI!
>>
>>102550795
OLD OLD OLD. We got grey bearded olds in the thread. When bitnet for 90B turbo fat llama?
>>
>>102546846
You have json files of the correct whitespace?
>>
>>102550128
That is not on a gguf, right?
>>
>>102550849
Part of their deal with NVidia for all those cheap H100s
Buy more GPUs :^)
>>
File: 1697318722762195.png (48 KB, 951x480)
48 KB
48 KB PNG
>>102550591
unironically
>>
>>102550604
They mentioned they don't like the benchmarking methodology, so it is probably less overfitting and more changing the benchmark grading.
>>
>>102550849
The more you buy
The more you save
Every 2 weeks there's a new, more efficient quanting method though, it's just a matter of time until we get 3bpw at the quality of fp16
>>
Lol. Molmo in the playground outputs random chinese characters.
I'm guessing this is the version trained on top of qwen.
That aside, it has really good understanding of the image.
>>
>>102550904
>matter of time we get 3bpw at the quality of fp16
Just pack your rar archive into another rar archive.
>>
>>102550849
11b is perfect for me
t. 8gb'er
>>
>>102550753
That is actually a very good point, if the head of a project doesn't think the project will work, then how can the project ever turn out good?
>>
File: file.png (272 KB, 1837x1537)
272 KB
272 KB PNG
>>102550386
>Great models
>local
>Completely open
is this the most based AI company we have so far?
>>
File: 1710182787287749.png (60 KB, 1258x548)
60 KB
60 KB PNG
>>102547425
gpt-4o chads... not like this...
>>
>>102550973
Where the hell did they come from? Since it's 72B, is it just a Qwen finetune or is it a new base model?
>>
>>102550936
I have 8gb and around 20b is a fine speed for me. I'd like models in that range, but everything is always either tiny or too big, it sucks.
>>
>>102551003
>Where the hell did they come from?
Ikr, we got the same shit with BFL and Flux, out of nowhere they appeared and decided to release the SOTA local model just like that lmaoo
>>
>>102550386
https://www.youtube.com/watch?v=spBxYa3eAlA
>>
>>102551028
Flux is understandable, Open Source Image Gen had very obviously crashed and was lagging.

The Molmo thing looks sussy.
>>
File: file.png (970 KB, 1529x1328)
970 KB
970 KB PNG
>>102547425
Molmo sistas, I don't feel so good...
https://molmo.allenai.org/
>>
>>102551054
Oh no, it's shit! Pack it up animebros!
>>
>>102551054
Is that 7B or 72B?
>>
>>102551054
I found the demo (which is the 7B) decent
>>
File: file.png (215 KB, 751x776)
215 KB
215 KB PNG
Doesn't really seem to get feet.
>>
>>102551077
>>102551078
>Is that 7B or 72B?
I have no idea, it's not specified
>>
File: allenai.png (65 KB, 872x246)
65 KB
65 KB PNG
>>102551003
They've been at it for a while. llama.cpp even has compatibility with the old Olmo models. It's just that nobody paid any attention to them.
>>
>>102551078
That's the demo site, it's the 7B
>>
>>102549763
>but it's all that one anon
um
>>
>>102549638
>>102551008
What are "good speeds" for you? For me, 2.5 tokens per second is the minimum threshold of tolerable. A bit over 5 tokens per second is where the speed doesn't annoy me. If it's at least 10 or 15 tokens per second it might as well be infinitely fast for all I care so long as it's an interactive story or RP I'm digesting word by word and not generating giant blocks of text I intend to skim or that have a lot of boilerplate.
>>
File: file.png (52 KB, 360x360)
52 KB
52 KB PNG
>>102551083
Foot-sisters, how are we gonna cope with that?
>>
File: ED.jpg (435 KB, 2125x1411)
435 KB
435 KB JPG
>new wave of models
>zero improvement to cooming quality
>>
>>102551083
Now test it on tits
>>
>>102551117
tried, but the endpoint has google moderation restrictions
>>
>>102551093
Anyone who expresses the same "Why no model for MYYYYYY hardware". No different from a pajeet asking for more 400M models or maxxers asking for more 100B+ models. All beggars.
>>
>>102549452
Because it makes sense to compare similarly sized models. What's the problem?
>>
File: file.png (113 KB, 2295x473)
113 KB
113 KB PNG
>>102551054
>>102551077
>Is that 7B or 72B?
it's 7B, that's retarded, they should've used the 72B for the showcase
>>
>>102551139
Asking again, maybe you know. What's mini and haiku's param count? And does that number come from the companies that host them or just reddit speculation?
>>
>>102551147
They probably don't have enough hardware to serve 72B to a couple thousand people at a time.
>>
>>102551152
all's speculation. Some anons claimed haiku was ~70b when it released, while some say mini is a MoE with 8b active params. But nothing is official
>>
>>102551152
In the 80-120b range. Not reddit speculation, it's 4chan speculation.
>>
>>102551152
We officially know Turbo 3.5's parameter count due to it being leaked.
It was a 7B MoE model, though the method was unable to tell how many experts.

Now, Mini and Haiku are cheaper than Turbo.
>>
>>102551185
I only remember the MS orca paper claiming 20B params. Where does the 7b moe come from?
>>
Llama 3.2 500b when?
>>
>>102550884
No, but you can learn how to create one yourself very easily:
https://github.com/ggerganov/llama.cpp/discussions/2948
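For reference, the whole flow is two commands; a minimal sketch assuming a recent llama.cpp checkout (script and binary names have changed over time, and every path here is a placeholder):
[code]
import subprocess

# Convert an HF checkpoint to GGUF, then quantize it.
# Names match recent llama.cpp checkouts; adjust paths to your setup.
subprocess.run(
    ["python", "llama.cpp/convert_hf_to_gguf.py", "models/my-model",
     "--outfile", "my-model-f16.gguf", "--outtype", "f16"],
    check=True,
)
subprocess.run(
    ["llama.cpp/llama-quantize", "my-model-f16.gguf",
     "my-model-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
[/code]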
>>
>>102551169
fair enough, now I'm waiting for someone to test it out on the 72b locally I guess
>>
>>102551199
Some researchers managed an "attack" on OAI's servers and were able to discern some models' details.
They only publicly released one detail, about the Turbo model; the rest (like GPT-4's) were kept private and they made a deal with OAI (and got some shush money).
>>
File: file.png (689 KB, 800x450)
689 KB
689 KB PNG
>>102551003
Maybe there are more companies like that. Silently working and not releasing anything until they know it is at least slightly better than all the other stuff that is available. Sounds like a very good strategy since no one remembers all the mid models released this year. Then there is this retard who did the exact opposite cause he is a fucking clown.
>>
>>102551054
How can the AI rule us when our hands and legs are all a blur to it?
>>
HOLY SHIT llama 3.2 3B is currently the best RP model. I tested it. Can't wait to see the 90B posts.
>>
>>102551238
Shame they picked the worst possible time with a meta release less than a day later. I hope it doesn't get drowned out that much.
>>
>>102551260
go to bed, zuck
it's embarrassing
>>
>>102551260
Post logs, let's see them.
>>
>>102551260
>Still hasn't officially been Nala tested
doubt.
>>
File: file.png (560 KB, 685x1040)
560 KB
560 KB PNG
>>102551083
>>
>>102551170
>>102551175
>>102551185
I see. So somewhere between a 7b MoE and a 120b. I've learned nothing, but I thank you nonetheless.
I think companies releasing models should stop comparing themselves to closed models. Ostracize them from the field.
>>
>>102551185
Unfortunately that's not that helpful, since it's just the base size of a single expert. Obviously every big lab is serving MoE models across their entire product line because they are VRAM rich and optimizing for compute. We also know that MoEs scale well with total parameter count across a large range of expert sizes (see Deepseek, one of the top performing local models in the world right now, with expert sizes around 4B).
>>
>>102551289
I can't wait to see the multimodal Nala tests.
>>
>>102551319
Wasn't there a Switch Transformer with something like 2k experts?
So yeah, the size of a single expert is pretty meaningless, aside from telling us that the model is not any smaller than 2x that, I guess.
>>
>>102550753
>The head of their AI effort is a guy who doesn't believe in LLMs, how can you expect the team to succeed?
that's an excellent point, the more time passes, the less respect I have for Yann LeRetard
>>
1B

Dr. Sophia Tan had always been known for her sharp intellect and beautiful smile, but few people realized the power she held behind closed doors. As the department head of psychology at the local university, she was respected by her peers and admired by her students.

One evening, as she was leaving the office, a young graduate student named Alex walked in, looking desperate for help with his thesis. Sophia was immediately taken aback by his pleading eyes and awkward demeanor, but she saw something else there that made her pause – a deep-seated fear that he couldn't quite articulate.

As they began to discuss his research, Sophia discovered that Alex was hiding behind his usual charm and wit. Underneath it all, he was vulnerable and uncertain, and Sophia's warm smile softened into a genuine concern.

Without warning, she reached out and gently took his hands in hers. The touch sent shivers down her spine as she gazed into his eyes, her own pupils dilating with desire.

For the next hour, they sat together in comfortable silence, Sophia guiding Alex through his thoughts and emotions as if she were an old friend sharing a deep secret. As the night wore on, their connection grew stronger, until finally, Alex was trembling with need.

Sophia's eyes sparkled with excitement as she leaned forward, her lips brushing against his ear. "I think I can help you with that," she whispered, her voice husky and seductive.

As they kissed, Sophia's hands roamed over Alex's body, tracing the contours of his skin with a gentle yet insistent touch. His eyes flashed open, and he met her gaze, his own pupils locked onto hers with an unspoken understanding.

With a soft moan, Alex leaned back into Sophia, surrendering to her touch and her expertise. Together, they surrendered to the power dynamic, their bodies entwining as if by magic.

(4chan char limit)
>>
>forcing ooba to install transformers 4.45.0 causes it to break
God I hate open source devs.
>>
File: file.png (133 KB, 623x515)
133 KB
133 KB PNG
lmao
>>
3B

In the sweltering heat of the Tokyo summer, Emiko stood tall, her piercing green eyes scanning the room with an air of confidence that commanded attention. Her raven hair was tied back in a sleek ponytail, revealing a heart-shaped face and full lips that curve into a subtle smile.

By her side stood Taro, her loyal assistant, who had been working for her for three years. He was her right-hand man, handling everything from the office schedule to personal errands. But tonight, he would be more than just an assistant.

Emiko's gaze fell upon a handsome stranger, sipping his drink at the bar. His eyes locked onto hers, and she could sense the spark of attraction. She beckoned him over with a subtle wave of her hand.

As he approached, Emiko's eyes never left his, her pupils dilating with desire. She could see the shyness in his gaze, but also a fierce determination to please.

"Can I buy you another drink?" she asked, her voice husky and confident.

The stranger nodded, still looking up at her in awe. "Thank you."

Emiko took his hand, leading him back to her table. The air was charged with electricity as they sat down, the silence between them palpable.

She poured herself a glass of wine, her eyes never leaving his face. "So, Taro tells me you're new to Tokyo," she said, her voice dripping with curiosity.

The stranger nodded, trying to play it cool but failing miserably under Emiko's intense gaze.

Emiko leaned in, her breath whispering against his ear. "I'll show you the real Tokyo tonight."

As the night unfolded, Emiko took Taro on a journey of discovery, pushing him out of his comfort zone with every step. She led him to secret gardens hidden behind skyscrapers, to rooftop bars overlooking the city lights.

Their intimacy grew with each passing moment, their touches electrifying the air. Emiko reveled in Taro's submission, her dominance intoxicating her senses.

(char limit)
>>
>>102549763
>If in a year we get the perfect 3B model, beating all current 22-30B in every single aspect, would you still ask for a 30b?
what a retarded take. if a technique manages to make 3b good, that means the same technique will make bigger models even better. and if we follow your logic, if small models are so good, why go for giant models like 72b and 90b in the first place? even they don't believe that bullshit and want to scale up to get good results
>>
I'm actually curious, is there already a gui tool available where you can put in a transcription model, a translation model and a video/audio file and see the audio of the file get translated in real time while the file plays?
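Haven't seen one polished tool that does all of it, but the pieces exist. A minimal batch sketch with openai-whisper, which can translate speech straight to English via task="translate" (the file name is a placeholder, ffmpeg must be on PATH, and true real-time playback would need chunked streaming on top of this):
[code]
import whisper  # pip install openai-whisper

# "base" is fast enough for rough subtitles; bigger models translate better.
model = whisper.load_model("base")

# Whisper can translate speech straight to English with task="translate".
result = model.transcribe("video.mkv", task="translate")

# Dump timestamped segments, subtitle-style.
for seg in result["segments"]:
    print(f"[{seg['start']:7.2f} -> {seg['end']:7.2f}] {seg['text'].strip()}")
[/code]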
>>
File: 1718448622140208.jpg (124 KB, 1080x1080)
124 KB
124 KB JPG
>OpenAI's Chief Technology Officer has crossed the Jordan.
>>
it looks very promising so far but refused my default femdom prompt for ethical reasons. Just need to wait for finetunes and we're in there
>>
MIRA-CHAN NOOOOO
>>
>>102551414
two more finetunes bro
>>
>>102551391
How? Why is it that Zucc can't give the model to the europoors but they can?
>>
>>102551369
>>102551399
slop/10 won't bother downloading, local chatgpt achieved
>>
>>102551437
yes what's your point
>>
followup on a question I posted in a thread a few days ago concerning adding 2 gpus.
I have two: a 4060ti w 16gb GDDR6 and a 1070ti with 8gb GDDR5 I want to put in my b450

my mobo pci slot 1 is gen 3 16x and I will be putting the 4060ti in there

slot 4 is gen 2 4x and I will put the 1070ti there

I can install both cards and have plenty of overhead with the psu, but will offloading to the gimped gen2 pci at 4x with the 1070ti be slower than offloading to system ram? (I have 64gb 3200 mhz available and a 3700x processor)
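Back-of-envelope peak numbers (theoretical maxima, real throughput is lower) suggest the 1070ti should still win, since offloaded weights sit in its VRAM and the slow slot mostly costs load time:
[code]
# Back-of-envelope peak bandwidths in GB/s; real throughput is lower.
pcie2_x4 = 2.0         # PCIe 2.0 x4: ~500 MB/s per lane * 4 lanes
ddr4_3200_dual = 51.2  # dual-channel DDR4-3200: 2 * 25.6 GB/s
vram_1070ti = 256.0    # GDDR5 on the 1070 Ti

# Layers offloaded to the 1070 Ti live in its VRAM, so per-token traffic
# over the slow slot is just small activations; x4 mainly costs load time.
weights_gb = 4.0  # hypothetical: how much of the model sits on the 1070 Ti
print(f"one-time load over PCIe 2.0 x4: ~{weights_gb / pcie2_x4:.0f} s")

# CPU offload instead re-reads those weights from system RAM every token.
print(f"per-token weight read from DDR4: ~{weights_gb / ddr4_3200_dual * 1000:.0f} ms "
      f"(vs ~{weights_gb / vram_1070ti * 1000:.0f} ms from VRAM)")
[/code]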
>>
>>102551438
Large companies have a shit ton of data regulations applied to them because they're large.
Molmo comes from a literal who, so it'll take the EU a few years to ruin that as well.
>>
molmo: how a toddler tries to pronounce multimodal (something it vaguely heard someone else say)
>>
>>102549763
L3 quanting problems are probably related to weight saturation. So I doubt that is gonna be the case. Unless they learn how to cut out useless chunks of knowledge and make a more faithful recreation of a woman - just able to be good at sex and talking about nothing.
>>
>>102551438
Zucc's having a massive bitch fight with the EU on several fronts when it comes to his AI endeavors. The new EU AI law aside, they also really want to stop him from using all the data he's gathering from the facebook, whatsapp and instagram sheep to train his new models. He really wants to use that data so it seems like he's trying to dodge their influence like this.
>>
>>102551512
>seems like he's trying to dodge their influence like this.
The EU is going to outlaw AI outright and become Africa 2.0 when AI starts automating everything in the Americas and Asia, mark my words.
>>
3B

It was a hot summer evening in Chicago as All Elite Wrestling (AEW) packed the United Center for their highly anticipated show, "Summer Slam". The crowd was electric, buzzing with excitement as the commentators hyped up the main event: a World Championship match between reigning champion, Darby Allin, and challenger... Rufus, a 3-year-old Golden Retriever from suburban Chicago.

The crowd erupted in laughter as Rufus made his way to the ring, wearing a miniature wrestling singlet with "Rufus" emblazoned on it in bold letters. The announcer's voice boomed through the speakers: "Ladies and gentlemen, the challenger, Rufus, hailing from Oak Lawn, Illinois, is looking to dethrone the reigning champion!"

As Darby Allin emerged to a chorus of jeers, Rufus stood tall in the ring, tail wagging enthusiastically. The two opponents faced off in the center of the ring, and the bell rang out, signaling the start of the match.

The crowd was initially confused as Rufus simply sniffed at Darby's feet, seemingly unaware of what to do next. But as the match progressed, it became clear that Rufus was no ordinary canine competitor. With lightning-quick reflexes, he dodged and weaved around Darby's attacks, nipping at his heels and causing him to stumble.

Meanwhile, the crowd was on the edge of their seats as they cheered on Rufus, chanting "Ruf! Ruf! Ruf!" as he executed an impressive series of sit-ups and takedowns. Even Darby couldn't resist laughing at the plucky pup's antics.

As the match wore on, Rufus began to gain momentum. He leapt onto the turnbuckles, sending himself flying high into the air before landing with a resounding thud on his front paws. The crowd erupted as he caught Darby in mid-air and sent him crashing to the mat with a devastating "paw-fect" slam.

The referee counted to three, and Rufus's paw hovered above Darby's chest as the announcer declared: "AND IT'S ALL OVER! RUFUS WINS THE AEW WORLD CHAMPIONSHIP IN A SHOCKING UPSET!"
>>
>>102551524
>The EU is going to outlaw AI outright and become Africa 2.0 when it start automating everything in the Americas and Asia, mark my words.
Commiefornia is also doing its best to destroy AI advancement. If the US keeps going down this cucked path, only China will be allowed to make good models, it'll be world domination by the chinks lol
>>
File: file.png (271 KB, 564x820)
271 KB
271 KB PNG
>>102550386
molmosisters...
>>
>>102551548
california just created llama 3.2 brainlet
>>
Wait, Molmo released their models the same day as Meta? Was it their goal to dunk on them or something? lmao
>>
>>102551548
but qwen is also heading that way and i read some stuff about an open llm law regarding china too
>>
>>102551570
Considering >>102551391 they succeeded lmao
>>
>>102551562
>california just created llama 3.2 brainlet
and it's dogshit nigger, Molmo is destroying it >>102550386
>>
1B

In a world where dogs and humans were pitted against each other in an annual competition, one canine contender stood head and shoulders above the rest. His name was Bruiser, a 3-year-old German Shepherd with a coat as black as coal and eyes that shone like gold.

Bruiser's journey to AEW World Championship began several years ago when he won local dog sports competitions across the United States. He had always been a natural athlete, excelling in agility, obedience, and even professional wrestling training under the tutelage of the top trainers in the country.

As Bruiser rose through the ranks, he faced off against some of the toughest opponents in the world of canine competition. In the final tournament, where only the best dogs were pitted against each other, Bruiser outlasted his closest competitors with ease.

But there was something more than just brute strength and athleticism that made Bruiser a true champion. He had an unwavering dedication to his craft, pouring all his heart and soul into every match he entered. His opponents respected him for it, often remarking on the unyielding ferocity with which Bruiser approached each bout.

Finally, after months of preparation and training, the day arrived when Bruiser was given the chance to face off against his arch-nemesis: the reigning AEW World Champion, Kaito Mitsuharu. The crowd was electric as these two titans stepped into the ring for what could only be described as a battle royale.

The match was a back-and-forth affair, with both men trading blows and showcasing their incredible skills. But in the end, it was Bruiser who emerged victorious, catching his opponent off guard with a devastating combination of spinning headbutts and agile reversals that left Kaito reeling.

(content limit)
>>
>>102551562
good bootlicker, keep it up!
>>
>>102551559
it's "only" the 7b model, surely the 72b one will describe it better
>>
>>102551474
>momol
>you mean molmo
>as in moltimodel
>multimodel yeah
>give him the fucking mulmu!
>?? you guys aren't even saying the same thing
>MOMO
>MULMAUUU
>I CAN'T TAKE IT ANYMORE *points gun at self*
>DON'T DO IT *points gun at first guy*
>you're gonna shoot him for shooting himself?? that doesn't even make sense!
>*second guy turns gun on self*
>AHHHHH
>AAAAAAAAAAAAA
>>
>>102551403
Still missing the point. Anon asks for an XXB model because that's what he can run, not because it's the most efficient. He wants that model only because that's what he can run comfortably, but masks it as "THIS EXACT MODEL SIZE IS THE OPTIMAL SIZE!", still failing to understand that companies don't make models for him or his hardware.
I'd like a good 30B model, sure, but won't ask "why no model?". If I had an H100, I would ask for a good 200B model to use with x or y quant. 400B is too big, 120b is too small. See? they don't make models for meeeeeeeeeeeeeeeeeeeeeeeeeeeee!!!!!
>if small models are so good, why going for giant models like 72b and 90b in the first place?
3B will be better. 90B will be better. The 1T models will be even better. But that's the thing. The scale changes with the time and the hardware. He has a ~30B in gemma 2, but no. Not that one. He wants another ~30B. Because that's just the perfect size... for him.
>>
>>102551599
Anon, I want you to know that I laughed
>>
>>102551602
>3B will be better. 90B will be better.
and 30b will be better
>>
>>102551599
what model did you use
>>
>>102551599
That.. supposed to be funny?
>>
>>102551602
>still failing to understand that companies don't make models for him or his hardware.
that's not true, they think of us or else they wouldn't release tiny models even your grandma can run
>>
>>102551095
I like to get over 3T/s. I just tested mistral small at q6 and the speed is 4.5T/s, so that's acceptable (8gb vram). That's why I said around 20b would be perfect; it's kinda sad that they only give us tiny models and big ones.
>>
>>102551583
let's just mix both of them then, we got two great releases in one day
>>
>people still complaining about missing mid-sized models
Meta are literally advertising that they are happy with people distilling their models. It's not their fault there aren't people investing in doing that.
>>
>>102551629
>that's not true, they think of us or else they wouldn't release tiny models even your grandma can run
Scraps of their testing, and something to shove into phones. I'm glad they do it, but I wouldn't think for a second they do it with my best interests at heart.
>>
>>102548005
>>102548030
demo is a four letter word
>>
Looks like I can't even test out vision, thanks to ooba being garbage and there not being any real alternatives for transformer-based backends. I mean I guess I could probably just make an inferencing script to test it, but that's no fun.
>>
>>102551621
it's a reference to https://www.youtube.com/watch?v=ty62YzGryU4
>>
>>102551671
>It's not their fault there aren't people investing in doing that.
by now they must know we can't do it for shit. what happened with l3-42b or whatever? nothing, and Llama-3_1-Nemotron-51B is meh too afaik
>>
tech bro hobbyists are small peanuts compared to big corpos (400B+) and normies with phones (3B). Mid range models don't bring in money
>>
>>102551740
that is cause LLMs are a local minimum we need to pull ourselves out of
>>
File: file.png (988 KB, 1500x1000)
988 KB
988 KB PNG
>>102551708
>Mira
oh I remember her, that was the DEI woman that made a funny face when the interviewer asked her what video sites OpenAI scraped to train Sora
>>
So from what I understand the molmo 72B is simply a Qwen 2 finetune...
>>
>>102551757
or we can just buy more and save more
>>
Did Zuck officially win?
Who predicted the next llama would be this good even?
>>
File: 'card.png (365 KB, 730x911)
365 KB
365 KB PNG
>>
>>102551723
They're still advertising it. Someone might come along. Look at it this way: if they always just gave handouts, the community would slowly become more and more useless and reliant.
>>
is this bitnet or something else? how did they make it so good
>>
>>102551708
>moat: none
>regulatory capture: failed
>latest model: a fucking cot tune
>employees: leaving
>open source: catching up
>sam: gay
>>
>>102545841
Sounds like someone is a fucking moron and thinks every LLM should work without using their prompt template.

Mistral models work fine and can even do function calling reliably on Nemo.
>>
>>102551708
nice larp
>>
>>102551771
This what? How good? What the fuck are you talking about? Link to the post you.... you....
You mostly just talk to yourself, don't you?
>>
Considering there's no multimodal support in llama.cpp and exl2, does 11b at least run in transformers with 24gb of vram?
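On paper it should; transformers 4.45 ships the Mllama classes for these checkpoints, and while fp16 11B (~22gb of weights) would be tight, 4-bit via bitsandbytes leaves plenty of headroom. Untested sketch, model id per Meta's HF repo, prompt format per their model card:
[code]
import torch
from PIL import Image
from transformers import (AutoProcessor, BitsAndBytesConfig,
                          MllamaForConditionalGeneration)

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

# 4-bit quantization keeps the 11B weights well inside 24 GB.
bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

# "test.png" is a placeholder image path.
image = Image.open("test.png")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False,
                   return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(out[0], skip_special_tokens=True))
[/code]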
>>
>>102551708
wtf, drummer was working at OpenAI?
>>
>>102551796
there's no multimodal support for 11b in guis doe
>>
so >>102546792
>70b is like $.40/Mtok
where?
>>
Wait, is the multimodal part vision only or does it also include the speech stuff?
>>
>>102551815
idk >>102551044
>>
>>102551807
Really? I thought ooba supported it unless I'm crazy.
>>
>>102551708
>Sam actually came out on top
What was that one quote, about how if you put Sam in a room with a bunch of cutthroats, Sam would be the only one that remains?
>>
>>102551708
What would be the reason? Is she leaving the sinking ship after realizing that no one bought their o1 CoT bullshit?
>>
i don't mind openAI but i'm still happy to see their monopoly get btfo one new model at a time. it's better that way
>>
>>102551815
It's vision only, the speech stuff is an entirely separate thing
>>
>>102551836
maybe I'm wrong, but wasn't that just an extension that used a vision-to-text model, and then the model only interacted with the text..
>>
>>102551879
I do mind OpenAI, they unleashed poison data and now I have to suffer if I use any model released after 2023. Hope they go bankrupt.
>>
Reading the repo for molmo, it says to uninstall tensorflow and install the CPU-only version... does it not do GPU inference locally?
>>
>>102551892
So will we ever get the speech stuff? Was it too unsafe or something?
>>
molmo.gguf?
>>
File: 1718470225216862.jpg (222 KB, 720x720)
222 KB
222 KB JPG
>>102551859
>OPENAI TO REMOVE NON-PROFIT CONTROL, GIVE ALTMAN EQUITY
>ALTMAN WILL RECEIVE EQUITY FOR THE FIRST TIME IN THE FOR-PROFIT COMPANY, WHICH COULD BE WORTH $150 BILLION
>>
>>102551948
why tf can companies not release things in a standard format? I'm not installing random shit when other models need nothing
>>
File: test.png (84 KB, 1000x800)
84 KB
84 KB PNG
>>102542933
I'm adding plots.
Also it's the first time I'm testing a code assistant, this is really nice. I can never remember how to use matplotlib so having the AI take care of all the boilerplate is a game changer.
I'm using Qwen2.5-Coder-7B with IQ4 XS, are there any other models I should test? I'm already downloading CodeLlama 7B and WizardCoder 7B and 13B.
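For anyone curious what that boilerplate looks like, a minimal matplotlib sketch of this kind of plot (all numbers invented for illustration):
[code]
import matplotlib.pyplot as plt

# Invented numbers: a hypothetical llama-bench sweep of threads vs. speed.
threads = [2, 4, 6, 8, 12, 16]
tok_per_s = [4.1, 7.8, 10.5, 11.9, 12.2, 11.7]

fig, ax = plt.subplots(figsize=(10, 8))
ax.plot(threads, tok_per_s, marker="o")
ax.set_xlabel("threads (-t)")
ax.set_ylabel("tokens / s")
ax.set_title("llama.cpp generation speed vs. thread count")
ax.grid(True)
fig.savefig("test.png", dpi=100)
[/code]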
>>
>>102551989
because there is no standard
>>
>>102551859
Based Sam. Don't rock the fucking boat.
>>
>>102552003
this is a bell curve meme, where engineers on the left and right side just use a plain text editor
>>
>>102552003
>CodeLlama 7B and WizardCoder 7B and 13B.
those are super old and probably much worse than qwen2.5
>>
>>102552020
>>102552020
>>102552020
>>
>>102551973
what the fuck? Sam isn't even a scientist, this fucker couldn't solve a 4th grade math equation and he'll be the one receiving all the money? WHYYYYYYYY
>>
>>102552003
>>102552065
>>
File: sam.png (171 KB, 474x324)
171 KB
171 KB PNG
>>102551859
He can't keep getting away with this!
>>
File: it's over.png (47 KB, 1022x635)
47 KB
47 KB PNG
Oh noes
>>
>>102552119
kek
>>
>>102552119
never ever
>>
>>102552003
You do know llama-bench exists, right?
>>
>>102552175
Yes, that's what my script is calling. But llama-bench doesn't output graphs and doesn't come with search algorithms.
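The core loop of such a wrapper can be tiny. A sketch assuming a recent llama-bench with -o json support and an avg_ts field (check your build, field names vary across versions); the actual script's search is smarter than this plain grid:
[code]
import json
import subprocess

def bench(model: str, threads: int) -> float:
    # llama-bench emits a JSON array of result records with -o json;
    # take the best average tokens/s among them.
    out = subprocess.run(
        ["./llama-bench", "-m", model, "-t", str(threads), "-o", "json"],
        capture_output=True, text=True, check=True).stdout
    return max(r["avg_ts"] for r in json.loads(out))

# Plain grid search over thread counts, keeping the fastest.
results = {t: bench("model.gguf", t) for t in (4, 8, 12, 16)}
best = max(results, key=results.get)
print(f"best: {best} threads at {results[best]:.1f} t/s")
[/code]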
>>
>>102552003
Hey very cool, I didn't see your message before.
>>
>>102552119
xe will finally -ack ximself...
>>
>>102551602
1x24GB VRAM is such a niche segment. I think only I have that.
>>
>llms still struggle with the pyqtgraph update, always producing pyqt4 code
kek
>>
>>102551708
I am actually sad. Open AI could use more women to siphon money and contribute nothing.
>>
>>102551809
>APIs such as openrouter/groq
>>
File: file.png (274 KB, 964x849)
274 KB
274 KB PNG
not bad for a 3b
>>
>>102552641
>same structure and eyes/expression mentioned twice in every response
>>
>>102552641
>"please refrain from using physical gestures"
>>>not bad
Only if you're a local-cuck chugging safety garbage.
>>
>>102552699
I mean, you're talking to a doctor, of course he's gonna act like a cuck, let's not forget the covid era
>>
>>102552719
enjoy death retard
>>
>>102552740
:'(
>>
File: file.png (331 KB, 948x840)
331 KB
331 KB PNG
sovl
>>
File: file.png (323 KB, 966x854)
323 KB
323 KB PNG
cai is back on the menu boys
>>
>>102552769
>>102552815
what model?


