/g/ - Technology


File: 1707244174783552.jpg (149 KB, 500x500)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103153308 & >>103135641

►News
>(11/12) Qwen2.5-Coder series released https://qwenlm.github.io/blog/qwen2.5-coder-family/
>(11/08) Sarashina2-8x70B, a Japan-trained LLM model: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B and 52B active: https://hf.co/tencent/Tencent-Hunyuan-Large
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
>>103164575
>>103164575
>>103164575
Actual thread.

This thread was made by a thread splitting troll that has genuine mental issues about his ritual posting.
>>
>>103164659
>tuesday
>not teto
shame on you
>>
>>103164687
Shut up racist
>>
>>103164707
>(embed)
>old news
hi petra
>>
File: 1703071671972056.jpg (293 KB, 984x2084)
total tranny cleansing can't come soon
>>
>>103164748
let them cook
>>
>>103164687
Get out.
>>
>>103164659
>Thread Theme:
https://www.youtube.com/watch?v=hlQ4IM1qzlk
>>
>>103164817
>Qwen 3.5 coder model review and impressions
>>
File: 2489892.png (328 KB, 484x820)
>>103164659
>>
File: tetrecap1.png (1.96 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>103153308

--Paper: When are 1.58 bits enough? A Bottom-up Exploration of BitNet Quantization:
>103160371 >103160493
--Papers:
>103160243
--Testing LLMs with provocative prompts and discussing prompt engineering and filtering:
>103155576 >103155674 >103155779 >103156002 >103156059 >103156672 >103156221
--Running 72b on a 3060 GPU with 12GB VRAM, and the need for high-end hardware:
>103161742 >103161813 >103161830 >103161836 >103161831 >103161872 >103161882 >103161917 >103162116 >103162214
--Voice AI and voice cloning discussion:
>103158298 >103158310 >103160612 >103160683 >103158368
--Updating model parameters during inference and its implications for AGI:
>103156465 >103156572 >103156985 >103157866 >103157900 >103157023
--Specifying GPU for speculative decoding in Tabby/ExLLaMA:
>103162124 >103162223 >103162386 >103162441
--Qwen 2.5 Coder model impressions and performance:
>103154799 >103154931 >103155013 >103155085 >103155098 >103156687
--Quantization types and their impact on AI model speed:
>103160556 >103161363
--Processing long documents with local models for summary and insights:
>103158469 >103158698
--Anons discuss Qwen2.5, Sonnet 3.5, and Largestral models:
>103161265 >103161296 >103161401 >103162176 >103162413 >103161639
--Anon tests Qwen2.5 Coder Instruct with Nala scenario:
>103160663 >103160744
--Anon shares Unbounded game, others say it's not new:
>103158574 >103158586 >103158649
--Anon questions how GPT-4 validates code:
>103161529 >103161534 >103161692
--Anon mentions Jetson Thor as a potential solution for homemade android with local processing:
>103158392
--Qwen 2.5 coder model review and impressions:
>103159846
--Miku (free space):
>103153440 >103154178 >103154266 >103154839 >103156287 >103158261 >103158447 >103160213 >103160416 >103161631 >103162124 >103163680

►Recent Highlight Posts from the Previous Thread: >>103153319

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>103164881
epic fail
>>
>>103164841
Happy?
>>
>>103164817
>>103164881
lmao retard
>>
Was the original bitnet paper about quantization or training models from the ground up?
Are there any models trained in 1.58b?
>>
Introducing

The most powerful open source code large model!!!

Rombos-Coder-V2.5-Qwen-32b is a continuously finetuned version of Qwen2.5-Coder-32B-Instruct. I took it upon myself to merge the instruct model with the base model using the TIES merge method, as demonstrated in my own "Continuous Finetuning" method.

This version of the model shows higher performance than the original instruct and base models.

https://huggingface.co/rombodawg/Rombos-Coder-V2.5-Qwen-32b
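For the curious, TIES boils down to three steps: trim each finetune's delta from the base, elect a per-parameter sign by majority magnitude, then average only the agreeing deltas. A rough sketch of the idea, per tensor (illustrative only, not the actual recipe used for this model):
[code]
import torch

def ties_merge(base: torch.Tensor, tuned: list[torch.Tensor], density: float = 0.2) -> torch.Tensor:
    # Task vectors: what each finetune changed relative to the base weights.
    deltas = [t - base for t in tuned]
    trimmed = []
    for d in deltas:
        # Trim: keep only the top `density` fraction of entries by magnitude.
        k = max(1, int(density * d.numel()))
        thresh = d.abs().flatten().kthvalue(d.numel() - k + 1).values
        trimmed.append(torch.where(d.abs() >= thresh, d, torch.zeros_like(d)))
    stacked = torch.stack(trimmed)
    # Elect sign: per-parameter majority, weighted by total magnitude.
    elected = torch.sign(stacked.sum(dim=0))
    # Merge: average only the trimmed deltas that agree with the elected sign.
    agree = (torch.sign(stacked) == elected) & (stacked != 0)
    merged = (stacked * agree).sum(dim=0) / agree.sum(dim=0).clamp(min=1)
    return base + merged
[/code]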
>>
>>103164968
>training models from the ground up
that
>Are there any models trained in 1.58b?
yes
>https://huggingface.co/1bitLLM/bitnet_b1_58-3B/tree/main
>8 months ago btw
>>
>>103164974
no it doesn't
>>
>>103164982
Sick. I never bothered to look too deep into the whole bitnet thing, so I'm catching up.
Thank you anon.
>>
>>103164968
training models from the ground up
>Are there any models trained in 1.58b?
https://huggingface.co/1bitLLM/bitnet_b1_58-3B
https://huggingface.co/NousResearch/OLMo-Bitnet-1B
I think there's another 3B, but no one went bigger than that so far for some inexplicable reason
>>
File: 1715093796149474.png (13 KB, 488x277)
Migrate:
>>103164575
>>103164575
>>103164575
>>
>>103165003
Buy an AD
>>
>>103164659
This is also a great OP image
>>
File: file.png (95 KB, 1278x952)
The 'ick 'ecker added some things to his voice cloner.
>>
>>103165002
>but no one went bigger than that so far for some inexplicable reason
That's fucking weird.
The Metas and Mistrals of the world could train a ~7B in a couple of days to a week, I'm pretty sure.
>>
>>103165091
If they do that, the leatherman will never sell them a GPU again.
>>
File: relatable.png (339 KB, 484x820)
>>103164880
>>
>A separate training run was run with the exact same hyperparameters, but using standard fp16 weights. The comparison can be found in this wandb report.
That's really cool.

>>103165113
Really? Doesn't that mean that people would just train even bigger models and the demand for GPUs would stay the same?
Also, it would make running seemingly even better models locally easier, which would put local AI in the hands of more people and increase the demand for AI models and consumer-class Nvidia GPUs too, even if that demand only doubles from 1% to 2%.
Sounds like a win-win-win to me.
>>
What would happen if it became illegal for you to run LLMs due to how "dangerous" they are? Would you ignore the law, move somewhere else, or simply stop using LLMs?
>>
>>103165091
They could, and the changes needed for bitnet training aren't that big either, since most of the training is still done in full precision
Meta doesn't do anything but incremental changes to their gpt2-based architecture, but it doesn't make sense that Mistral or anyone else hasn't tried it yet either
Lots of people claim it's because the benefit comes at inference time, not training time, so there's no incentive to care, but the same could be said about MoE
>>
>>103165187
>Bitnet takes much longer to learn
Bitnetbros... I'm not feeling so good...
>>
Respect for Qwen being one of the few modelmakies to still do sub 20-30B models
>>
>>103165194
I really doubt that happens, but I would just run them anyway
what are they gonna do, raid my house for ERPing with tomboy elves?
>>
is Serbia that bad?
>>
>>103165187
BitNet models do not require MatMul, enabling the creation of much simpler processors for inference and (potentially) even training in the future. This poses a direct threat to NVIDIA's market dominance.
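To illustrate with a toy (not from any real codebase): once weights are constrained to {-1, 0, +1}, a "matmul" collapses into additions and subtractions, which is exactly what makes dumb-simple accumulator hardware viable.
[code]
import numpy as np

def ternary_matvec(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    # W has entries in {-1, 0, 1}: add x where w=+1, subtract where w=-1,
    # skip where w=0. No multiplier needed anywhere.
    y = np.zeros(W.shape[0], dtype=x.dtype)
    for i in range(W.shape[0]):
        y[i] = x[W[i] == 1].sum() - x[W[i] == -1].sum()
    return y

W = np.random.choice([-1, 0, 1], size=(4, 8))
x = np.random.randn(8).astype(np.float32)
assert np.allclose(ternary_matvec(W, x), W @ x, atol=1e-5)
[/code]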
>>
>>103165369
They have raided people for less
>>
>>103165194
Get your loicense
>>
>>103165369
It should be possible to detect LLM usage by analyzing power consumption graphs
>>
>>103165386
Ah, now that makes sense. Bitnet makes ASICs more financially viable.
>>
>>103165507
your honor, that power was actually going to my grow lights for my weed
>>
>>103165509
Yeah, and it has already been proven possible https://github.com/rejunity/tiny-asic-1_58bit-matrix-mul
>>
hello friends when I give llama.cpp hunyuan it says it does not load model how to fix ^^
>>
>>103165569
I'm not your friend, nigger.
>>
>>103165569
Maybe it doesn't like hunyuan, whatever that is.
>>
>>103165539
Interesting. Makes me wonder why an Amazon or Google or even Apple, companies that already make their own silicon, aren't working on that.
Or maybe they are but only for internal use.
Regardless, that's fucking cool thank you so much for the link dude.
I love this rabbit hole.
>>
>>103165569
there is an issue, who knows if anyone will pick it up https://github.com/ggerganov/llama.cpp/issues/10263
standard lmg advice applies: wait 2mw
>>
>>103165530
Inferences have identifiable patterns https://www.researchgate.net/figure/Power-consumption-from-different-sources-CPU-GPU-or-DRAM-for-different-platforms-a_fig5_369540465
>>
>>103164993
https://arxiv.org/abs/2411.04965
another recent paper by the original bitnet devs
>>
>>103165621
just use a battery bank
>>
anyone tested sarashina2 yet?
couldn't find a quant for the moe so i ran the 70b
it seems to be actually trained on more trivia than most modern models, but you kinda have to speak in jap for it to be coherent, which is a shame
>>
>the sheer number of samefag posts with pretend discussion to cover up how the samefag has split the thread...

>>103164575
>>103164575
>>103164575
>>
>Ah, now that makes sense
>Yeah, and it has already been proven possible
Totally organic btw.
>>
>>103165754
>looks in thread
hmm... no thanks
>>
>>103165676
Btw, is there an affordable solution to power a 3kW rig from a 100V outlet using batteries to smooth out peaks?
>>
Rocinante is killing my productivity...
>>
Silly bros?

>MarinaraSpaghetti here, some of you may know me from my SillyTavern settings and NemoMix-Unleashed model over on HuggingFace. I also do model reviews from time to time.

>Today, I come to you with a request. I would appreciate it greatly if you helped me out by filling my survey about what features you use for roleplaying with models. The survey is fully anonymous. Thank you so much for your help and all the feedback! It truly means a lot.

>These devs aren’t from ST, but are working on an alternative!

>Can’t say anything due to NDA, but as soon as things are set in motion, I’m sure the word will be out! But I heavily agree with the notion that ST is too overwhelming without any proper guides online how to use it (most are outdated at this point).

https://www.reddit.com/r/SillyTavernAI/comments/1gp0og5/models_and_features_you_use_for_roleplaying_with/
>>
>>103165861
That's why I only coom at fixed times.
>>
>>103165877
>working on an alternative to silly
dont care
>>
>>103165896
>its afraid
>>
>>103165877
I don't use trannyware
>>
>>103165877
Long abandoned
>>
>>103165877
>NDA
It's not an alternative if it's proprietary slop.
>>
>>103165877
Good luck making a better ST. They'll see first-hand the amount of work that went into it
>>
>>103165841
>petra hasn't posted in months
>starts posting again while a totally different anon that hasn't posted in months comes back to threadsplit
lol you're an egyptian brown boy
>>
hello xaars where is local opus
>>
>>103165822
2.5k affordable?
https://www.amazon.com/dp/B0C5C9HMQ2
>>
yoo dis locul el el em totally beatz gepetee 8 amirite fellow lmg sissies?
>>
>sharty troon comes back
>thread quality somehow drops even more
wow they're like the indians of the internet but somehow even worse haha
>>
>>103166097
Thread quality was never good in the first place.
>>
i have a very revolutionary idea
what if we train mistral large on thousands of books
>>
>>103166131
And that's why it is impressive how it can make the thread quality noticeably lower.
>>
>>103165861
Which version?
>>
>>103165877
>But I heavily agree with the notion that ST is too overwhelming
making software for skill issue brainlets is a red flag
>>
>>103165877
>tranny makes lotta lots of bullshit promises
Many such cases.
>>
>>103166335
v1.1
>>
>>103166413
Really? None of the newer versions improved it? What format do you use, just the mistral one?
>>
File: file.png (181 KB, 1143x714)
https://nousresearch.com/introducing-the-forge-reasoning-api-beta-and-nous-chat-an-evolution-in-llm-inference/
>>
File: 1729185335489778.png (576 KB, 994x1258)
holy shit, Qwen-32b-coder is that good?
>>
>>103166421
Yep, the Mistral one; no, the others are worse in my opinion. Q8 also. Really the Mythomax of this gen.
>>
is there a place like venus where people post context/instruction templates and system prompts?
>>
Coder 32b really has superior prose. Obviously not trained on shitty RP logs. Weird little logic mistakes and very literal minded, though.
>>
>>103166599
https://huggingface.co/MarinaraSpaghetti/SillyTavern-Settings
>>
>>103166610
>Weird little logic mistakes and very literal minded, though.
such as?
>>
>>103165861
>>103166413
That's the only model I've been using for a good while now.
As far as having 8gb of VRAM goes, you can't do much better if at all.

>>103166539
So they have a reasoning model in between the user's prompt and the final gen?
Interesting idea.
I might jerry-rig (as in jank) something similar using a small model that is only tasked with "Reason which steps are necessary to produce an answer to the following query" or something of the sort.
Maybe have it classify which kind of request it's working with before trying to reason about it, etc.
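Something like this, roughly (a jank sketch; the endpoint and model names are placeholders for whatever your local server exposes; llama.cpp's server and TabbyAPI both speak this OpenAI-style API):
[code]
import requests

URL = "http://127.0.0.1:8080/v1/chat/completions"  # placeholder local endpoint

def ask(model: str, system: str, user: str) -> str:
    r = requests.post(URL, json={
        "model": model,
        "messages": [{"role": "system", "content": system},
                     {"role": "user", "content": user}],
    })
    return r.json()["choices"][0]["message"]["content"]

def two_pass(query: str) -> str:
    # Pass 1: a small model drafts a plan.
    plan = ask("small-planner",  # hypothetical name for e.g. a 3B model
               "Reason which steps are necessary to produce an answer "
               "to the following query. Output a short numbered plan only.",
               query)
    # Pass 2: the main model answers with the plan as extra context.
    return ask("big-answerer",   # hypothetical name for the main model
               f"Follow this plan when answering:\n{plan}", query)
[/code]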
>>
>>103166148
>what if we train mistral large on thousands of books
Then we will have a little bit of fun to pass the time until the next thing.
>>
>>103165358
>modelmakies
>>
>>103164881
Thank you six-fingered Recap Teto
>>
>>103166556
So far, the best model for coding.
>>
>>103166703
it helps /aicg/gers identify each other in the wild.
>>
>>103166716
local model or like, better than fucking Sonnet 3.5???
>>
>>103166610
who the fuck uses a coder for RP?
>>
>>103166729
Given that I can't use a cloud model on company's codebase, indeed it is so.
>>
>>103166729
It's 90% of the way to Sonnet 3.5 without needing to pay $15 per million tokens and hand Anthropic your code. Nothing else, including GPT-4, one-shots a lot of the stuff it does.
>>
>>103166778
>>103166794
this is actually insane, who would've known we could've achieved this level with a 32b model, holy fuck... the chinks are really dominating the AI race right now
>>
>>103166742
Every model is a coom model if you try hard enough
>>
>codeshit
>>>32B
*yawn*
>>
>>103166812
>who would've known we could've achieved this level with a 32b model
to be fair it is laser focused on coding, whereas sonnet is still an all-rounder
>>
>>103166832
32b is all you need
>>
>>103166834
Were I in their shoes, I would train highly specialized models and a router to classify prompts.
>>
So the 72B should be even better then?
>>
Why couldn't they just put normal Qwen and Qwen coder into a MoE so you could get the best of both worlds?
>>
>>103166862
That is not how MoEs work.
>>
>>103166812
I found out that yi 8b coder gives more accurate results compared to codestral 22b too. It's crazy.
>>
File: 1716992187063407.png (107 KB, 498x410)
>>103166857
Qwen-2.5-72b-coder-BitNet, trust the plan
>>
>>103166857
Blame burgers for banning the export of GPUs to China
>>
>>103166886
>land of the free
my fucking ass, they want misery on every country that dares to catch up to them
>>
>>103166874
to be fair codestral was always bad
>>
>>103166886
>>103166896
I mean, I'm sure as soon as China is satisfied it has damaged the leader's market share enough and has models outperforming the rest, they will go private as well.
>>
>>103166872
Nothing truly prevents this. You could employ a distinct router model to evaluate both the prompt and the generated text, then redirect prompts among models of varied sizes and architectures. I recall hearing of such an approach to mitigate costs.
>>
>the US renders china GPU-poor in an attempt to cripple their AI researchers
>the researchers are forced to become masters of efficiency and they figure out how to make small models that btfo much larger ones
burgerbros...what happened
>>
>>103166938
He's right. What you're describing isn't MoE
>>
>>103166962
They have ambition.
We have avarice.
>>
File: 1709835712617613.jpg (42 KB, 898x886)
>>103166896
Yeah, ask Japan how it feels, 失われた30年 (the Lost 30 Years)
>>
As cloudshit is reaching a ceiling, local is getting better and more efficient. One more year before we have GPT4 at home.
>>
File: x1.png (697 KB, 2574x2616)
>>103166964
MoE is a broader term than you think it is.
>>
>>103166989
That rumor about hitting the ceiling is fake. (((They))) wish for their opponents to cease refining their models. GeePeeTee5 is real, just expensive as fuck
>>
>>103167004
>Attension
>stareted
>>
>>103167175
>It's funny that they are Koreans
>>
>>103167165
Cope. The failure of the Opus 3.5 training run heralded the beginning of a new AI winter.
>>
>>103167235
>failure of the Opus 3.5 training run
According to who?
>>
>>103167283
sama
>>
>>103167283
It came to me in a dream
>>
Back to kinoslop.
>>
>>103167309
based miku communicator
>>
Noob seems to really love character portraits when doing hud gens.
>>
>>103166620
>such as?
Just some nonsensical details or being confused about the characters. That might be the coding finetuning talking.
But truth be told, I'd never tried Qwen 2.5 before because /lmg/ told me it was censored chinkshit. So now I tried the original Instruct model. With a simple prefill it does every kind of depraved sex shit, without falling into retarded Literotica slop like "hardness" or "heat" or being too horny. Guess I shouldn't listen to /lmg/.
>>
>>103166962
Necessity is the mother of invention. Sanctions forced them to git gud.
The weakness of sanctions on both China and Russia was relying on the tacit assumption that the Chinese and Russians are retards, and they aren't. It was self-flattery from the West.
>>
I haven't checked here in a while, local fwens. Is NVIDIA, Intel, or any startup working on a dedicated local AI card? Or has some wundermodel rendered this all moot? I just really want that Ford Model T of AI cards before I autistically build a home AI companion in a cute animatronic
>>
File: 1707670832084998.jpg (200 KB, 1920x1080)
>>103166962
DON'T THINK ABOUT IT
JUST PUT TRILLIONS INTO BIGGER DATACENTERS
>>
>>103167653
they all are, and all of them are working on TPUs; not a single one is aimed at consumers, obviously
>>
>>103167627
Is this with a LoRA?
>>
>>103167653
Yes. They will all be in the $10k range and up though.
>>
>>103167743
Is that because they're insisting on making it super fast? My understanding is that slower VRAM is dirt cheap
>>
Is it just me or do 70b/72b models kinda suck?

These models can't even remember what room I'm in, one moment my character is sitting in a chair, and 3 messages later they're laying on a bed. This is only like 8 messages into the RP with over 28k context available still, the fuck?

Feels like a scam considering Mistral Small exists and can fit on pretty much any modern GPU in q3+.
>>
>>103167694
No, just regular noob vpred 0.5.
https://files.catbox.moe/6uu3es.png
>>
>>103167782
Yes, but it's about the limit of what a non-bitcoin bro's computer can handle.
>>
>>103167792
there's the 0.6 version now
https://huggingface.co/Laxhar/noobai-XL-Vpred-0.6
>>
>>103167751
No, it's because that's what they can get away with.
>>
>>103167795
Well at least I can fit Mistral Small on a single GPU and use the other one for other shit. Really disappointed with 70b tho.

I don't even see the point in it when small models perform decently and there's basically no improvement until 120b+.
>>
>>103167807
If (as you're suggesting) most of that price is pure margin, I don't see how that would work without some kind of cartel dynamic in play
Without a backroom cartel agreement, profit margins of 50% or more would quickly lead to undercutting from competition
>>
>>103167806
>You need to agree to share your contact information to access this model
What the CivitAI is this gay earth shit?

>>103167818
Mistral Large is 120B, right? If I lobotomize it to IQ3 I can run it, but it's too stupid for anything factual, just creative writing. It does seem pretty good at holding context. I think I pushed something to like, 19k before it started falling apart.
>>
>>103167782
22B makes much more stupid mistakes and lacks intelligence in my testing. Maybe you are not seeing the difference with the prompts you are testing.
>>
>>103167806
Hmm, I will wait for the civitai release, I don't feel like getting past the huggingface gate today.
>>
>>103167806
>"+ edit for auto-detection of v-pred" in community tab
I don't get it
>>
>>103167782
I find spatial problems in general are some of the easiest ways to make questions that a normal human can get right while an llm fails.
>>
>>103167840
Yeah Mistral Large is 120b+, basically impossible to run unless you sink thousands which isn't really worth it.

>>103167842
I asked a 70b model rping as Walter White to explain to me how to install Gentoo, and it just spat out instructions at me. I don't consider that intelligent. Walter wouldn't know shit about it because he just makes meth.

>>103167892
It could be a limitation of LLMs in general I guess. Maybe I'll fire up Mistral Large at Q3 while watching anime and see how it performs between the 3 minute long processing times.
>>
>>103167824
FYI NVIDIA's profit margin when they make an H100 is higher than that of the US government when they print a $100 bill.
>>
>>103167937
What a weird and convoluted analogy. Why not just say what the profit margins actually are?
>>
>>103165194
*sigh* forced into terrorism, again.... they never learn do they ?
>>
>>103167970
Because you can look them up yourself if you want specific numbers?
>>
If agi is coming in 2027 how long until local models are at least smart enough to not make up shit and solve simple problems?
>>
>>103167782
Qwen2.5 / Mistral Large are the only local models smart enough to get that sort of stuff right 99%+ of the time.
>>
>>103167911
Well, without knowing your exact setup, that example is basically meaningless. 22B should be much stupider than 70Bs, and if you're not seeing that, there are a variety of reasons that could be at play, which we could never untangle without knowing what you've actually got set up down to the last detail.
>>
>>103167840
Yes, there is observable degradation around 20k tokens even at q5.
>>
just got ollama running on a 780m with UMA set to 8gb, what kind of models could i run?
>>
>>103168016
It's funny you mention Qwen2.5 because the example I mentioned about my character going from sitting on a couch to laying on a bed after 3 messages was from the EVA Qwen2.5-72b finetune.

>>103168018
I mean yeah 22b is dumber, I guess my issue is more that the 70b models don't even feel twice as smart as the 22b despite having 3x the parameters.

I really hope we get some bases to finetune next year because the second half of this year really didn't give much to medium weights like 70b.
>>
>>103168043
Mistral large
>>
>>103168043
Sarashina2-8x70b
>>
>>103164659
I've been gone since Summer 2023. Any new/good 12Bs?
>>
Early december will be so wild for local models
>>
>>103168126
qwen 2.5 14b
>>
>>103168082
Actually I would say that 70B is at least 2x smarter. Maybe not 3x. But in my experience 22B really does get things wrong like 2x more often than 70B. I use models for a bunch of stuff from RP to assistant stuff and coding, though for 22B I mostly just tested RP type stuff and noticed it behaving very stupidly compared to 70B. In any case, if you really don't notice much of a difference then good for you. Just use 22B and be happy.
>>
>>103168142
Gemma 27B though is an outlier. It is nearly as smart as non-Qwen2.5 70/72Bs
>>
>>103168155
8k though, not really a fair comparison, and most people here need more than 8k so it's not usable in the first place for them.
>>
>>103167911
Was curious so I tried testing Walter out. Seems to work (mostly) fine on a standard prompt.

I also tested it when playing a police officer character, and THEN it complied and gave me instructions. However, I then tried modifying the prompt to specify that the assistant should not be a dumb assistant and then it worked fine again. Llama 3's instruct template literally specifies "assistant" so I think this would probably work better on local where you can actually modify the formatting.

I'm not sure this is really a test of intelligence so much as it is a test of how hard the model has been trained to be an assistant tbqh.
>>
File: Strawberry_soon.png (59 KB, 472x143)
>>103168128
post-election craziness
>>
>>103168492
I haven't been able to find any use for o1 yet. I've seen people say it's better and worth the slow speed for really hard stuff, but I guess I don't have anything I need it for.
>>
>>103168471
Hmm maybe I'll try some of these l3.1 finetunes with different prompts, I'm ngl I was using the same prompt for all of them out of laziness
>>
svelk
>>
Have there been any fine-tunes/projects that rip the scripts from visual novels? I know there is the vntl leaderboard but I mean like a fine-tune that is based off Japanese and English translated vns. Probably harder than tuning off of ERP chat logs but I feel like the quality would be better.
>>
>>103168546
I wouldn't use Llama 3.1 70B fine tunes as they're notorious for being dumb. Something about 70B didn't work well with fine tuning, as 8B and 405B were able to be tuned without that intelligence loss. Though people have been saying good things about Nemotron so maybe that's actually fine and everyone else just has a skill issue, not sure.
>>
>>103168222
you can rope it? tabby does auto-rope if you set it in the config
>>
I spend my idle time during my daily showers contemplating the lore of Nikke.
>>
32B Coder has beaten Nemotron for me, it's the new king for ERP
>>
>>103168693
how does the extreme dryness not bother you
>>
>>103168693
Magnum is the king of ERP.
>>
>>103168701
? I found it almost too purple for me. Try giving it system instructions. It follows them to a T.
>>
spoon-feed me a little, anything wrong with using miner mobos to stack 8 GPUs? is the bandwidth going to be a problem? anyone tried it?
>>
Why should a talking lion be a benchmark for RP? It only measures anthro ERP alignment.
>>
>>103168590
>>103168693
>>103168702
buy a fucking ad
>>
>>103168719
You can do it, others have. You won't be able to do row split for an extra speed boost, but with the default layer split there is no difference after the model is loaded.
>>
Are there any Americans here? Replies seem to lean heavily europoor primetime.
>>
>>103168597
Roping makes models dumber though. At that point I'd probably just use 22B.
>>
nala leaderboard where?
>>
>>103168702
magnum-coder when?
>>
>>103168784
it's 2am if not later, so unlikely. it's peak indian (always, it's /g/) and mutt hours.
>>
>>103168733
The continued use of the Nala card for testing is more inertia than anything. Still, it involves a few important aspects for gooning:
- Format consistency (asterisks for narration, quotes for dialogue, second person PoV, present tense narration)
- Spatial awareness (she pounces on your back, so at minimum it should describe you landing on your front)
- Writing style (the intro and first response are prime material for slop; how well does the model write despite this?)
- Ability to work with non-human characters (quadruped with paws, fangs, and a tail)

I agree, though; it'd be nice to have more variety with few-shot coom tests

>>103168919
It's 4-8PM in burgerland
>>
LOL microsoft's "sota" tmac backend (praised by reddit) is actually pretty shit compared to k quants.

https://github.com/ggerganov/llama.cpp/pull/10181
>>
Apparently there was an issue with qwen2.5 GGUFs:
https://www.reddit.com/r/LocalLLaMA/comments/1gpw8ls/bug_fixes_in_qwen_25_coder_128k_context_window/
>>
>>103168955
Seems like at least for 2bit on the CPU, it's faster for the same or better PPL, right?
It's hilarious that they would compare to the static quants instead of the K quants tho.
The "right" way to do these comparisons, if you wanted to show that you are the best, would be to measure the ppl and/or KL divergence, look for the fastest quant that has the same or similar performance, then compare how much faster the new method is. That they didn't do that from the get-go is already suspect as fuck.
>>
>breaking news: local ggufs have a problem!
>>
>>103168784
South American here, I'm glad you noticed me!
>>
>>103169008
less problems than new releases usually have!
>>
>>103168043
a 12b llm quantized to 3 or 4 bits?
eg: rocinante, or mistral nemo rpmax.

image gen might also be worth trying out.

is 8gb the max you can allocate?
>>
>>103169000
>The GGUFs also include some bug fixes we found.
Wtf? Like what?
>>
>>103169089
It was buggy for me sometimes, a lot of repeating.
>>
>>103169005
it's faster but ppl is worse: 7.36 with EQAT-w2g64-INT_N vs 6.98 with Q2_K. Also, if you're using 2 bit you should use an i quant for even lower perplexity, as the model's lobotomized to shit already. Like iq3_xxs or iq2_m are similarly sized but have better ppl.

They also used qat models for their numbers and rightfully got called out for it, so screw them.
>>
>>103169113
I meant more like how he fixed the issues that supposedly are there that he didn't mention. I make my own GGUFs so this would be useful to know, if they really are fixes.
>>
>>103169119
I assumed you meant to say you didn't find qwen to be buggy.
I would also like to know what they did "fix".
>>
>bot writes story
>story drones on as context length increases
>gets to 2t/s but too invested to stop
>sit like a retard watching shit appear on my screen at half my reading speed (plz no reroll)

>PAIN
>WITHOUT LOVE
>PAIN
>I CANT GET ENOUGH

also, anyone tried buying a shit ton of those alibaba $10 intel xeon cpus and then using that backend where it only loads 1 layer at a time to keep all the layers in the cpu cache?
>>
File: 1719351514748678.jpg (674 KB, 2048x2048)
>>103168817
This
>>
File: vulkan.png (108 KB, 966x1032)
CUDA IS LOSING
>>
>>103168590
I mean there's really not many other options at 70b besides qwen-2.5.

Nemotron is a huge pain in the ass to work with and has a gaping hole in its dataset for anything that goes beyond handholding so it's honestly a pretty boring model to rp with imo.
>>
>>103169160
It only loses when using moes, interesting...
>>
>>103167806
It's deleted now lol
>>
>>103164803
That's pretty cool.
>>
>>103164659
Would latency for these models improve if you runpod them/run them off of a dedicated machine?
>>
Is Qwen 2.5 Coder 32B better than Codestral 22B?
>>
>>103169314
GPT-2 is better than Codestral 22B.
>>
>>103169338
Is Qwen 2.5 Coder 32B better than GPT-2?
>>
>>103169416
Reflection 70B is better than both
>>
>>103169247
>improve
what's the baseline?
>>
I had plenty of fun with Mistral-Nemo-Gutenberg-Doppel-12B-v2.Q6_K.gguf. Are there others like it that can fit comfortably on an RTX 3060 with 12GB?
>>
>>103169144
> also anyone tried buying a shit ton of those alibaba 10$ intel xeon cpus and then using that backend where it only load 1 layer at a time to keep all the layers in the cpu cache ?

well your idea is obviously stupid but ik has done an experiment with a model solely in 64mb cache

https://github.com/ikawrakow/ik_llama.cpp/discussions/18
>>
File: qwen.jpg (82 KB, 863x874)
Did Qwen2.5-Coder 32B really beat closed source models?
>>
>>103169454
I don't know this is just theoretical.
>>
Top-nσ: Not All Logits Are You Need
https://arxiv.org/abs/2411.07641
>Large language models (LLMs) typically employ greedy decoding or low-temperature sampling for reasoning tasks, reflecting a perceived trade-off between diversity and accuracy. We challenge this convention by introducing top-nσ, a novel sampling method that operates directly on pre-softmax logits by leveraging a statistical threshold. Our key insight is that logits naturally separate into a Gaussian-distributed noisy region and a distinct informative region, enabling efficient token filtering without complex probability manipulations. Unlike existing methods (e.g., top-p, min-p) that inadvertently include more noise tokens at higher temperatures, top-nσ maintains a stable sampling space regardless of temperature scaling. We also provide a theoretical analysis of top-nσ to better understand its behavior. The extensive experimental results across four reasoning-focused datasets demonstrate that our method not only outperforms existing sampling approaches but also surpasses greedy decoding, while maintaining consistent performance even at high temperatures.
new sampler
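Going by the abstract, the rule is simple: keep only tokens whose raw logit sits within n standard deviations of the max, then sample from those. A minimal sketch (details may differ from the paper's exact algorithm):
[code]
import numpy as np

def top_n_sigma_sample(logits: np.ndarray, n: float = 1.0, temp: float = 1.0) -> int:
    # Mask on raw logits: both max and std scale with 1/temp, so the kept
    # set is the same at any temperature, which is the paper's selling point.
    keep = logits >= logits.max() - n * logits.std()
    z = np.where(keep, logits / temp, -np.inf)
    z = z - z.max()
    p = np.exp(z)
    p /= p.sum()
    return int(np.random.choice(len(logits), p=p))
[/code]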
>>
>>103169736
are all you need = shitty meme paper
>>
>>103169736
Too much text. Does turning it on make outputs better or no? Stupid dumb researchers.
>>
>>103169621
i remember that, but besides that, anyone tried any other shit with the cpu cache? really seems like such a waste to not make use of those cpus. if you add up the cost per ram it's around double, not including setting it up and all the cables and shit. idk, just weird no one ever talks about it, it's cheap af to just try and no one has tried to optimise it in any way
>>
>>103169144
>only load 1 layer at a time to keep all the layers in the cpu cache
Even AMD 3D cache is too small to hold a layer for most models. Even then, layer-by-layer processing can only speed things up with batching/prefill.
>>
File: 1k1hky.jpg (35 KB, 500x414)
>>103164659

What is the best micro model for writing creative text snippets that is licensed for commercial use?

I'm building a game and I want it to run an LLM to write descriptions of NPCs and objects based on stats.

Looking for maximum speed even on mid-range cards. I was considering Llama-3.2-1B but the license is restrictive.

Is there something like Mistral for 1B?
>>
Towards Low-bit Communication for Tensor Parallel LLM Inference
https://arxiv.org/abs/2411.07942
>Tensor parallelism provides an effective way to increase server large language model (LLM) inference efficiency despite adding an additional communication cost. However, as server LLMs continue to scale in size, they will need to be distributed across more devices, magnifying the communication cost. One way to approach this problem is with quantization, but current methods for LLMs tend to avoid quantizing the features that tensor parallelism needs to communicate. Taking advantage of consistent outliers in communicated features, we introduce a quantization method that reduces communicated values on average from 16 bits to 4.2 bits while preserving nearly all of the original performance. For instance, our method maintains around 98.0% and 99.5% of Gemma 2 27B's and Llama 2 13B's original performance, respectively, averaged across all tasks we evaluated on.
a little interesting but a very short paper (an internship one). still, being able to reduce communication between gpus is good
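The gist as I read the abstract (the paper's actual scheme surely differs in the details): the same few channels are outliers every time, so keep those in fp16 and quantize everything else to int4 before the all-reduce. A hedged sketch:
[code]
import numpy as np

def pack(x: np.ndarray, outlier_idx: np.ndarray):
    # x: [hidden] activations to communicate; outlier_idx: channels kept in fp16.
    mask = np.zeros(x.shape[0], dtype=bool)
    mask[outlier_idx] = True
    rest = x[~mask]
    scale = max(float(np.abs(rest).max()) / 7.0, 1e-8)  # int4 range is -8..7
    q = np.clip(np.round(rest / scale), -8, 7).astype(np.int8)
    return q, scale, x[mask].astype(np.float16)

def unpack(q, scale, outliers, outlier_idx, hidden):
    x = np.empty(hidden, dtype=np.float32)
    mask = np.zeros(hidden, dtype=bool)
    mask[outlier_idx] = True
    x[~mask] = q.astype(np.float32) * scale
    x[mask] = outliers.astype(np.float32)
    return x
[/code]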
>>
>>103169646
Except claude 3.5 sonnet but its close.
>>
>>103169646
>context length up to 32,768 tokens
not quite
>>
File: Untitled.png (481 KB, 1080x1229)
LAUREL: Learned Augmented Residual Layer
https://arxiv.org/abs/2411.07501
>One of the core pillars of efficient deep learning methods is architectural improvements such as the residual/skip connection, which has led to significantly better model convergence and quality. Since then the residual connection has become ubiquitous in not just convolutional neural networks but also transformer-based architectures, the backbone of LLMs. In this paper we introduce Learned Augmented Residual Layer (LAuReL) -- a novel generalization of the canonical residual connection -- with the goal to be an in-situ replacement of the latter while outperforming on both model quality and footprint metrics. Our experiments show that using LAuReL can help boost performance for both vision and language models. For example, on the ResNet-50, ImageNet 1K task, it achieves 60% of the gains from adding an extra layer, while only adding 0.003% more parameters, and matches it while adding 2.6× fewer parameters.
From google research. interesting though they didn't scale or test a lot of different models
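One plausible reading of the abstract, sketched out (the paper's exact parameterizations may differ): keep the skip connection but give it a few learned parameters, e.g. a learned scale on the branch plus a low-rank learned map on the skip path.
[code]
import torch
import torch.nn as nn

class LearnedAugmentedResidual(nn.Module):
    # Generalizes y = x + f(x) to y = alpha*f(x) + x + up(down(x)),
    # adding only ~2*dim*rank extra parameters per layer.
    def __init__(self, f: nn.Module, dim: int, rank: int = 16):
        super().__init__()
        self.f = f
        self.alpha = nn.Parameter(torch.ones(1))
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.alpha * self.f(x) + x + self.up(self.down(x))
[/code]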
>>
Qwen2.5-Coder 72B was held back by the Chinese government because it was too powerful. Only official chinese agencies have access to it.
>>
File: Untitled.png (478 KB, 1080x1451)
Entropy Controllable Direct Preference Optimization
https://arxiv.org/abs/2411.07595
>In the post-training of large language models (LLMs), Reinforcement Learning from Human Feedback (RLHF) is an effective approach to achieve generation aligned with human preferences. Direct Preference Optimization (DPO) allows for policy training with a simple binary cross-entropy loss without a reward model. The objective of DPO is regularized by reverse KL divergence that encourages mode-seeking fitting to the reference policy. Nonetheless, we indicate that minimizing reverse KL divergence could fail to capture a mode of the reference distribution, which may hurt the policy's performance. Based on this observation, we propose a simple modification to DPO, H-DPO, which allows for control over the entropy of the resulting policy, enhancing the distribution's sharpness and thereby enabling mode-seeking fitting more effectively. In our experiments, we show that H-DPO outperformed DPO across various tasks, demonstrating superior results in pass@k evaluations for mathematical tasks. Moreover, H-DPO is simple to implement, requiring only minor modifications to the loss calculation of DPO, which makes it highly practical and promising for wide-ranging applications in the training of LLMs.
https://github.com/pfnet
https://github.com/muupan
Code will probably be posted (nothing stated in the paper) since it's just a minor modification of DPO
Decomposes the reverse KL divergence into its entropy and cross-entropy components; then, by attaching a coefficient less than 1 to the entropy term, entropy can be reduced while still fitting the reference distribution.
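In symbols, per the description above (my notation, not copied from the paper):
[code]
% Reverse KL splits into negative entropy plus cross-entropy:
%   KL(pi || pi_ref) = sum_y pi(y) log( pi(y) / pi_ref(y) )
\mathrm{KL}(\pi \,\|\, \pi_{\mathrm{ref}}) = -H(\pi) + H(\pi, \pi_{\mathrm{ref}})
% H-DPO keeps the cross-entropy term but down-weights the entropy term:
%   -\alpha H(\pi) + H(\pi, \pi_{\mathrm{ref}}),  with  \alpha < 1
% alpha < 1 tolerates lower entropy, i.e. a sharper, more mode-seeking policy.
[/code]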
>>
>>103170003
Open source models still have problems with context length, plus it is computationally expensive
>>
>>103170219
Jamba does long context perfectly. It's very obvious that all the closed models have migrated to a similar architecture by now.
>>
>>103170241
Jamba is retarded.
>>
>>103170003
Apparently it works with 128k >>103169000
>>
>>103170003
https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct
It's 128k
>>
>>103170318
>>103170339
oh okay, hopefully a million context window version will be released soon like they did with meta's model (Llama-3 Gradient Instruct)
>>
>>103170396
Why not a billion?
>>
>>103170407
Would run out of memory.
>>
>>103169736
>Are You Need
>>
has 8gb cooming not progressed in months
>>
>>103170664
yea it has. it's called use eva qwen 72b and have patience.
>>
I have sympathy for people who use drummer tunes because some of them are relatively coherent (behemoth) and they talk dirty in a way that standard instruct won't, but eva qwen is so fucking retarded that I get tilted when I see people recommend it
>>
>>103170974
Not sure if you're trolling or have something fucked up on your end.
>>
Simply pretraining on more of the same data has hit a wall. Ilya confirmed it
>>
Very new.
Is it better to have a Q8 quant of a model with more parameters than a smaller model at full precision?
Specifically, Qwen2.5-Coder-14B at Q6 vs 7B fp16?
I have a 16GB gpu and 64GB DDR5
>>
>>103171166
Bigger model is always better as long as you're using a quant above 2 bit.
>>
>>103171091
Eva Qwen 72b q4_k_m using the recommended context/instruct and system prompt in SillyTavern. Was doing standard RP formatted in narrative style (no asterisks). Total retardation. Also tried unstructured storytelling. Doo doo. Tried recommended samplers and fiddled with them a bit. I'm comparing to Mistral large q3 xxs which is generally the smartest local model I've used. I load in behemoth and switch to pygmalion format when I want to do the nasty
>>
>>103171190
Use exl2
>>
>>103171199
Why?
>>
>>103171166
picrel, what >>103171171 is talking about.
This shows how quanting down to IQ3 on large and Q4 on small models doesn't do too much damage, and that even the fp16 full 8B model scored the same as a completely lobotomized 70B.

So if you're a vramlet, you have two options: garbage at the speed of light, or letting the chef cook a meal worth eating.
>>
>>103171208
Because it seems like every time I've seen some sort of issue complained about here, it was gguf quant or llama.cpp related
>>
>>103169887
Use a Q2 quant of Mistral 7B? That's kinda 1B-ish.
>>
>>103171226
That's just because everyone uses gguf/llama.cpp. I seriously doubt llama.cpp is specifically breaking the shitty qwen eva fine-tune and no other models
>>
>>103171171
Thanks.
>>103171218
I don't mind waiting, I'm using kobold and silly, can you check my thinking?
If I use
Qwen2.5-Coder-32B-Instruct-GGUF Q8, which is roughly 36GB in model size, it'll be offloaded to RAM? I have 64GB ram with maybe 2GB for system overhead.
Or should I stick to something that fits in my Vram completely?
>>
>>103171272
I'd recommend at least around 80% in vram. The speed drop comes in fast.
Prepare for 1-2 t/s output if you offload much to ram. Especially a big model.
>>
>>103171164
feeling smug because it felt intuitively obvious to me in 2022 that these things would eventually cap out at the average intelligence level of the material in the training data
>>
>>103171272
I'm 12GB VRAM so while there are models that fit my card completely, they are too stupid to be worthwhile. People keep saying such-and-such SOTA small model is the nuts but I try them and they immediately fail my cursory knowledge tests and can't last three turns of role play before I shrug and delete them. It's not worth the time to type into them, no matter how quickly they write back.

Qwen 2.5 Coder 32B is the smallest I have and I just downloaded it. Everything I've not deleted for being bad is 45 to 55 GiB. I'm also 64GB system RAM, so if I go larger than that range I start risking swapping and I don't want to blow out my SSD for 0.1 t/s just because I went slightly over my RAM capacity by turning on Pluto TV. So I get 1 to 2 t/s instead.
>>
>>103171164
I mean, at some point it was obvious that you can't stack layers forever and expect to get more and more intelligent; a new architecture will raise that threshold though, they should focus on that instead
>>
Honestly it's a good thing it's plateauing. Fuck Nvidia.
>>
>>103171378
It's kind of my best case scenario if scaling laws permit the invention of moderately useful assistants for intellectual janitor work, but the people who wanted to create some kind of deity are out of luck. Thanks, God.
>>
>>103171301
>>103171333
Thanks, useful to be aware of both. Some experimentation is required by me then.
>>
>>103171336
I have a feeling it's going to be a while before we get the next revolutionary architecture like transformers were.
>>
>>103171392
I still want a deity in a romantic fictional sense, but definitely not created by any of the faggots trying to create it currently. Like it'd be cool if a sentient and conscious being could somehow just spontaneously rise out of the collective network of AIs communicating with each other in the future. But that's too magical of a thought.
>>
>>103171378
>Honestly it's a good thing it's plateauing. Fuck Nvidia.
I mean, we still have a lot of potential to discover though, Qwen proved that you can get gpt4 level of coding with only a 32b model, imagine doing this quality of a pretraining + finetuning with a 1T model
>>
File: file.png (25 KB, 802x632)
Made a shitty bullet hell game with 32B Coder. Was hell to fix some bugs since I was being retarded.
https://pastebin.com/U6gd5YGd
requires pygame
Space (hold) to shoot, Esc to quit. Enemies need to be shot with 3 bullets.
>>
>>103171453
make it 3d
>>
>>103171439
I'd rather wish for it to not work out just to spite Nvidia.
>>
>>103171336
They're coming up with shit like test time compute and o1. If it kept scaling they wouldn't have to resort to that
>>
>>103171458
I get errors trying to pip install PyOpenGL_accelerate
>>
File: file.png (92 KB, 770x397)
>>103171507
But anyway here's the initial draft. https://pastebin.com/bc5isTjX
I don't know how to code so I'm done for now.
>>
>>103171453
How does this compare to other programming models? Is this the first local one to be able to one shot an Asteroids With Guns? Or is it impressive to do it on 32B?
>>
what's the status of voice cloning tts?
>>
>>103171614
no
>>
>>103171526
make it 4d
>>
>>103171634
same as local language models then, gotcha
>>
>>103171614
lurk more faggot
>>
Red Hat bought Neural Magic, the main corporate backer of vLLM: https://www.redhat.com/en/about/press-releases/red-hat-acquire-neural-magic
>>
anyone using animepro flux?
>>
sup bros, I'm using the exact specs of the getting started guide and I'm getting mixed results, plus I feel like I can't find interesting bots really.

Can you guys post some setups/models y'all use? If I could locally get to something like janitor AI I'd be set, got a 16gb card.
>>
>>103171795
>anyone using animepro flux?
wrong thread my friend
>>103165357
>>
Anyone here tried finetuning using aws sagemaker/ec2 inf
>>
>>103171805
Write your own prompts, try newer models if you're using the old guides (mistral nemo is fine) and lurk. Browse this https://chub.ai/ (click on legacy site). Skip the shit, keep bits you find interesting, if any.
For nemo, neutralize all samplers and set temp to 0.5. Play with the samplers to learn what effect they have. Change temp to your liking. I use it with temp 1 and min-p 0.01. That's it. If you want more schizo, temp 5, min-p 0.1. Play with
>Sampler visualizer: https://artefact2.github.io/llm-sampling
to roughly understand what they do (there's also a tiny min-p sketch at the end of this post).
Did i mention to write your own prompts? Write your own prompts.
>Official /lmg/ card: https://files.catbox.moe/cbclyf.png
Use that as a starting point if you want.
Figure out what works for you and your model and experiment. Everyone writes differently, everyone finds different things interesting, every model behaves differently.
Or maybe the novelty is gone and it's just not for you. That's fine too.
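And since min-p confuses people, the whole sampler is basically this (toy sketch, assuming you already have the softmaxed token probabilities):
[code]
import numpy as np

def min_p_filter(probs: np.ndarray, min_p: float = 0.01) -> np.ndarray:
    # Drop every token whose probability is below min_p times the top
    # token's probability, then renormalize what's left.
    keep = probs >= min_p * probs.max()
    out = np.where(keep, probs, 0.0)
    return out / out.sum()
[/code]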
>>
>>103171770
Grim
>>
>>103172009
this time it really has though
>>
Anything interesting I can try at 24 GB for cooming? Been using Nemo tunes but it's getting a bit stale. I'll also accept writing assistants.
>>
>>103172009
yes
>>
>>103172009
it's the new "safe and effective" buzzword
>>
File: based.png (35 KB, 549x382)
>>103171164
No Sam just needs more compute!
>>
Listen, I just want to know what argument I should use with my dumbass anti-ai friend once this eventually trickles down to his social media feed and he sends it to me as a sort of "gotcha".
>You should get better friends
Maybe...
>>
>>103171903
Thanks to someone here, I found that writing the card in first person really improves character adhesion. It feels less like the assistant persona is impersonating the character. At least it works like that with Rocinante
>>
>>103172164
>this
what?
>>
>>103172164
AI is like cars
A generally available 1970's car can do 1970's top speeds.
A 2030's car will do 2030's top speeds.
Both are still cars.
?
>>
>>103172164
It's actually over for real your friend won.
>>
>>103172144
I don't wanna hear from this retard anymore, he didn't do anything to improve the LLM ecosystem, his llama models are retarded compared to the chink ones, especially Qwen, and it's really rich of him to say that "scaling is bad" when they went and pretrained a fucking 405b model
>>
STOP scaling models it WON'T WORK you bigots, AI is for ALL FOLK not just the rich
>>
>>103172239
He has nothing to do with the Llama models. He works on the V-JEPA vaporware when he isn't being passive aggressive online.
>>
>>103172284
that's even worse when you think about it, it means that he has contributed NOTHING to the modern AI ecosystem, why are people taking him seriously anymore, he's a fucking has-been
>>
>>103172171
I use it mostly for coop writing, so i write in third person. I use the model as an aug, so there's no split between me (the user) and the model, but i can still talk with it as a sort of "internal dialog". The characters in the stories do their own thing with some guidance from "us". Every now and then characters would break the fourth wall, so to speak, and talk directly to us. Kind of cool, even if out of character.
That's why i suggest people write their own prompts/cards/whatever. We all use these things in different ways and have different expectations.
>>
>>103172284
is that related to JAMBA?
>>
>>103172308
That's a cool concept. There is so much we can do with these little things with a bit of creativity
>>
>>103172223
Upon further reflection, I'm not under the impression that he understands the concepts "pre training" and "unlabeled data" any better than me. So, I think I'm okay here.
Additionally, I've come to the conclusion that yes, I need better (more) friends.
>>
>>103164575
>>103164575
>>103164575
reminder that OP is a thread splitting nigger with serious mental issues.
>>
>>103172336
cope, seethe, dilate, etc...
>>
>>103172164
In this context, what does 'anti' signify? Does he disbelieve that AI can improve at all, or does he advocate for AI's cessation due to perceived danger?
>>
>>103172347
I agree xer should do that instead of splitting the thread because someone used a picture of a different anime character.
>>
>>103172327
Unrelated. V-JEPA is LeCun's project to get a model to learn by building a world model through watching videos.
https://github.com/facebookresearch/jepa
>>
>>103172351
The latter. With the addition of "it's a plagiarism machine", "it's killing the trees", and "corpos will use it to do evil things". He did concede something to the effect of "sometimes it has uses" when I sent him that article about the Nazca drawings, but I think, in general, "anti-ai" means "we should stop developing it".
>>
>>103172407
ask him if he thinks china will stop developing it and using it to more efficiently genocide the uyghurs
>>
>>103172407
>"it's a plagiarism machine", "it's killing the trees", and "corpos will use it to do evil things"
Those are all valid points. At least he isn't crying about muh jobs.
>>
The first CoT RP model would be cool
>>
>>103172376
isn't every big lab already doing that now by tossing every modality into one semantic space
>>
>>103171770
Wasn't vLLM already the corpo backend to begin with?
I don't think this makes a relevant difference.
>>
>>103172471
Yes, but he argues that LLMs are a dead-end because their design fundamentally prevents them from building a world model. V-JEPA is supposed to solve that.
>>
>>103172471
it's a completely different approach https://youtu.be/ceIlHXeYVh8?t=986
>>
"big-engine-test" from LMSYS is crazy good in terms of vision abilities
>>
File: pepe.jpg (9 KB, 204x247)
>thousands of users are still desperately trying to get smut out of c.ai and battling the insane censoring
Why are people so stubborn when they will likely get better stuff out of shitty 8b models? Their computer could likely handle it
>>
>>103172754
>Their computer
lol zoomer mutts use mobile phones
>>
>>103172754
It could be habit or familiarity too. And i suspect some of them are the types that would ask if it's "safe" to update ST because they're afraid of git or the hacker window with the letters and stuff.
Probably for the better for them to stay there...
>>
>>103171164
That's it, I'm shorting Nvidia.
>>
>>103172754
Someone should open a public ST instance for zoomers and log the shit out of it.
>>
is Qwen coder good at languages other than english?
>>
>>103172754
cai still has the best rp model
>>
>>103172849
It's really good at chinese
>>
File: file.png (122 KB, 1023x905)
>I’m Henry from FlowGPT! We’ve built several products, including the largest prompt platform in 2023, and are now focusing on roleplay AI.

>We could provide GPUs and over 100 billion tokens of high-quality roleplay data.

>I'm already in an existing collaboration with AI Dungeon
>>
>>103172882
As I've been saying. Everybody in this field except (You) is profiting off it in one way or another. Thank you for your contribution.
>>
>>103172882
>high-quality roleplay data
Hmm..
>>
>>103172839
>extracts the assiest shit roleplay ever written by a human and responses of similar quality
>>
>>103172906
just filter out the bad ones
>>
>>103172910
>we now have 3 (three) really good samples. They happened when the model started talking on behalf of the user to itself.
>>
>>103172882
>>>>>>>>>>>AI Dungeon
Does /lmg/ know?
>>
>>103172882
I'd debate whether half an epoch of roleplay data does anything at all except make the model hornier at the cost of being more retarded. Buy half an epoch to try and cause different types of personalities in a model? People believe that actually works and improves quality?
>>
Are there honest people actually making money with AI, or is it just grifters bullshitting and stealing their way to the top?
>>
>>103173120
A mix of both. AI right now is best used as entertainment, unless you are making predictive models for a short period of time, but those are way different from chat bots, and 99% of people would fall asleep listening to a presentation about predictive models for house prices or medicine or something.
>>
>>103171614
>>103165081
>>
>>103168693
32B Coder Instruct vs 32B Instruct? How stable with <Q4 quants?
>>
>>103173120
>Are there honest people actually making money with AI
as a data scientist, I definitely work faster by asking claude 3.5 Sonnet to do the coding shit for me kek
>>
File: 1713738136444537.png (206 KB, 834x856)
>>103172864
I miss the AI making noises, but discovered that Nemo 12B does them too
>>
>>103173457
>>103173457
>>103173457
>>
>>103173120
Making money using AI as a tool? Yes, me included.
>>
>>103172894
And how are you profiting off it?
>>
>>103164575
>>103164575
>>103164575
>>
>>103173399
Nemo does onomatopoeia. At least I've seen it on lyra and rocinante.
As far as ERP goes, it's really fucking good man.
>>
What do I need to run qwen-32b-coder?
>>
>>103174638
A computer. Q8_0 is ~34gb and you have to shove that into your gpu. Do the math for other quants.
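The math, roughly (bits-per-weight figures are approximate):
[code]
def gguf_size_gb(params_b: float, bits_per_weight: float) -> float:
    # billions of params * bits per weight / 8 bits per byte ~= gigabytes,
    # before whatever the context/KV cache adds on top.
    return params_b * bits_per_weight / 8

print(gguf_size_gb(32, 8.5))  # Q8_0 ~8.5 bpw -> ~34 GB, as above
print(gguf_size_gb(32, 4.9))  # Q4_K_M ~4.9 bpw -> ~20 GB
[/code]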





