/g/ - Technology






No LLM talk. Actual straight up machine learning.

I want a real person to talk to about this and share ideas with. How can I find people, IRL or here or maybe on Discord or IRC?

I am struggling hard right now with hyperparameters. How did you find yours cheaply? I am so upset right now. What better GPU can I get? My 3090 is eating shit: each update takes 1 second, and 0.6 of that second is CPU encode. Is it worth moving the encode to the GPU, or is this idea retarded?
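roughly the overlap I'm imagining, as a sketch (encode() is a stand-in for my real preprocessing; pinned memory makes the host-to-device copy async):

[code]
import queue
import threading
import numpy as np
import torch

def encode(raw):
    # stand-in for my real 0.6 s CPU-side encode
    return torch.from_numpy(raw.astype(np.float32))

def prefetch(batches, out):
    # encode batch i+1 on a CPU thread while batch i trains on the GPU
    for raw in batches:
        out.put(encode(raw).pin_memory())  # pinned memory -> async H2D copy
    out.put(None)

q = queue.Queue(maxsize=2)
batches = (np.random.rand(350, 64).astype(np.float16) for _ in range(25))
threading.Thread(target=prefetch, args=(batches, q), daemon=True).start()

while (x := q.get()) is not None:
    x = x.to("cuda", non_blocking=True)
    # forward/backward/step here; the next encode overlaps this work
[/code]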

Worst of all is the CPU bottleneck: a rollout takes like 100 seconds with 12 agents. I have a 5900, and I wish someone had told me logical cores are a meme when it comes to real work; what matters is per-core performance, and this CPU eats shit. So much so that I am beginning to think about a hardware upgrade, but for now I need to get by with the shit I have.

would be cool to hear people's ideas, especially around communication. maybe there is like a support club where we can sit in a circle and vent. I am in NYC if anything.

early shabbat shalom
>>
One of the most retarded and painfully underage threads in recent memory.
Off yourself kid
>>
>>108775508
don't bully me retard, if your IQ is not ML-tier you should be on /pol/
>>
>>108775482
Assuming you mean preprocessing by "encoding": how about you use your brain and fix your pipeline? You already said it takes 60% of the step time, which it shouldn't. Ever.

Also, I highly doubt core count matters for this at all. Logical cores (a partially duplicated pipeline) exist to make better use of the physical core, and how much they help is highly application-dependent.
But that really doesn't matter: you're going to be massively bottlenecked by your memory subsystem anyway. If you're unaware of this basic fact of modern architectures, you might have skipped a couple of steps in your overeager attempt to play around with ML tools. It's a bit of a meme, but you need better fundamentals if you want to properly utilize the hardware you've got. If you want to skip that, then you need sufficient funds to offset your deficiency in that regard.
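Before theorizing further, measure it. A minimal torch.profiler sketch (the Linear is a stand-in for whatever model is actually running) shows whether the time sits in CPU ops, copies, or kernels:

[code]
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(512, 512).cuda()          # stand-in model
x = torch.randn(350, 512, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(10):
        model(x).sum().backward()
        torch.cuda.synchronize()

# table shows whether time sits in CPU ops, H2D copies, or CUDA kernels
print(prof.key_averages().table(sort_by="self_cuda_time_total", row_limit=10))
[/code]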
>>
>>108775574
yeah, my fundamentals are bad, and while that path makes sense for most people, I cannot operate that way. I can only learn by jumping in neck-deep and figuring my way out...

60% may be bad, but maybe I can give some context as to why: with my weights loaded and a relatively modest batch of ~350, I am at 24GB VRAM (not spilling into system RAM; there's like a 300MB buffer left).

so 25 updates per iteration, for example, take around 25-27 seconds, and half of that is encoding, but the whole iteration takes an additional ~100 seconds just to sample data. I tried running it asynchronously, but it bottlenecks the GPU and the final wall-clock time is still the same.
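for reference, the async pattern I tried is roughly this (runnable toy with fake timings; collect_trajectory() and update() stand in for the real rollout and GPU step). if the CPU side is the critical path, the overlap doesn't move wall-clock:

[code]
import queue
import threading
import time

buf = queue.Queue(maxsize=4)

def collect_trajectory():
    time.sleep(0.1)            # stand-in for the ~100 s CPU rollout
    return [0.0] * 2000

def update(batch):
    time.sleep(0.01)           # stand-in for one GPU update

def rollout_worker():
    while True:                # CPU sampling runs in the background...
        buf.put(collect_trajectory())

threading.Thread(target=rollout_worker, daemon=True).start()

for _ in range(5):             # ...while the update loop consumes batches
    batch = buf.get()
    for _ in range(25):
        update(batch)
[/code]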

for some reason talking to AI does not help with any of this. I've got everything stable except that I cannot figure out a valid environment for hyperparameter search. I want to do it cheaply, but I can't. I am trying to validate several different configs for evaluation, but they suck. I tried a frozen buffer to avoid rollouts, and the results are garbage; an online buffer appears to be stronger.
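by "cheaply" I mean something in the spirit of this sketch: search with early pruning so bad configs die after a couple of iterations. Optuna here is just one option, fake_train_eval() is a dummy stand-in for a short real run, and the bounds are made up:

[code]
import optuna

def fake_train_eval(lr, tau, gamma, step):
    return step * lr * 1000.0              # dummy score; replace with a short real run

def objective(trial):
    lr    = trial.suggest_float("lr", 1e-5, 1e-3, log=True)
    tau   = trial.suggest_float("tau", 1e-3, 5e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.95, 0.999)
    score = 0.0
    for step in range(6):                  # 6 cheap iterations per config
        score = fake_train_eval(lr, tau, gamma, step)
        trial.report(score, step)
        if trial.should_prune():           # kill configs that lag early
            raise optuna.TrialPruned()
    return score

study = optuna.create_study(direction="maximize",
                            pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=64)
print(study.best_params)
[/code]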
>>
>>108775482
You might want to provide more details on what kind of model you are training. I work in computer vision; I'm not sure why you are hitting CPU bottlenecks, but typically I train/test on a very small subset of the data to optimize hyperparameters. Also, how do you people even continue to post with such a horrible captcha system? It's even worse than it used to be. You guys must be some serious no-life losers.
>>
>>108775482
>I am struggling hard right now with hyperparameters, how did you cheaply find yours
Shouldn't you be able to put your hyperparameters in autograd like your normal parameters and let it find them for you :-)
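It half-exists, for what it's worth: SAC really does learn its entropy temperature alpha by gradient descent. A minimal sketch of that auto-tuning trick (the log-probs are dummy values just to make it run):

[code]
import torch

target_entropy = -4.0                          # e.g. the -|A| heuristic
log_alpha = torch.zeros(1, requires_grad=True)
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)

def alpha_update(log_probs):
    # standard SAC temperature loss: push policy entropy toward the target
    loss = -(log_alpha * (log_probs + target_entropy).detach()).mean()
    alpha_opt.zero_grad()
    loss.backward()
    alpha_opt.step()
    return log_alpha.exp().item()              # alpha fed into the actor loss

alpha = alpha_update(torch.randn(350))         # dummy log-probs just to run it
[/code]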
>>
>>108775539
you can read a few books from the past (before the mid-2000s) which have not been read by these current 'ai' wizards, nor by you
hence he has a point
>>
>>108775482
>Actual straight up machine learning.
>I want a real person to talk to about this, share ideas, how can I find people IRL or here or maybe discord or IRC.

You don't have to lurk for 2 years anymore. Literally talk to any GPT and let it teach you the absolute basics until you are at a level where people bother listening to you.
>>
File: murphy.jpg (340 KB, 2443x3093)
get to reading
>>
>>108776979
this is old as shit. why would you recommend something so dated? you clearly don't know enough about the subject. God, if you're honestly not trolling, you're probably some pretentious wannabe academic fuck that got a masters but was too pussy to get a phd, and even then you probably can't do shit in pytorch

Holy fucking shit the deep learning section is like reading walls in ancient ruins

OP, you might find some O'Reilly books useful just for general fundamentals; they skip a lot of academic jargon and are usually written by professionals in the field. They're also relatively easy to find on websites like libgen.

AI and ML for Coders in PyTorch
Deep Learning with PyTorch, Second Edition

if you're looking to make yourself employable and prep for interviews aimed more at models in production, then I would strongly suggest Chip Huyen's ML book and AI Engineering book
>>
>>108777168
Not him, but there was a rerelease and update recently-ish, in 2022-2023.
https://probml.github.io/pml-book/book1.html
https://probml.github.io/pml-book/book2.html
>>
>>108776053
my captchas are easy; they are hard when you are a bad poster

my model is not vision, it's weather-related. aren't you worried about overfitting your hyperparameters with a small dataset?

my dataset is like a 3GB npy file (fp16) with some metadata files. I do 12 rollouts of 2000 steps. The dataset is sized to accommodate several years, but the observation window is one year; if I lower it I would be overfitting hard, and I'm not sure that's a good idea..
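if I did shrink it, I'd presumably want walk-forward splits over the years rather than one small random subset. sketch below; the random array stands in for the mmap-loaded npy, and the hourly-sized windows are an assumption:

[code]
import numpy as np

# stand-in for np.load("data.npy", mmap_mode="r"), which avoids
# pulling the whole 3 GB into RAM at once
data = np.random.rand(20_000, 8).astype(np.float16)

def walk_forward(n, window, horizon):
    # successive train/val splits that respect time order
    start = 0
    while start + window + horizon <= n:
        yield (slice(start, start + window),
               slice(start + window, start + window + horizon))
        start += horizon

for tr, va in walk_forward(len(data), window=8760, horizon=720):
    pass  # fit on data[tr], score on data[va]; window = 1 year if hourly
[/code]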

>>108776060
>hyperparameters in autograd

i don't think i can do that. i asked GPT; it needs fixed-point assumptions or some BS which is incompatible here, and my outputs are discrete anyway

>>108776077
i only read AI outputs man

>>108776420
lol I think i am close, i think this is the last frontier for me. the next challenge will be machine vision; i do not really get it right now. I know a lot about TCN dilations etc. but nothing about CNNs.
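for reference, the dilation thing i mean is just this (minimal sketch: left-pad so step t only sees inputs up to t):

[code]
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalDilatedConv(nn.Module):
    # the "dilation" part of a TCN: left-pad so output t sees only inputs <= t
    def __init__(self, ch, k=3, dilation=2):
        super().__init__()
        self.pad = (k - 1) * dilation
        self.conv = nn.Conv1d(ch, ch, k, dilation=dilation)

    def forward(self, x):                          # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.pad, 0)))

y = CausalDilatedConv(8)(torch.randn(2, 8, 100))   # -> (2, 8, 100)
[/code]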
>>
>>108777168
not looking to be employed, that would distract me from my projects; consulting, maybe. I was a PM, i got tired of being diplomatic

I am using pytorch, and i'll look at what you are suggesting, but i just really cannot operate that way. i go to the goal, what i want, and work my way backwards, ultra-agile. thanks for the suggestions; i have some research papers saved i've been meaning to read, and i feel like they have better information. this one is open in the adjacent tab
>Evaluating hyperparameter optimization on the generalization of deep reinforcement learning algorithms
>>
I guess my issue is the experimental nature of hunting for the cheap setup that can do an effective search.

I used the Sobol method: 256 configs, 6 iterations each, with a frozen replay buffer to avoid expensive rollouts. That blew. So right now I'm cross-validating those results against a live replay buffer: I took the 6 best configs and 6 rejected configs, and I am running them head to head, but this time with 60 updates and a live buffer, to see if my Sobol was set up correctly. If all 12 rank as expected, then I can keep using the cheap method. If they fail, then I have to switch things up: maybe run a 64-config grid instead of 256 with an online buffer, which will make my iterations ~170s long, and only do like 20 iterations max. I think with that setup I could keep each exploration under 24 hours.
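generating the Sobol configs themselves is the cheap part, e.g. with scipy's QMC module (the bounds here are made-up example ranges for lr/tau/gamma/alpha):

[code]
import numpy as np
from scipy.stats import qmc

sampler = qmc.Sobol(d=4, scramble=True, seed=0)
unit = sampler.random_base2(m=8)           # 2**8 = 256 points in [0,1)^4
lo = np.array([1e-5, 1e-3, 0.95, 0.5])     # example bounds: lr, tau, gamma, alpha
hi = np.array([1e-3, 5e-2, 0.999, 2.0])
configs = qmc.scale(unit, lo, hi)

# log-scale params are better sampled in log space, e.g.
lrs = 10.0 ** qmc.scale(unit[:, :1], -5.0, -3.0)
[/code]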

I have no idea how you guys don't have issues with the CPU bottleneck. My GPU is basically idle for 100 seconds, then pegged at 100% for 60 seconds, and like I said, I tried running asynchronously, the wall-clock didn't change, so I went back to sequential. I am as optimized as I can be, and I optimized for max GPU utilization.
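the one lever I haven't pulled is stepping all 12 agents in separate processes, gymnasium-style. sketch below with a toy env standing in for my weather env; whether it helps depends on the env being wrapped to the gymnasium API:

[code]
import gymnasium as gym

# 12 envs, each in its own process, stepped with one call
# (CartPole is a stand-in for the real env)
envs = gym.vector.AsyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(12)]
)
obs, info = envs.reset(seed=0)
for _ in range(100):
    actions = envs.action_space.sample()      # replace with policy(obs)
    obs, rew, term, trunc, info = envs.step(actions)
envs.close()
[/code]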

my ratio is just under 1:1; if you are not bottlenecked like this, you might be overtraining on stale data, at 2:1 etc.
>>
When you guys do your hyperparameter convergence, do you find that your alpha converges first, then your lr and tau, with target entropy and gamma converging last, or does it all happen at once for you?
>>
Damn this thread is a mess, just read 10 books before posting. You remind me of a cargo-culting indian on meth. Or a Markov chain bot.

t. ML postdoc
>>
Machine learning is such a wide field, with so much ugly, inefficient, obscure Python code, that nobody will be able to give you tips and tricks unless you are asking some really basic noob questions.
Unless you have somebody actually working with you, you will have to figure everything out on your own.
I know you said no LLMs, but I am vibecoding a project somewhat adjacent to ML: finetuning LLMs when the whole model doesn't fit on a GPU, by streaming layers in and out of the GPU.
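the core of it is just paging blocks through VRAM during the forward pass. inference-shaped sketch below; actual finetuning needs the same dance during backward, plus CPU-side optimizer state:

[code]
import torch
import torch.nn as nn

# toy "model that doesn't fit": page one block at a time through VRAM
blocks = nn.ModuleList([nn.Linear(1024, 1024) for _ in range(8)])  # lives on CPU

@torch.no_grad()
def streamed_forward(x):
    x = x.cuda()
    for block in blocks:
        block.to("cuda")       # stream the layer in...
        x = block(x)
        block.to("cpu")        # ...and back out to free VRAM
    return x

out = streamed_forward(torch.randn(4, 1024))
[/code]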
>>
Start with the operating system.
https://github.com/EmptyMonad/bootnn
>>
>>108777892
i'm way beyond that

>>108777870
kek

>>108777883
this is how i feel, extremely alone. this reminds me of when i did research in grad school: for the first few months my PhD advisor was guiding me and telling me what to do, often helping me out with common sense, until it clicked and i was able to stand on my own feet. right now i have to do it alone


