friendship with hip 6.4 release is OVER. TheRock nightlies are my friend now
flash attention working on rocm
INFO <dinoml.backend.rocm.builder_cmake> Executing "C:/Program Files/CMake/bin/cmake.EXE" -D CMAKE_PREFIX_PATH="C:/TheRock/" -D CMAKE_CXX_COMPILER="C:/TheRock/bin/hipcc.exe" -DCMAKE_RC_COMPILER="C:/Program Files (x86)/Windows Kits/10/bin/10.0.22621.0/x64/rc.exe" -D CMAKE_BUILD_TYPE=Release -D GPU_TARGETS="gfx1201" -B "tmp/flash_attn_sdpa/build" -S "tmp/flash_attn_sdpa" -G "Ninja"
INFO <dinoml.backend.rocm.builder_cmake> Executing cmake --build tmp\flash_attn_sdpa\build --config Release
INFO <dinoml.compiler.compiler> compiled the final .so file elapsed time: 0:00:16.619116
FlashAttention matches Torch SDPA
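for anyone wondering what a "matches" check like this looks like, here's a minimal sketch in plain python. it is not dinoml's actual test: `naive_attention`, `tiled_attention` and `max_abs_diff` are hypothetical names, the inputs are made up, and the reference is a hand-written softmax rather than `torch.nn.functional.scaled_dot_product_attention`. the tiled version walks K/V in blocks with a running max and running sum (the online-softmax trick flash attention relies on), and the result is compared against the reference within a tolerance.

```python
import math


def naive_attention(q, keys, values):
    """Reference: softmax(q . k / sqrt(d)) weighted sum of values, one query."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim_v = len(values[0])
    return [
        sum(w * v[j] for w, v in zip(weights, values)) / total
        for j in range(dim_v)
    ]


def tiled_attention(q, keys, values, block=2):
    """Flash-style: process K/V block by block with an online softmax."""
    d = len(q)
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * len(values[0])  # unnormalized output accumulator
    for start in range(0, len(keys), block):
        k_blk = keys[start:start + block]
        v_blk = values[start:start + block]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_blk]
        new_max = max(running_max, max(scores))
        # rescale what was accumulated under the old max (0.0 on the first block)
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


# made-up inputs: 1 query, 5 keys of dim 4, 5 values of dim 3
q = [0.1, -0.3, 0.7, 0.2]
keys = [
    [0.5, 0.1, -0.2, 0.3],
    [-0.4, 0.8, 0.0, 0.1],
    [0.2, -0.6, 0.9, -0.5],
    [1.0, 0.3, 0.3, 0.7],
    [-0.1, -0.2, 0.4, 0.6],
]
values = [
    [1.0, 0.0, 2.0],
    [0.5, -1.0, 0.3],
    [-0.7, 0.2, 0.9],
    [0.0, 1.5, -0.4],
    [2.0, 0.1, 0.1],
]

ref = naive_attention(q, keys, values)
out = tiled_attention(q, keys, values, block=2)
matches = max_abs_diff(ref, out) < 1e-9
print("tiled attention matches reference" if matches else "MISMATCH")
```

on a real gpu kernel the comparison would run in fp16/bf16 against torch's SDPA, so the tolerance would need to be much looser than the 1e-9 used here.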
also first full model build on amd, sd 1.5
my pytorch nightly isn't working for some reason, it just hangs, but i see from sd.next's benchmarks that the 9070xt on windows is getting ~17it/s (i'll try the 2.9.1+rocm7.11.0a20260103 build that it mentions)
dinoml is at 27it/s and there's definitely a lot of performance left on the table
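quick arithmetic on those two numbers, as a rough sketch only: the ~17it/s figure is an approximate number from someone else's benchmark and the 27it/s one is mine, so settings and hardware may not line up exactly. `speedup` is just a helper name for illustration.

```python
def speedup(new_its, baseline_its):
    """Ratio of two iterations-per-second figures."""
    return new_its / baseline_its


dinoml_its = 27.0  # it/s reported above for dinoml
sdnext_its = 17.0  # approximate ("~17it/s") figure from sd.next's benchmarks

ratio = speedup(dinoml_its, sdnext_its)
print(f"{ratio:.2f}x faster")
print(f"{1000 / dinoml_its:.1f} ms/it vs {1000 / sdnext_its:.1f} ms/it")
```

so roughly 1.6x, or about 37 ms per step versus about 59 ms per step, assuming the two runs are comparable.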