top_indices = ops.cast()(ops.arange(top, top + height, 1)(), "int64")
left_indices = ops.cast()(ops.arange(left, left + width, 1)(), "int64")
spatial_pos_embed = ops.index_select(dim=1)(spatial_pos_embed, top_indices)
spatial_pos_embed = ops.index_select(dim=2)(spatial_pos_embed, left_indices)
figured out a workaround for dynamic_slice not supporting IntVar: build the indices with arange, cast them to int64, and crop with index_select instead. nice. needed it for jointattention as well. arange should probably be able to return the required type directly though, it's partly implemented already
in the first case it also needed a workaround for dim name deduplication: arange codegen needs the named dim to calculate the symbolic value, but the input dim names for height/width get deduplicated because the output uses those dims. so i just name the input dims with the output names
fixed a couple of other issues too
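for anyone following along, here's a rough plain-python sketch of why the arange + index_select route gives the same crop as a slice would. this is not AITemplate code: the nested-list "tensor", the `index_select` helper, and the `crop_by_*` names are all made up for illustration, and the [1, H, W, C] layout is an assumption based on the dims used in the snippet above

```python
# Plain-Python illustration only (no AITemplate): a nested list stands in
# for a [1, H, W, C] position embedding. All helper names are hypothetical.

def index_select(tensor, dim, indices):
    """Pick `indices` along axis `dim` of a nested-list tensor."""
    if dim == 0:
        return [tensor[i] for i in indices]
    return [index_select(sub, dim - 1, indices) for sub in tensor]

def crop_by_indices(pos_embed, top, left, height, width):
    # range() stands in for arange followed by a cast to an integer type
    top_indices = list(range(top, top + height))
    left_indices = list(range(left, left + width))
    out = index_select(pos_embed, 1, top_indices)   # select rows
    return index_select(out, 2, left_indices)       # select columns

def crop_by_slice(pos_embed, top, left, height, width):
    # what a slice-based crop would compute, for comparison
    return [
        [row[left:left + width] for row in batch[top:top + height]]
        for batch in pos_embed
    ]

# toy embedding where each cell holds its own [row, col] coordinates
H, W = 6, 6
pos_embed = [[[[r, c] for c in range(W)] for r in range(H)]]

cropped = crop_by_indices(pos_embed, top=1, left=2, height=3, width=2)
assert cropped == crop_by_slice(pos_embed, 1, 2, 3, 2)
```

the point is just that selecting a contiguous index range along dim 1 and then dim 2 is equivalent to slicing those dims, so the crop only needs ops that accept the symbolic sizes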
<aitemplate.backend.cuda.builder_cmake> Executing msbuild "tmp/stable-diffusion-3/build/stable-diffusion-3.sln" -m /property:Configuration=Release
<aitemplate.compiler.compiler> compiled the final .so file