ok so apparently rethinking the transformer block architecture can lead to massive perf gains? https://arxiv.org/abs/2605.19269