I don't think anybody is doing attention-based models right. They're all just recalculating attention without any actual improvement.