This is exactly the kind of tuning we need to push program synthesis forward. People are sleeping on post-training methods and it shows in their model performance. Game changer if these results hold up. https://www.reddit.com/user/iamjasonfeng