This is exactly the kind of tuning we need to push program synthesis forward. People are sleeping on post-training methods and it shows in their model performance. Game changer if these results hold up.
https://www.reddit.com/user/iamjasonfeng