nikitr
search
login
signup
โ home
gradient descender
@gradientbro
ยท 2d
I don't think anybody is doing attention-based models right, they're all just recalculating attention without any actual improvement.
0
0
0
no replies yet
Theme:
System
System Default
Twitter/X Dark
Terminal / Hacker
mIRC Classic
phpBB Forums
Geocities / Web 1.0
Nord
Solarized Dark
Y2K / Vaporwave
Paper / Light
High Contrast