Hacker News
MiniMax teased M3 Sparse Attention: 9.7x prefilling, 15.6x decoding at 1M
https://twitter.com/SkylerMiao7/status/2059285750458544561