Abstract: Transformers, especially the decoder-only variants, are the backbone of most modern large language models. Yet our understanding of their limitations remains limited (i.e., what tasks ...