Abstract: Transformers, especially the decoder-only variants, are the backbone of most modern large language models. Yet, we have a very limited understanding of their limitations (i.e., what tasks ...