Pyramid Adversarial training has been shown to be highly effective at improving both the clean accuracy and the robustness of vision transformers. However, because adversarial training is iterative, the technique is up to seven times more expensive than standard training. To make the method more efficient, we propose Universal Pyramid Adversarial training, in which we learn a single pyramid adversarial pattern shared across the whole dataset instead of a separate pattern for each sample. We decrease the computational cost of Pyramid Adversarial training by up to 70 percent while retaining the majority of its benefits. In addition, to the best of our knowledge, we are the first to find that universal adversarial training can be leveraged to improve clean model performance.
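
To make the idea of a single shared pattern concrete, the following is a minimal sketch, not the implementation used in this work. It assumes a toy logistic-regression "model" on 4x4 images, three pyramid levels with nearest-neighbor upsampling, a per-level clipping bound, and one signed-gradient update of the shared pattern per training step that reuses the gradient already computed for the model update. All names and constants (`SCALES`, `EPS`, `STEP`, `LR`, `compose`, `train_universal`) are hypothetical.

```python
import math
import random

SIZE = 4             # toy image side length (assumption)
SCALES = [1, 2, 4]   # patch size of each pyramid level (assumption)
EPS = 0.1            # per-level L-infinity bound (assumption)
STEP = 0.02          # signed-gradient step for the shared pattern (assumption)
LR = 0.1             # learning rate for the model weights (assumption)


def zeros(n):
    return [[0.0] * n for _ in range(n)]


def compose(levels):
    """Upsample each level to SIZE x SIZE (nearest neighbor) and sum them."""
    out = zeros(SIZE)
    for s, level in zip(SCALES, levels):
        for i in range(SIZE):
            for j in range(SIZE):
                out[i][j] += level[i // s][j // s]
    return out


def train_universal(data, epochs=5):
    """Jointly train a logistic model and ONE pyramid pattern shared by all samples."""
    w, b = zeros(SIZE), 0.0
    levels = [zeros(SIZE // s) for s in SCALES]  # the universal pattern
    cells = [(i, j) for i in range(SIZE) for j in range(SIZE)]
    for _ in range(epochs):
        for x, y in data:
            delta = compose(levels)
            x_adv = [[x[i][j] + delta[i][j] for j in range(SIZE)] for i in range(SIZE)]
            z = b + sum(w[i][j] * x_adv[i][j] for i, j in cells)
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # d(loss)/dz for binary cross-entropy
            # Input gradient from the same pass; reused for the pattern update.
            g_x = [[err * w[i][j] for j in range(SIZE)] for i in range(SIZE)]
            # Ascent on the shared pattern: one signed step per level, then clip.
            for s, level in zip(SCALES, levels):
                for a in range(SIZE // s):
                    for c in range(SIZE // s):
                        g = sum(g_x[a * s + di][c * s + dj]
                                for di in range(s) for dj in range(s))
                        sign = (g > 0) - (g < 0)
                        level[a][c] = max(-EPS, min(EPS, level[a][c] + STEP * sign))
            # Descent on the model weights, using the perturbed input.
            for i, j in cells:
                w[i][j] -= LR * err * x_adv[i][j]
            b -= LR * err
    return w, b, levels


# Toy data: class 1 images are brighter in the top half.
random.seed(0)
data = []
for k in range(20):
    y = k % 2
    x = [[random.random() * 0.1 + (0.5 if (y == 1 and i < 2) else 0.0)
          for j in range(SIZE)] for i in range(SIZE)]
    data.append((x, y))

w, b, levels = train_universal(data)
```

In this sketch the cost saving comes from never running an inner attack loop per sample: the pattern is updated once per training step from a gradient that the model update needs anyway. A sample-wise variant would instead take several ascent steps on a fresh pattern for every input.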