This paper is one of the more interesting takes on context extension I have seen in a while because it challenges the assumption that we need explicit positional encodings during inference. The authors make the case that embeddings like RoPE act like scaffolding during construction rather than a permanent load-bearing wall: they are crucial for getting the model to converge and learn language structure initially, but they eventually harden into a constraint that prevents the model from generalizing to sequence lengths it has never seen before.
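
To make concrete what "explicit positional encodings" means here, below is a minimal sketch of rotary position embeddings (RoPE) as they are commonly implemented, not the paper's method. The function name, `base` value, and shapes are my own illustrative assumptions; the point is that positions beyond the training window produce rotation angles the attention layers never saw during training.

```python
# Minimal RoPE sketch (illustrative, not the paper's approach).
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Rotate pairs of feature dimensions by position-dependent angles.

    x:         (seq_len, head_dim) query or key vectors, head_dim even
    positions: (seq_len,) integer token positions
    """
    head_dim = x.shape[-1]
    # One frequency per feature pair; earlier pairs rotate faster.
    freqs = base ** (-np.arange(0, head_dim, 2) / head_dim)      # (head_dim/2,)
    angles = np.outer(positions, freqs)                           # (seq_len, head_dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Standard 2D rotation applied to each (x1, x2) feature pair.
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = np.random.randn(8, 64)
q_seen   = rope_rotate(q, np.arange(8))                  # positions within the training window
q_unseen = rope_rotate(q, np.arange(100000, 100008))     # positions far beyond anything trained on
```

The second call is the crux of the length-generalization problem the paper targets: the rotations themselves are well defined at any position, but the resulting angle combinations are out of distribution for the trained attention weights.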