The encoder's embedding is contextual, it depends on all the tokens. If you pull...

The encoder's embedding is contextual, it depends on all the tokens. If you pull out the embedding layer from a decoder only model then that is a fixed embedding where each token's representation doesn't depend on the other tokens in the sequence. The bi-directionality is also important for getting a proper representation of the sequence, though you can train decoder only models to emit a single embedding vector once they have processed the whole sequence left to right.

Fundamentally it's basically a difference between bidirectional attention in the encoder and a triangular (or "causal") attention mask in the decoder.