To clarify the cubic vs. quadratic runtime complexity confusion I caused (sorry!): a low-rank (rank-k) SVD of an n x m word embedding matrix does indeed scale as O(nmk), while the full SVD is O(min(n^2 m, n m^2)). For a fixed k, that makes them quadratic and cubic in the matrix dimensions, respectively, as per the papers linked elsewhere in this comment branch replying to OP.
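In case it helps to see where the O(nmk) comes from, here is a minimal sketch using a randomized range finder, which is one common way to get a rank-k SVD. This is an assumption on my part and not necessarily the algorithm from the linked papers; the names `lowrank_svd`, `svd_cost_estimates`, and the `oversample` parameter are made up for illustration. The dominant cost is the two matrix products against A, each O(nmp) with p = k + oversample.

```python
import numpy as np

def lowrank_svd(A, k, oversample=10, seed=0):
    """Approximate rank-k SVD via a randomized range finder (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    p = min(k + oversample, n, m)          # sketch width, slightly larger than k
    omega = rng.standard_normal((m, p))    # random test matrix, m x p
    Y = A @ omega                          # O(n*m*p): sample the column space of A
    Q, _ = np.linalg.qr(Y)                 # O(n*p^2): orthonormal basis, n x p
    B = Q.T @ A                            # O(n*m*p): project A down to p x m
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)  # O(m*p^2): small SVD
    U = Q @ Ub                             # lift left singular vectors back to n dims
    return U[:, :k], s[:k], Vt[:k, :]

def svd_cost_estimates(n, m, k):
    """Leading-order operation counts (constants dropped), as stated in the comment."""
    return {"low_rank": n * m * k, "full": min(n * n * m, n * m * m)}
```

For example, `svd_cost_estimates(1000, 300, 10)` puts the low-rank count at n*m*k and the full count at min(n^2 m, n m^2), so the gap grows with min(n, m) / k. These are rough leading-order counts, not measured runtimes.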