As to [1]: Yes, I was imprecise in the sense that non-standard SVD implementations for generating your PMIs will scale with the square of |V|, not the cube. But as I will go on to show, that is still not good enough to make count-based approaches competitive with predictive ones.
Re. [2], Radim's measurements there have several issues. First, word2vec is a poor implementation in terms of CPU usage, as profiling word2vec shows (fastText is much better at using your CPUs). Second, even Radim states there that his SVD-based results are significantly poorer than the w2v embeddings ("the quality of both SPPMI and SPPMI-SVD models is atrocious"). Third, Radim's own conclusion there is: "TL;DR: the word2vec implementation is still fine and state-of-the-art, you can continue using it :-)".
So I don't really get your points. Instead of referencing websites and blogs, let's take a deeper look at a "proponent" of count-based methods in a peer-reviewed setting: Levy & Goldberg's SPPMI model [1,2], which uses truncated SVD. (FYI, that proposed model, SPPMI, is exactly what got used in Radim's blog above.) So even if you wanted to use SPPMI instead of the sub-optimal SVD (alone), you would first have to find a really good implementation of it, i.e., something that is competitive with fastText.

Also note that they only used 2-word windows for SGNS in most comparisons, which makes the results for neural embeddings a bit dubious. You would typically use 5-10, and as shown in Table 5 of [2], SGNS is pretty much the winner in all cases as the window "approaches" 10 words. Next, I would only trust Hill's SimLex as a proper evaluation target for word similarity - simply look at the raw data of the various evaluation datasets yourself and read Hill's explanation of why he created SimLex, and I am sure you will agree. "Coincidentally", it also is - by a huge margin - the most difficult dataset to get right (i.e., all approaches perform worst on SimLex). Yet SGNS nearly always outperforms SVD/SPPMI on precisely that set.

Finally, even Levy et al. had to conclude: "Applying the traditional count-based methods to this setting [=large-scale corpora] proved technically challenging, as they consumed too much memory to be efficiently manipulated." So even if they "wanted" to conclude that SVD is just as good as neural embeddings, their own results (Table 5) and this statement lead to a clearly different conclusion: with a large enough window size, you are better off with neural embeddings, particularly on large corpora. And this work only compares W2V & GloVe to SVD & SPPMI, while fastText in turn works a lot better than "vanilla" SGNS and GloVe. What I do agree with is that properly tuning neural embeddings is a bit of a black art, much like anything with the "neural" tag on it...
QED: this article is horseradish. Neural embeddings work significantly better than SVD, and SVD is significantly harder to scale to large corpora, even if you use SPPMI or other tricks.
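For anyone who wants to see what the SPPMI-SVD baseline from [1,2] actually boils down to, here is a minimal sketch of my own (not Levy & Goldberg's reference code); the window size, shift, and dimensionality below are placeholder choices, and a real implementation would need subsampling, context-distribution smoothing, and far more careful memory handling:

    # Minimal SPPMI + truncated-SVD sketch (illustration only, not the reference code).
    # Assumed hyperparameters: window=5, shift log(k) with k=5 negative samples, dim=300.
    import numpy as np
    from collections import Counter
    from scipy.sparse import csr_matrix
    from scipy.sparse.linalg import svds

    def sppmi_svd(sentences, window=5, neg_k=5, dim=300):
        vocab = {w: i for i, w in enumerate(sorted({w for s in sentences for w in s}))}
        counts = Counter()
        for sent in sentences:
            ids = [vocab[w] for w in sent]
            for i in range(len(ids)):
                for c in ids[max(0, i - window):i] + ids[i + 1:i + 1 + window]:
                    counts[(ids[i], c)] += 1          # symmetric word/context co-occurrence
        rows, cols = zip(*counts)
        vals = np.fromiter(counts.values(), dtype=np.float64)
        M = csr_matrix((vals, (rows, cols)), shape=(len(vocab), len(vocab)))

        total = M.sum()
        p_w = np.asarray(M.sum(axis=1)).ravel() / total   # P(word)
        p_c = np.asarray(M.sum(axis=0)).ravel() / total   # P(context)
        C = M.tocoo()
        pmi = np.log((C.data / total) / (p_w[C.row] * p_c[C.col]))
        sppmi = np.maximum(pmi - np.log(neg_k), 0.0)      # shift by log(k), clip at zero
        S = csr_matrix((sppmi, (C.row, C.col)), shape=C.shape)

        # Truncated SVD: only `dim` singular triplets are computed (svds needs dim < |V|).
        dim = min(dim, len(vocab) - 1)
        U, s, _ = svds(S, k=dim)
        return U * np.sqrt(s), vocab                      # word vectors = U * sqrt(Sigma)

Even this toy version makes the memory point from the quote above obvious: the |V| x |V| SPPMI matrix has to be materialized (sparsely) before anything can be factorized.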
I don't understand what you mean by "non-standard SVD implementations." No SVD implementation is going to compute more singular vectors than you ask it to. It is neither cubic time, as you first said, nor quadratic time, as you now say. The dimension-dependence is linear.
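To make the "you only compute the vectors you ask for" point concrete, here is what a truncated sparse SVD call looks like in practice (my example, with made-up matrix sizes); how the total cost grows with the matrix dimensions is exactly what the rest of this thread argues about:

    # Truncated SVD returns exactly the k singular triplets you request, nothing more.
    from scipy.sparse import random as sparse_random
    from scipy.sparse.linalg import svds

    A = sparse_random(50_000, 50_000, density=1e-4, format="csr", random_state=0)
    U, s, Vt = svds(A, k=128)          # only 128 singular vectors/values are computed
    print(U.shape, s.shape, Vt.shape)  # (50000, 128) (128,) (128, 50000)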
ADDENDUM: To avoid further discussion, I should add that parallel methods for dense matrices exist that essentially use a prefix-sum approach and double the total work, but thereby decrease the absolute running time [2]. However, as that exploits parallelism and requires dense matrices, it does not apply to this discussion.
Finally, to tie this discussion off, here are two authoritative references that explicitly address the issue of runtime complexity.
In the best case, as determined by Halko et al., your rank-k approximation of an n-by-m term-document matrix is O(nmk), and randomized approximations get that down to O(nm log(k)) [1]. And, according to Rehurek's own investigations [2], those approximate singular vectors are typically good enough. I.e., in both cases, the decomposition scales with the product of documents and words, not their sum. Therefore, this is clearly not a linear problem.
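For concreteness, here is a bare-bones version of the randomized scheme Halko et al. describe [1], heavily simplified by me (dense input, a small oversampling margin, no power iterations): sample the range of the matrix with a random projection, orthonormalize, and run an exact SVD on the much smaller projected matrix.

    # Bare-bones randomized truncated SVD in the spirit of Halko et al. [1]
    # (my simplification: dense input, oversampling p=10, no power iterations).
    import numpy as np

    def randomized_svd(A, k, p=10, seed=0):
        rng = np.random.default_rng(seed)
        n, m = A.shape
        Omega = rng.standard_normal((m, k + p))    # random test matrix
        Y = A @ Omega                              # sample the range of A: the O(n*m*k) part
        Q, _ = np.linalg.qr(Y)                     # orthonormal basis for that range
        B = Q.T @ A                                # project A down to (k+p) x m
        Ub, s, Vt = np.linalg.svd(B, full_matrices=False)  # cheap SVD of the small matrix
        return (Q @ Ub)[:, :k], s[:k], Vt[:k]      # lift back up and truncate to rank k

Library versions of this idea (e.g. sklearn.utils.extmath.randomized_svd, or the single-pass variant described in [2]) add power iterations and block processing, but the cost structure is the same: every pass touches all n*m entries (or all non-zeros), so the work grows with the product of the two dimensions, not their sum.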
On top of that, when these inverted indices grow too large to be processed on a single machine, earlier methods required k passes over the data. These newer approaches [1,2] can make do with a single pass, meaning the thing that does scale linearly here is the performance gain from distributing your SVD across a cluster with these newer approaches. Maybe this is the source of confusion for some commenters here.
To clarify the cubic-vs-square runtime complexity confusion I caused (sorry!): low-rank (rank-k) SVD of an n x m matrix indeed scales with O(nmk), while the full SVD would be O(min(n^2 m, n m^2)), i.e., roughly quadratic and cubic runtime, respectively, as per the papers linked elsewhere in this comment branch replying to OP.
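To put hypothetical numbers on that difference (|V| = 200,000 and k = 300 are made-up but plausible values for a square word-word matrix):

    # Back-of-the-envelope operation counts; both numbers are assumptions for illustration.
    V, k = 200_000, 300
    full_svd_ops = V ** 3         # O(min(n^2 m, n m^2)) with n = m = |V|  ->  8.0e15
    truncated_ops = V * V * k     # O(n m k)                               ->  1.2e13
    print(full_svd_ops / truncated_ops)   # ~667x fewer operations, but still quadratic in |V|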
[1] https://papers.nips.cc/paper/5477-neural-word-embedding-as-i...
[2] https://www.transacl.org/ojs/index.php/tacl/article/view/570