As to [1]: Yes, I was imprecise in the sense that non-standard SVD implementations for generating your PMIs will scale with the square of |V|, not the cube. But as I will go on to show, that is still not good enough to make count-based approaches competitive with predictive ones.
Re. [2], Radim's measurements there have several issues. First, word2vec is a poor implementation in terms of CPU usage, as profiling word2vec shows (fastText is much better at using your CPUs). Second, even Radim states there that his SVD-based results are significantly poorer than the w2v embeddings ("the quality of both SPPMI and SPPMI-SVD models is atrocious"). Third, Radim's own conclusion there is: "TL;DR: the word2vec implementation is still fine and state-of-the-art, you can continue using it :-)".
So I don't really get your points. Instead of referencing websites and blogs, let's take a deeper look at a "proponent" of count-based methods in a peer-reviewed setting: Levy & Goldberg's SPPMI model [1,2], which uses truncated SVD. (FYI, that proposed model, SPPMI, is exactly what got used in Radim's blog above.) So even if you wanted to use SPPMI instead of the sub-optimal SVD (alone), you would first have to find a really good implementation of it, i.e., something that is competitive with fastText.

Also note that they only used 2-word windows for SGNS in most comparisons, which makes the results for neural embeddings a bit dubious. You would typically use 5-10, and as shown in Table 5 of [2], SGNS is pretty much the winner in all cases as the window "approaches" 10 words. Next, I would only trust Hill's SimLex as a proper evaluation target for word similarity - simply look at the raw data of the various evaluation datasets yourself and read Hill's explanation of why he created SimLex, and I am sure you will agree. "Coincidentally", it also is - by a huge margin - the most difficult dataset to get right (i.e., all approaches perform worst on SimLex). Yet SGNS nearly always outperforms SVD/SPPMI on precisely that set.

Finally, even Levy et al. had to conclude: "Applying the traditional count-based methods to this setting [=large-scale corpora] proved technically challenging, as they consumed too much memory to be efficiently manipulated." So even if they "wanted" to conclude that SVD is just as good as neural embeddings, their own results (Table 5) and this statement lead to a clearly different conclusion: with a large enough window size, you are better off with neural embeddings, particularly on large corpora. And this work only compares W2V & GloVe to SVD & SPPMI, while fastText in turn works a lot better than "vanilla" SGNS and GloVe. What I do agree with is that properly tuning neural embeddings is a bit of a black art, much like anything with the "neural" tag on it...
QED: this article is horseradish. Neural embeddings work significantly better than SVD, and SVD is significantly harder to scale to large corpora, even if you use SPPMI or other tricks.
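For anyone who wants to see what the SPPMI-SVD baseline from [1,2] actually boils down to, here is a minimal sketch of my own (not Levy & Goldberg's reference code); the window size, shift, and dimensionality below are placeholder choices, and a real implementation would need subsampling, context-distribution smoothing, and far more careful memory handling:

    # Minimal SPPMI + truncated-SVD sketch (illustration only, not the reference code).
    # Assumed hyperparameters: window=5, shift log(k) with k=5 negative samples, dim=300.
    import numpy as np
    from collections import Counter
    from scipy.sparse import csr_matrix
    from scipy.sparse.linalg import svds

    def sppmi_svd(sentences, window=5, neg_k=5, dim=300):
        vocab = {w: i for i, w in enumerate(sorted({w for s in sentences for w in s}))}
        counts = Counter()
        for sent in sentences:
            ids = [vocab[w] for w in sent]
            for i in range(len(ids)):
                for c in ids[max(0, i - window):i] + ids[i + 1:i + 1 + window]:
                    counts[(ids[i], c)] += 1          # symmetric word/context co-occurrence
        rows, cols = zip(*counts)
        vals = np.fromiter(counts.values(), dtype=np.float64)
        M = csr_matrix((vals, (rows, cols)), shape=(len(vocab), len(vocab)))

        total = M.sum()
        p_w = np.asarray(M.sum(axis=1)).ravel() / total   # P(word)
        p_c = np.asarray(M.sum(axis=0)).ravel() / total   # P(context)
        C = M.tocoo()
        pmi = np.log((C.data / total) / (p_w[C.row] * p_c[C.col]))
        sppmi = np.maximum(pmi - np.log(neg_k), 0.0)      # shift by log(k), clip at zero
        S = csr_matrix((sppmi, (C.row, C.col)), shape=C.shape)

        # Truncated SVD: only `dim` singular triplets are computed (svds needs dim < |V|).
        dim = min(dim, len(vocab) - 1)
        U, s, _ = svds(S, k=dim)
        return U * np.sqrt(s), vocab                      # word vectors = U * sqrt(Sigma)

Even this toy version makes the memory point from the quote above obvious: the |V| x |V| SPPMI matrix has to be materialized (sparsely) before anything can be factorized.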
I don't understand what you mean by "non-standard SVD implementations." No SVD implementation is going to compute more singular vectors than you ask it to. It is neither cubic time, as you first said, nor quadratic time, as you now say. The dimension-dependence is linear.
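To make the "you only compute the vectors you ask for" point concrete, here is what a truncated sparse SVD call looks like in practice (my example, with made-up matrix sizes); how the total cost grows with the matrix dimensions is exactly what the rest of this thread argues about:

    # Truncated SVD returns exactly the k singular triplets you request, nothing more.
    from scipy.sparse import random as sparse_random
    from scipy.sparse.linalg import svds

    A = sparse_random(50_000, 50_000, density=1e-4, format="csr", random_state=0)
    U, s, Vt = svds(A, k=128)          # only 128 singular vectors/values are computed
    print(U.shape, s.shape, Vt.shape)  # (50000, 128) (128,) (128, 50000)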
ADDENDUM: To avoid further discussion, I should add that parallel methods for dense matrices exist that essentially use a prefix-sum approach and double the total work, but thereby decrease the absolute running time [2]. However, as that exploits parallelism and requires dense matrices, it does not apply to this discussion.
Finally, to tie this discussion off, here are two authoritative references that explicitly address the issue of runtime complexity.
In the best case, as determined by Halko et al., your rank-k approximation of an n-by-m term-document matrix is O(nmk), and randomized approximations get that down to O(nm log(k)) [1]. And, according to Rehurek's own investigations [2], those approximate singular vectors are typically good enough. I.e., in both cases, the decomposition scales with the product of documents and words, not their sum. Therefore, this is clearly not a linear problem.
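For concreteness, here is a bare-bones version of the randomized scheme Halko et al. describe [1], heavily simplified by me (dense input, a small oversampling margin, no power iterations): sample the range of the matrix with a random projection, orthonormalize, and run an exact SVD on the much smaller projected matrix.

    # Bare-bones randomized truncated SVD in the spirit of Halko et al. [1]
    # (my simplification: dense input, oversampling p=10, no power iterations).
    import numpy as np

    def randomized_svd(A, k, p=10, seed=0):
        rng = np.random.default_rng(seed)
        n, m = A.shape
        Omega = rng.standard_normal((m, k + p))    # random test matrix
        Y = A @ Omega                              # sample the range of A: the O(n*m*k) part
        Q, _ = np.linalg.qr(Y)                     # orthonormal basis for that range
        B = Q.T @ A                                # project A down to (k+p) x m
        Ub, s, Vt = np.linalg.svd(B, full_matrices=False)  # cheap SVD of the small matrix
        return (Q @ Ub)[:, :k], s[:k], Vt[:k]      # lift back up and truncate to rank k

Library versions of this idea (e.g. sklearn.utils.extmath.randomized_svd, or the single-pass variant described in [2]) add power iterations and block processing, but the cost structure is the same: every pass touches all n*m entries (or all non-zeros), so the work grows with the product of the two dimensions, not their sum.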
On top of that, when these inverted indices grow too large to be processed on a single machine, earlier methods required k passes over the data. These newer approaches [1,2] can make do with a single pass, meaning the thing that does scale linearly here is the performance gain from distributing your SVD across a cluster with these newer approaches. Maybe this is the source of confusion for some commenters here.
To clarify the cubic-vs-square runtime complexity confusion I caused (sorry!): low-rank (rank-k) SVD of an n x m matrix indeed scales with O(nmk), while the full SVD would be O(min(n^2 m, n m^2)), i.e., roughly quadratic and cubic runtime, respectively, as per the papers linked elsewhere in this comment branch replying to OP.
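To put hypothetical numbers on that difference (|V| = 200,000 and k = 300 are made-up but plausible values for a square word-word matrix):

    # Back-of-the-envelope operation counts; both numbers are assumptions for illustration.
    V, k = 200_000, 300
    full_svd_ops = V ** 3         # O(min(n^2 m, n m^2)) with n = m = |V|  ->  8.0e15
    truncated_ops = V * V * k     # O(n m k)                               ->  1.2e13
    print(full_svd_ops / truncated_ops)   # ~667x fewer operations, but still quadratic in |V|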
[1] https://papers.nips.cc/paper/5477-neural-word-embedding-as-i...
[2] https://www.transacl.org/ojs/index.php/tacl/article/view/570