PredictionIO – an open-source ML stack built on Spark, HBase, and Spray

cko · on June 16, 2015

Does ML unambiguously stand for Machine Learning now? I got excited because I thought it referred to ML the language.

darksaints · on June 16, 2015

As a fan of both the ML language and Machine Learning, I would say not unambiguously but highly likely. ML as a specific language is mostly dead, living on through its descendants like OCaml, F#, Scala, etc. For the most part, people will mention an ML descendant by name if they are talking about a programming language.

troymc · on June 16, 2015

ML stands for many things. It all depends on the context. See the Wikipedia disambiguation page:

https://en.wikipedia.org/wiki/ML

Gonzih · on June 16, 2015

I found it a very interesting tool to play around with. At the same time, when I was playing with it (a year ago or so), it felt like a black box without any visible feedback on the learning process. It's very hard to tune an ML system without intermediate learning feedback.
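To illustrate what "intermediate learning feedback" can mean, here is a minimal sketch in plain Python. It is not PredictionIO code: the function name `train_with_feedback` and its parameters are hypothetical, and the model is just a one-variable linear fit trained by gradient descent. The point is only that the loss is recorded and printed while training runs, rather than hidden until the end.

```python
def train_with_feedback(xs, ys, epochs=50, lr=0.01, report_every=10):
    """Fit y ~ w*x + b by gradient descent, exposing the loss during training.

    Hypothetical illustration only; not part of any library.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    history = []  # mean squared error measured at the start of each epoch
    for epoch in range(1, epochs + 1):
        # Prediction errors under the current parameters
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n
        history.append(loss)

        # Gradients of the mean squared error with respect to w and b
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b

        # The "visible feedback": report progress every few epochs
        if epoch % report_every == 0:
            print(f"epoch {epoch:3d}  loss {loss:.4f}")
    return w, b, history


if __name__ == "__main__":
    w, b, history = train_with_feedback([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
    print(f"w={w:.3f} b={b:.3f}")
```

With the loss history available, it is possible to see whether a learning rate is too high (loss grows) or too low (loss barely moves) and adjust accordingly.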

reycharles · on June 16, 2015

I remember them sending me an (unwanted) email because I had contributed to an unrelated Scala project on GitHub by reporting a bug.

angrymouse · on June 16, 2015

Similar(ish) thing. I think I starred the project and about a year later I got an email out of the blue from them.

I get that I starred the project on GitHub, but that's hardly consent for an email.

elcct · on June 16, 2015

Oh no, that's terrible. This comment should be at the top

troysk · on June 16, 2015

I have used this and found it to be super easy for someone with no knowledge of ML to get started. Is there anything better than this?

aidanf · on June 16, 2015

There are quite a few machine learning API services. Here are a few:

* BigML https://bigml.com/

* Dato https://dato.com/

* Amazon Machine Learning http://aws.amazon.com/machine-learning/

* Google Prediction API https://cloud.google.com/prediction/docs

* Azure Machine Learning http://azure.microsoft.com/en-us/services/machine-learning/

Whether any of them is better than the others depends on what you're trying to do. Different services have different strengths and capabilities, so which one is suitable depends on the task you're working on.

troysk · on June 17, 2015

Thanks for sharing! I knew of the Google, Amazon and Azure solutions. BigML and Dato look interesting!

j42 · on June 16, 2015

Maybe Spark + Cassandra, but Cassandra has definitely proven finicky when it comes to 4+ node clusters. Unless you're willing to contribute time and resources to devops and dive into Java, that's a point against getting up and running quickly.

That said, this service runs on HBase, which is great. However, queries function similarly to a MapReduce job. This has proven consistently slower than SQL-like alternatives, and I can think of a few use cases where you'd definitely want that added speed.

It's really a question of "good enough" and how many components of your stack you're willing to take responsibility for, in exchange for enhanced scalability and IOPS.

For what it's worth though, I think Spark (in light of the recent commitment from IBM) is here to stay, so I'd say it's the unequivocal leader in distributed load/clustering frameworks.

tstonez · on June 16, 2015

There is also built-in support for other data store backends (e.g., PostgreSQL, MySQL, ...) since the 0.9.3 release; see https://docs.prediction.io/system/anotherdatastore/

troysk · on June 17, 2015

Storage is not what engineers struggle with; it's the ML setup.

somerandomness · on June 16, 2015

The website doesn't say who is behind this. How can a user trust it?

stuartaxelowen · on June 17, 2015

I believe they're an open-source-as-a-service company like Chef or Datastax.

bra-ket · on June 16, 2015

Previous thread: https://news.ycombinator.com/item?id=6574087