Hacker News
PredictionIO – an open-source ML stack built on Spark, HBase, and Spray (prediction.io)
86 points by stuartaxelowen on June 16, 2015 | 16 comments


Does ML unambiguously stand for Machine Learning now? I got excited because I thought it referred to ML the language.


As a fan of both the ML language and Machine Learning, I would say not unambiguously but highly likely. ML as a specific language is mostly dead, living on through its descendants like OCaml, F#, Scala, etc. For the most part, people will mention an ML descendant by name if they are talking about a programming language.


ML stands for many things. It all depends on the context. See the Wikipedia disambiguation page:

https://en.wikipedia.org/wiki/ML


I found it a very interesting tool to play around with. At the same time, when I was playing with it (a year ago or so), it felt like a black box without any visible feedback on the learning process. It's very hard to tune an ML system without intermediate learning feedback.
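To illustrate the kind of intermediate feedback meant here, a minimal sketch in plain Python. It has nothing to do with PredictionIO's actual API; the function name `train_with_feedback`, its parameters, and the toy data are all made up for illustration. It fits a line by gradient descent and records the loss at every epoch, so you can see whether a given learning rate is converging:

```python
def train_with_feedback(xs, ys, lr=0.05, epochs=50, log_every=10):
    """Fit y = w*x + b by gradient descent, recording the loss as training proceeds.

    Hypothetical helper for illustration only; not part of any library.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    history = []  # (epoch, mean squared error) pairs

    for epoch in range(epochs):
        # Loss is measured before this epoch's parameter update.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n
        history.append((epoch, loss))

        # The "visible feedback": print the loss every few epochs.
        if epoch % log_every == 0:
            print(f"epoch {epoch:3d}  mse {loss:.4f}")

        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b

    return w, b, history


# Toy data generated from y = 2x + 1 (an assumption for the example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b, history = train_with_feedback(xs, ys)
```

With a loss curve like `history` in hand, a learning rate that is too high (loss grows) or too low (loss barely moves) is obvious at a glance, which is exactly what a black box hides.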


I remember them sending me an (unwanted) email because I had contributed to an unrelated Scala project on GitHub by reporting a bug.


Similar(ish) thing. I think I starred the project and about a year later I got an email out of the blue from them.

I get that I starred the project on GitHub, but that's hardly consent for an email.


Oh no, that's terrible. This comment should be at the top


I have used this and found it to be super easy for someone with no knowledge of ML to get started. Is there anything better than this?


There are quite a few machine learning API services. Here are a few:

* BigML https://bigml.com/

* Dato https://dato.com/

* Amazon Machine Learning http://aws.amazon.com/machine-learning/

* Google Prediction API https://cloud.google.com/prediction/docs

* Azure Machine Learning http://azure.microsoft.com/en-us/services/machine-learning/

Whether any of them is better than the others depends on what you're trying to do. Different services have different strengths and capabilities, so suitability comes down to the task you're working on.


Thanks for sharing! I knew of the Google, Amazon and Azure solutions. BigML and Dato look interesting!


Maybe Spark + Cassandra, but Cassandra has definitely proven finicky on clusters of 4+ nodes, meaning that unless you're willing to commit time and resources to devops and dive into Java, it's a point against getting up and running quickly.

That said, this service runs on HBase, which is great; however, queries function similarly to a MapReduce job. This has proven consistently slower than SQL-like alternatives, and I can think of a few use cases where you'd definitely want that added speed.

It's really a question of "good enough" and how many components of your stack you're willing to take responsibility for, in exchange for enhanced scalability and IOPS.

For what it's worth though, I think Spark (in light of the recent commitment from IBM) is here to stay, so I'd say it's the unequivocal leader in distributed load/clustering frameworks.


There's also built-in support for other data store backends (e.g., PostgreSQL, MySQL) since the 0.9.3 release; see https://docs.prediction.io/system/anotherdatastore/


Storage is not what engineers struggle with; it's the ML setup.


The website doesn't say who is behind this. How can a user trust it?


I believe they're an open-source-as-a-service company like Chef or Datastax.




