Enterprises think differently. They want data provenance, privacy, ability to mi...

acheong08 · on May 8, 2024

Llama and Mistral are already local & fulfill these requirements

abdullin · on May 8, 2024

IBM goes at great lengths to train models on clean data that has lower risk of copyright or legal issues attached. Just take a look at the model description.

That data issue is important enough for some companies to pick mediocre model over llama or mistral.

doctorpangloss · on May 8, 2024

What if I told you that a lot of freely licensed code on GitHub is not clean? That the authors may have read something and rewritten it in a way that wasn’t transformative? So it basically has the same problems.

bayindirh · on May 8, 2024

What if I told you the supposedly clean "The Stack" dataset contains at least one GPL repository inside, just because their license detection tool bugged out?

IBM and other big players are vigilant about these things, and this is what companies pay for.

Their software may not be better in some metrics, but they're cleaner in some and their support contracts allows people to sleep tight at night.

This is what money buys. Peace of mind and continuity.

maccard · on May 8, 2024

> IBM and other big players are vigilant about these things, and this is what companies pay for.

And more importantly, IBM will guarantee it in the case that they're wrong. _That's_ what companies pay for.

CamperBob2 · on May 8, 2024

And more importantly, IBM will guarantee it in the case that they're wrong.

So will OpenAI, according to Sam Altman. Can they be trusted?

bayindirh · on May 8, 2024

IBM has a track record going back to automatically price calculating cheese cutters [0], but Sam does not.

IBM has proven itself in various ways over the years, OpenAI hasn't.

While IBM is a behemoth of a money making machine, they put money where their mouth is. OpenAI does not.

So I'll trust IBM, but not OpenAI.

[0]: https://youtu.be/z8VhNF_0I5c

bayindirh · on May 8, 2024

Yes. I tucked it under "support contract" part mentally, actually.

maccard · on May 9, 2024

That's fair, but until I actually read one of those contracts myself I didn't really understand what people meant by "support"

bayindirh · on May 9, 2024

It depends per job and per requirements, and has a direct affect on the cost of the contract in general.

doctorpangloss · on May 8, 2024

Indemnity is moving the goal posts, no? So you’re conceding that their data isn’t clean. But they say it’s clean.

This support contract stuff: what are you talking about? You download these models, you use them. What would you pay for? It’s not clean data, they say it’s clean: why would I pay liars? Let’s game out the indemnity idea. I pay $10k/mo for 12 months. Then OpenAI loses v. NYTimes, ruled LLM training is not fair use, need express permission. IBM pulls the models. What the hell did I pay $120k for? And by the way, you can pay a law student 1 beer to tell you OpenAI is going to lose because of Warhol v Goldsmith. You can do whatever you want with your money, but I personally would not waste it on worthless indemnity.

bayindirh · on May 8, 2024

First of all, "The Stack" is the dataset that models like StarCoder is trained upon. I don't know what's the data source for IBM Granite family.

I know the Stack is not clean, because they included my fork of GDM's greeter, which is GPL licensed.

My words about IBM was in general. I can't tell anything about their models, because I didn't see mention of "The Stack", and I don't know what their models are based on.

On the other hand, IBM doesn't like risks from my experience, so they would play it way safer than other companies.

If their data is not clean to begin with, then shame on them, and hope their AI efforts burn to the ground.

BTW, LLM training is not fair use. For start, Fair Use's definition automatically excludes "for profit" usage. Just because OpenAI has a non-profit part and training done here doesn't make them immune to consequences of for profit operations.

rolisz · on May 8, 2024

Yes, but nobody got fired for buying IBM

antod · on May 8, 2024

Yeah if something you install doesn't work, you get the blame. If IBM supplies something that doesn't work (likely), you get to blame them instead.

szszrk · on May 8, 2024

Between "works" and "doesn't work" there is a full rainbow of possible answers, KPIs, yearly reviews in a network of matrix reporting.

There will be market for their services. Maybe a different one, but there will be.

morgante · on May 8, 2024

That's a pretty outdated phrase, even in enterprise.

kubami · on May 8, 2024

By no means it is an outdated phrase. Ask any startup sales person!

morgante · on May 8, 2024

I personally know a VP who was fired for buying "IBM Cloud." You can absolutely get flak for choosing IBM these days, even at a stodgy enterprise.

The gist is still current, but you need to fill in AWS as the current uncontroversial choice.

dubcanada · on May 8, 2024

Must be a very terrible company to work for if they are firing people solely on them picking X over Y.

jazzyjackson · on May 8, 2024

but who can you pay to run these models and fulfill these requirements /for you/ ;)

worthless-trash · on May 8, 2024

I could be wrong, but I think thats what the RHEL AI, topic was all about 24 hours ago ?

keefle · on May 8, 2024

Can I sue Lamma and Mistral if things go wrong?

dartos · on May 8, 2024

Llama is owned by Meta, so you’d be suing meta

But I’m pretty sure both models have “we’re not responsible” clauses.

keefle · on May 9, 2024

That was my point. Whereas if you are using a service from IBM as an enterprise, you would be able to sue them

insane_dreamer · on May 8, 2024

“Nobody was ever fired for hiring IBM”