I've been following the RedPajama project closely and I must say, it's quite an impressive undertaking. The fact that it's all open-source, and the collaboration between various institutions, is nothing short of amazing. This shows the power of the open-source community in action, with a bunch of smart people coming together to build something truly remarkable.
The 3B model, being super fast and accessible, is a game changer for a lot of us who may not have the latest hardware. I mean, running on an RTX 2070 that was released 5 years ago? That's pretty cool.
As for the 7B model, it's great to see that it's already outperforming the Pythia 7B. The bigger dataset definitely seems to be making a difference here. I'm eager to see how far this project goes, and what kinda improvements we can expect in the coming weeks with the new RedPajama dataset they're working on.
One thing I found interesting is the mention of differences between the LLaMA 7B and their replication. I'd love to learn more about those differences, as it could shed light on what's working well and what could be improved further.
Sorry, excuse my ignorance, but why is having access to a 3B model a game changer?
I played with a pirated 7B model a while back. My computer runs a 1080 Ti, so it used to be good but now it's pretty old. The model ran at a reasonable number of tokens/sec, but the quality was just trash compared to what I'd grown used to with ChatGPT. It was a novelty I interacted with for just a single evening.
I truly don't understand the use case for a 3B model with our current technologies.
You can ultra-fine-tune those models... look at Vicuna-13B: if you know how to prompt it well, you can get it to work as """"well"""" as ChatGPT, running on local hardware. I just got Vicuna-13B on Gradio[1] to act as a Japanese kanji personal trainer, and I've only used a simple prompt: "I want you to act as a Japanese Kanji quiz machine. Each time I ask you for the next question, you are to provide one random Japanese kanji from the JLPT N5 kanji list and ask for its meaning. You will generate four options, one correct, three wrong. The options will be labeled from A to D. I will reply to you with one letter, corresponding to one of these labels. You will evaluate each of my answers based on your last question and tell me if I chose the right option. If I chose the right label, you will congratulate me. Otherwise you will tell me the right answer. Then you will ask me the next question. Avoid simple kanji, let's go."
Sure, a 13B model can be fine-tuned to be pretty decent, which is quite remarkable compared to GPT-3's 175B parameters. But a 3B model has about a quarter as many parameters as Vicuna-13B, or about twice as many as GPT-2. Can you really fine-tune that to do anything useful that wouldn't be better handled by a more specialized open-source model?
How can someone get into using these models? How does ‘tuning’ work? How might I go about using these models for doing things like say summarizing news articles or video transcriptions? When someone tunes a model for a task, what exactly are they doing and how does this ‘change’ the model?
You can use Gradio (online), or download the weights manually from https://huggingface.co/lmsys/vicuna-13b-delta-v1.1/tree/main (a plain git clone won't work, the files are too big), then load the model in PyTorch and try inference (text generation). But you'll need either a lot of RAM (16-32 GB+) or VRAM (a decent GPU).
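Loading the model in PyTorch for inference can be sketched with the Hugging Face `transformers` library (a common way to load these weights; the model path, prompt, and fp16 assumption are mine, and `generate()` is only defined here, not called, because actually running it pulls tens of GB of weights into memory):

```python
def estimate_weight_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Rough memory needed just for the weights (fp16 = 2 bytes per parameter)."""
    return n_params * bytes_per_param / 1e9

def generate(model_dir: str, prompt: str, max_new_tokens: int = 200) -> str:
    """Load a local model and generate a continuation. Heavy: needs lots of RAM/VRAM."""
    # Import inside the function so the lightweight estimate above
    # runs even without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Why you need "a lot of RAM or VRAM": 13B params in fp16 is ~26 GB of weights alone.
print(estimate_weight_gb(13e9))  # 26.0
```

You'd call it like `generate("./vicuna-13b", "Hello,")` once the merged weights are on disk; quantization shrinks that footprint considerably.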
> How might I go about using these models for doing things like say summarizing news articles or video transcriptions
Again, you might try it online, or set up a Python/bash/PowerShell script to load the model for you so you can use it. If you can pay, I'd recommend RunPod for the shared GPUs.
> When someone tunes a model for a task, what exactly are they doing and how does this ‘change’ the model?
From my view... not much. "Fine-tuning" means training (tuning) on a specific dataset (fine, as in fine-grained). As I understand it (I'm not sure), they just run more epochs on the model with the new data you've provided until they reach a good loss (i.e., the model works); that's why quality data is important.
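Mechanically, "run more epochs until the loss is good" is just an ordinary training loop started from existing weights. A toy sketch of the idea (a tiny `torch.nn.Linear` stands in for the pretrained model here; real fine-tuning runs this same loop over an LLM's weights with your task-specific data):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in "pretrained model": fine-tuning starts from existing weights
# rather than random ones, but the loop itself is identical.
model = nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

# The "new data you have provided": a small, task-specific dataset.
x = torch.randn(32, 4)
y = x.sum(dim=1, keepdim=True)

# Run more epochs on that data until the loss is good (the model "works").
for _ in range(300):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(loss.item() < 0.05)  # True: the weights have adapted to the new data
```

The "quality data" point falls out of this directly: the model converges toward whatever `x`/`y` you feed it, garbage included.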
A newer but much better approach actually reduces the model size by narrowing what the system can do: similar to training a NN for a very specific task (as was typical several years ago), but now it can be done with far less data.
https://arxiv.org/pdf/2305.02301.pdf
This paper is quite fantastic, and this kind of distillation will likely shape up to be an important glue task for large LLMs: generating the training data for smaller, specialized models.
While I recognize that this is only one example of what you can do, you could just ask ChatGPT to write you a traditional program that does something like this, and not have to run a (pretty big, power-intensive, and slow on most hardware) 3B/7B-parameter model for simple tasks like these.
Yeah, it wouldn't be as flexible as an LLM (for example, synonyms won't work), but I doubt that would be a big problem for this particular task, and you can ask it to tweak the program in various ways (for example, introducing crude spaced repetition), arguably making it better than the AI solution, which takes some time to prompt-engineer and will never be "perfect".
I don't really know how much better fine-tuning makes these models, so I can't think of anything they can actually be used for where they aren't worse than traditional programs. Maybe as an AI in games? For example, making them role-play as a historical figure in Civilization 6.
My example here was silly, I admit. But the point was that this simple task can become more "nuanced" (aside from ChatRWKV-Raven, no other model quite "works" like Vicuna or tuned LLaMA): given the correct prompt, it can act as a character in a fictional work, which might help you learn the language better by increasing conversational time (the most important metric; I'm talking comprehensible input here) by virtue of being more enjoyable.
Overall I like the progress: LLaMA releases -> LLaMA fine-tuned gets similar performance to ChatGPT at lower parameter counts (more efficient) -> people can replicate LLaMA's model without anything special, effectively making LLMs a "commodity" -> you are here.
Fine-tuning can easily be done on consumer hardware and can give these models a lot more power for specific applications.
Also, ChatGPT just can't do a lot of things because of its "rules". I was doing question answering about products on Amazon with ChatGPT, and it refused to answer any questions about underwear, certain books/videos, etc.
Depends on what you want it for. Chatting isn't the only application. For text summarization a model like Vicuna-13b has similar performance to ChatGPT 3.5. Fine-tuned models like the one in this thread might perform way better than the initial ones that leaked from Meta. The important thing is that there's constant progress in this area from the Open Source community and we're about to see amazing things in the future.
I'm in the market for a laptop. If I was crazy and wanted to run or train models like these, what kind of resources would I need?
Would the way the M2 MacBooks share memory be an advantage, or would the lack of CUDA support be a killer? Can you do anything with 16 GB, or do you need 128 GB or something like that? How large are the datasets?
I've only used scikit-learn and pandas so far; I'm not very familiar with neural networks yet.
It's not crazy to want to train or run models like these; it's actually quite popular right now! :) The questions for you to answer are how handy you are with scikit-learn and pandas, and how much you want to be on the bleeding edge of things. Most stuff comes out for CUDA first, since that's what the industrial-grade GPUs (A100s) use, so with Apple ARM you either have to wait for someone to port it, or port it yourself.
On the other hand, getting more than 8 GiB of VRAM on a laptop GPU is rare; you're definitely not getting 128 GiB of VRAM, so Apple ARM with 32 or 64 GiB of RAM (get 128 if you can afford it) is going to get you more gigabytes of usable memory for training/inference.
Yeah. It seems to me that it's really hard to get more than 10-14 GB of VRAM without using some sort of hyper expensive cluster. What would it cost if you wanted to do it with Nvidia? Being able to share ordinary ram with the GPU in a Mac could maybe be a unique value proposition
An RTX 3090 or 4090 gets you 24 GB of VRAM, which is enough to run llama-30b (quantized to 4-bit with a group size of 1024 or higher) at speeds comparable to ChatGPT. You can also get two and run the model split across them, although pumping data back and forth slows things down.
A brand-new RTX A6000 (48 GB VRAM) is probably the largest single card you can run in a regular PC. It can be had for $4-5k and is sufficient for llama-65b.
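The back-of-the-envelope math behind those pairings (weights only; the KV cache and activations add a few more GB on top, which is why a 24 GB card is "enough" for a ~15 GB 4-bit model):

```python
def quantized_weight_gb(n_params: float, bits: int) -> float:
    """Approximate weight footprint: parameters x bits each, converted to GB."""
    return n_params * bits / 8 / 1e9

# llama-30b at 4-bit: fits a 24 GB RTX 3090/4090 with headroom for the KV cache.
print(quantized_weight_gb(30e9, 4))  # 15.0
# llama-65b at 4-bit: fits a 48 GB RTX A6000.
print(quantized_weight_gb(65e9, 4))  # 32.5
```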
Beyond that, yeah, you're looking at dedicated multi-GPU server hardware.
> It seems to me that it’s really hard to get more than 10-14 GB of VRAM without using some sort of hyper expensive cluster.
Both consumer and workstation 16-24 GB GPUs (RTX 3080 Ti/3090/4090/A4000/A4500/A5000; the workstation cards may be cheaper per GB of RAM, but with fewer shaders), including in laptops, are not hard to find (pricey, but not "hyper expensive clusters"), and it's not until you jump above a single 48 GB RTX A6000 that you need a "cluster".
Hey SeanAnderson, good question! While parameter count is certainly an important factor in model performance, it's not the only one. The RedPajama project is taking a more nuanced approach to understanding what makes a model perform well, and their focus on smaller models like the 3B is a big part of that.
Sure, you may have played with a 7B model in the past, but that doesn't mean there's no use case for a smaller model like the 3B. In fact, having a performant, smaller model is a game changer for a lot of applications that don't require the massive scale of the larger models. Plus, smaller models are generally faster and more accessible, which is always a plus.
> In fact, having a performant, smaller model is a game changer for a lot of applications that don't require the massive scale of the larger models.
So we are all in agreement here that a 3B model is fundamentally inferior to a larger model?
Not that it doesn’t have uses; not that there’s no value in research in small models.
Just, honestly, that these smaller models don’t have the capabilities of the larger models.
It’d be good to see a direct acknowledgment of that, because it seems like you’re going out of your way to promote the idea that “it’s fine to have a small model”; and it is, roughly speaking. Parameter count isn’t everything. Small models are accessible, and you can easily fine-tune them. They are interesting.
…but, they are not as good, as far as I’m aware, in terms of output, in terms of general purpose function, as larger models.
On your first point, where you are attempting to impose agreement: I believe the other commenter is saying that the tradeoffs between the two are non-negligible.
Sounds like the difference between edge and centralized ML scoring.
There is no “one size fits all” here. A bigger model is just a bigger hammer, that in many uses is too bulky and slow to be a proper solution.
At my job, I can’t casually fire up 8x A100 80 GB instances. And even if I could, they wouldn’t have the throughput I require to be useful. Big models are operationally much more expensive.
The smallest/fastest model that is accurate enough for your use case is ideal.
If the goal is to use it to access a large knowledge base (like Google, but with better semantic searching), then it doesn't matter as much. There are some cases where it still matters due to missed connections (for example, you may want an answer to something and not realize it due to your own ignorance; a smaller model will miss that connection a few percent more often).
But ultimately, small models are very good for most things, and much preferable (e.g., running at home on a small SBC or an old computer to organize your digital life).
> Hey SeanAnderson, good question! While parameter count is certainly an important factor in model performance, it's not the only one. The RedPajama project is taking a more nuanced approach to understanding what makes a model perform well, and their focus on smaller models like the 3B is a big part of that.
> Sure, you may have played with a 7B model in the past, but that doesn't mean there's no use case for a smaller model like the 3B. In fact, having a performant, smaller model is a game changer for a lot of applications that don't require the massive scale of the larger models. Plus, smaller models are generally faster and more accessible, which is always a plus.
It's hard to pick out the actual answer: what is the application this model is good at? What has their "more nuanced" approach to understanding performance actually made this model better at doing?
I didn't realise it was written by an LLM, but it did come off as weird to me because it borrows phrases (most obviously the bit about a "2070 released 5 years ago") from the press release itself.
Right there with ya! Staying home has so many perks. And hey, let's not forget the environmental impact: less commuting means less pollution and fewer traffic jams. Plus, by embracing remote work, we're not just lining the pockets of big landlords and property owners. We're taking control of our time, our space, and our lives, all while doing our bit for the planet. No turning back now!
Wow, qwertyuiop_, seems like you're tryna connect the dots to make it some big conspiracy. Newsflash, buddy: it ain't about some puppeteer controlling everyone.
hah, you've got a point there! It's important to remember that even influential figures in tech aren't infallible, and they too can have a few, let's say, less-than-stellar moments. While Sam's journey with Loopt might not have been a "revolutionary unicorn," his contributions to the industry shouldn't be completely dismissed.
That being said, it's always healthy to question and scrutinize the opinions of tech leaders, as their perspectives can sometimes be out-of-touch with the reality experienced by the average worker. So, let's just say Sam's take on remote work might be one of those moments where we take it with a grain of salt and a hint of humor.
It's almost like Sam Altman is desperately trying to downplay the significance of remote work, while simultaneously relying on it for OpenAI's success. What a paradox, amirite?
Remote work has been a game-changer in promoting diversity and inclusivity, particularly for underrepresented communities such as Black professionals. By breaking down geographic barriers and providing equal access to job opportunities, remote work empowers talented individuals from all backgrounds to excel in their careers. This shift helps to create a more level playing field, fostering innovation and driving progress forward in the tech industry (thus far dominated by whites).
While I understand the sentiment that some in-person time may be beneficial, I'd argue that it's not a strict requirement for a successful work environment. With the advancements in technology and communication tools, we can now maintain strong connections with our colleagues, even without face-to-face interaction. Virtual meetings, team-building activities, and online workshops can serve as effective substitutes for in-person gatherings, allowing teams to stay connected and collaborate effectively.
The key lies in fostering a culture of open communication and trust among team members, regardless of their physical location. By prioritizing these values and leveraging available technology, it's possible to create a cohesive and productive remote work environment without the need for in-person meetings. As the landscape of work evolves, it's essential for us to adapt and explore new ways to connect and collaborate, transcending the boundaries of traditional office settings.
phil21, I see where you're coming from, but let's not forget that we're all different, and what might work for you won't necessarily work for everyone else. It's great that you thrived in that environment, but forcing others into the office because it suits you just ain't the way to go.
There are folks who are far more productive working remotely, and we should respect their preferences too. It's all about balance and understanding that each individual has unique needs when it comes to their work environment.
A hybrid model could be the answer here, allowing people the flexibility to work from the office or home as they see fit. There's no point in squeezing everyone into a one-size-fits-all solution. Let's prioritize productivity and well-being over the illusion of a "perfect" office setup.
And yeah, that 50-90 minute commute you mentioned sounds like absolute hell. We can probably all agree that nobody should be subjected to that.
I believe that's exactly what I was advocating for as the most productive setup: the ability to choose to go into an office at your convenience. It certainly was in my personal experience, and I wish it were an option for everyone. I was simply pointing out how impractical and rare that situation tends to be.
And I also have a different perspective having worked from home "before it was cool" starting back in '98. I've spent far more time in my home office by a factor of 10 to 1 at least than in an office.
Keep in mind that with WFH you are forcing everyone to your preferences just as much as with work-from-office. There are far more dynamics at play than personal productivity; team and business productivity as a whole is far more important and rarely talked about.
Many times, as a manager of a fully WFH team, I'd force some grumpy sysadmin to come in for a few days a week for a month as we knocked out a project. Sure, they got "nothing done" those days according to them, but they unblocked critical projects for the rest of the team during that time, which simply was not happening while they were in their "focus cave". To this day they will tell you it was a waste of their time and WFH would have been far more productive. I highly disagree.
As you say, everyone is different. As are situations. Some of the highest velocity teams I've interacted with were a handful of highly skilled seniors geographically distributed. There are projects I can think of I could call a few folks and form a fully remote team, and other projects I'd very much want to be on-prem "mostly office". It all depends.
There is something to be said for looking back over the past 25 years. While I remember some fond moments from my spare bedroom, I remember far more from in-office interactions. I'm not sure how I'm going to feel about that fact in another 20 years.
Edit: I do wonder how opinions would differ if we invented free teleportation tomorrow. I wonder how much of this is commute vs. actual preferred work environment. For me it's mostly commute - give me a private office people respect and I'd likely prefer it with a teleportation pad in my living room.