
I made these a couple of years ago as a teaching exercise for https://minitorch.github.io/. At the time the resources for doing anything on GPUs were pretty sparse and the NVIDIA docs were quite challenging.

These days there are great resources for going deep on this topic. The CUDA-mode org is particularly great, both their video series and PMPP reading groups.


Slightly off-topic, but any chance you could update or re-upload the code for your https://github.com/harvardnlp/DeepLatentNLP tutorial? I found the NLP latent variable models discussed there really interesting, and the notebooks were excellent. However, they seem to be gone and the only things left are the slides?

Alternatively, any other places that discuss the same topics, including some code? I could only find equivalent discussions with code in Pyro docs and Kevin Murphy's book, volume 2. But these are more sparse as they also cover many other topics.


I'll take a look. Yeah, Pyro is the best thing to do here. But it would be nice to revisit some of these implementations.


Thank you so much!


Thanks a lot, Sasha, for creating these. I found your LLM training puzzles to be excellent as well.



Thanks Sasha - this looks like a great resource. Just to be clear, would you recommend going through other, newer resources instead of this one?

Not sure if your comment is to discourage someone from going through this.


These still hold up, and I think they're a great first step. But they no longer get you to the goal line. Think about it more as conceptual practice, before you enter the jungle.


Got it, thank you.


Do you have links to the other great resources you are referring to?


The tweet says the opposite?


PyTorch is a generationally important project. I've never seen a tool that is so in line with how researchers learn and internalize a subject. Teaching machine learning before and after its adoption has been a completely different experience. It can never be said enough how cool it is that Meta fosters and supports it.

Viva PyTorch! (Jax rocks too)


This is exactly why I gravitated to it so quickly. The first time I looked at pytorch code it was immediately obvious what the abstractions meant and how to use them to write a model architecture.
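For example, a hypothetical two-layer MLP (just a sketch with arbitrary sizes, not any real model) reads almost exactly like the math:

    import torch
    import torch.nn as nn

    # A toy two-layer MLP: parameters are declared in __init__,
    # and forward is just plain Python.
    class MLP(nn.Module):
        def __init__(self, d_in, d_hidden, d_out):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(d_in, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_out),
            )

        def forward(self, x):
            return self.net(x)

    model = MLP(784, 256, 10)
    logits = model(torch.randn(32, 784))  # (batch, classes) -> (32, 10)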

Jax looks like something completely different to me. Maybe I’m dumb and probably not the target audience, but it occurs to me that very few people are. When I read about using Jax, I find recommendations for a handful of other libraries that make it more useable. Which of those I choose to learn is not entirely obvious because they all seem to create a very fragmented ecosystem with code that isn’t portable.

I’m still not sure why I’d spend my time learning Jax, especially when it seems like most of the complaints from the author don’t really separate out training and inference, which don’t necessarily need to occur from the same framework.


Honestly, when I turn to JAX, I generally do it without a framework. It’s like asking for a framework to wrap numpy to me. Just JAX plus optax is sufficient for me in the cases I turn to it.
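Concretely, the pattern I mean looks roughly like this; a minimal sketch with nothing but jax and optax (the toy loss and names are mine, purely illustrative):

    import jax
    import jax.numpy as jnp
    import optax

    # Toy linear-regression params as a plain pytree; no framework needed.
    params = {"w": jnp.zeros(3), "b": jnp.zeros(())}

    def loss_fn(params, x, y):
        pred = x @ params["w"] + params["b"]
        return jnp.mean((pred - y) ** 2)

    opt = optax.adam(1e-3)
    opt_state = opt.init(params)

    @jax.jit
    def step(params, opt_state, x, y):
        loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
        updates, opt_state = opt.update(grads, opt_state)
        return optax.apply_updates(params, updates), opt_state, loss

    # usage: params, opt_state, loss = step(params, opt_state, x_batch, y_batch)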


Torch was originally a Lua project, which is why PyTorch is called PyTorch and not just Torch.

In another timeline AI would have made Lua popular.

The best part is that it trampled TensorFlow, which I personally find obtuse.


> In another timeline AI would have made Lua popular.

I wonder if it'd have been hated more than Python is - especially with the 1-based indexing...


Scientific computing tends to be 1-based. Thus R, Julia, Fortran, Matlab.


Python isn't hated AFAICT. People will profess to hating building large projects in it (myself included), but many of those same people also love it for shorter programs and scripts.


Everything is hated.

Python has always gotten hate for being super slow and having an ugly syntax (subjective ofc, but I happen to agree).


Additionally, it nowadays has Java and C++ bindings to the same native libraries, so others can enjoy the performance without having to rewrite their research afterwards.


These slides from Lucas Beyer are pretty nice. https://docs.google.com/presentation/d/1ZXFIhYczos679r70Yu8v...


Yup. I often find people learning ML engineering struggle a lot with shapes and broadcasting. The goal of these puzzles is to force you to really learn the semantics of broadcasting and to internalize that data shapes in ML correspond to the loops most people naturally think in.
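A minimal sketch of the correspondence (a hypothetical example, not one of the actual puzzles): the loop version and the broadcast version compute the same thing, with shapes standing in for loop indices.

    import torch

    a = torch.randn(4)   # think: for i in range(4)
    b = torch.randn(5)   # think: for j in range(5)

    # Loop view: two nested loops over i and j.
    out_loop = torch.zeros(4, 5)
    for i in range(4):
        for j in range(5):
            out_loop[i, j] = a[i] - b[j]

    # Broadcast view: (4, 1) - (1, 5) -> (4, 5), same computation.
    out_bcast = a[:, None] - b[None, :]
    assert torch.allclose(out_loop, out_bcast)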


Hey, I made these. They're pretty fun. Sometimes people tell me they use them for ML interviews, but they're kind of hard.

The motivation was primarily teaching point-free, array programming. I don't think it is a great style, but it is fun as a brain teaser.
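For a taste of the style, here is a toy in the same spirit (my own illustration, not an actual puzzle solution): build an identity matrix with no loops or conditionals, just arange and a broadcasted comparison.

    import torch

    def eye(n):
        # Point-free: the (i, j) entry is 1 exactly where i == j.
        r = torch.arange(n)
        return (r[:, None] == r[None, :]).float()

    print(eye(3))
    # tensor([[1., 0., 0.],
    #         [0., 1., 0.],
    #         [0., 0., 1.]])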

If you enjoy this type of thing, I made a bunch more. They're all kind of ML + PL in style.

- https://github.com/srush/gpu-puzzles

- https://github.com/srush/tensor-puzzles

- https://github.com/srush/autodiff-puzzles

- https://github.com/srush/transformer-puzzles

- https://github.com/srush/LLM-Training-Puzzles

- https://github.com/srush/triton-puzzles

All the graphics for these are made in Chalk, a Python port of Haskell's Diagrams library: https://github.com/chalk-diagrams/chalk. Honestly, I mostly make the puzzles as an excuse to hack on the graphics library, which I find pretty interesting.


I really like the concept, but both Colab and a locally running Jupyter notebook seem to have issues. I'm getting an error related to "env.height" (I can send you the full stack trace if interested) in the very first puzzle.


Oh no, yes, please send a stack trace (although if it is in Colab I should be able to repro).


Nevermind, I think it was just me being silly and not running the bit with wget at the top!


keep 'em coming!


This book is great. Really mind-warping on a first read. Fernando Pereira has had an incredible influence across NLP throughout his whole career. Here is an offhand list of papers to check out.

* Conditional random fields: Probabilistic models for segmenting and labeling sequence data (2001) - Central paper of structured supervised learning in the 2000s era

* Weighted finite-state transducers in speech recognition (2002) - This work and OpenFST are so clean

* Non-projective dependency parsing using spanning tree algorithms (2005) - Influential work connecting graph algorithms to syntax. Less relevant now, but still such a nice paper.

* Distributional clustering of English words (1994) - Proto word embeddings.

* The Unreasonable Effectiveness of Data (2009) - More high-level, but certainly explains the last 15 years


Hi! Blog author here. This was an attempt, a couple of years ago, to understand and write about this paper in a detailed way. Here is a video going through the topic as well: https://youtu.be/dKJEpOtVgXc?si=PDNO0B0qi6ARHaeb

Section 2 of the blog post is no longer very relevant. A lot of advances (DSS, S4D) simplified that part of the process. Arguably also this all should be updated for Mamba (same authors).


Thanks for your spectacular resources! I see that you began an Annotated Mamba repository -- any chance you could share when that blog page might go live?


This was an excellent write-up, thanks. It'll help me understand the Mamba work a lot more.

I still find it really confusing how a linear model can perform so well.
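My rough mental model, as a toy sketch (real S4/Mamba models use structured, learned parameters per channel and stack many such layers with nonlinear mixing in between), is that only the recurrence over time is linear:

    import torch

    # x_k = A x_{k-1} + B u_k,  y_k = C x_k  (linear in the input u).
    d_state = 4
    A = 0.1 * torch.randn(d_state, d_state)
    B = torch.randn(d_state)
    C = torch.randn(d_state)

    def ssm_scan(u):  # u: (seq_len,)
        x = torch.zeros(d_state)
        ys = []
        for u_k in u:
            x = A @ x + B * u_k   # linear state update
            ys.append(C @ x)      # linear readout
        return torch.stack(ys)

    y = ssm_scan(torch.randn(16))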


Want to give proper credit to my former student for starting this: Yuntian Deng et al., 2016 (https://arxiv.org/abs/1609.04938). I believe this repo uses the dataset from that paper.

Some recent cool work he's been doing: https://www.youtube.com/watch?v=lx1XcTdhalU.


Yup, should work nicely together.

