
I've just started to try and learn the basics of RL and the Bellman Equation - are there any good books or resources I should look at? I think this post is beyond my current level.

I'm most interested in how the equation can be implemented step by step in an ML library - worked examples would be very helpful.

Thank you!


OpenAI's spinning up in deep RL is free and pretty good: https://spinningup.openai.com/en/latest/

It includes both mathematical formulas and PyTorch code.

I found it a bit more practical than the Sutton & Barto book, which is a classic but doesn't cover some of the more modern methods used in deep reinforcement learning.


Cool!

It's also nice that Sutton & Barto belabors a lot of old material that is no longer a focus, while this skims through it and gets to the stuff that is much more relevant today.


Even this OpenAI course is from 2020? Are there no useful recent updates on the subject, especially now that everyone is working on and using RL?

Reinforcement Learning by Sutton & Barto is an excellent introduction by two of the founders of the field.

Read here: http://incompleteideas.net/book/the-book-2nd.html


I worked thru David Silver’s RL course a while back, it’s got great explanations as he builds up the equations. It’s light on implementation, but the intuitive side really complements more code-heavy examples that lack the “why” behind the equations.

https://davidstarsilver.wordpress.com/teaching/


The Bellman equations (exactly as written above) are not found in ML libraries.

This is because they work assuming you know a model of the data. Most real world RL is model-free RL. Or, like in LLMs, "model is known but too big to practically use" RL.

Apart from the resources you use (there are good ones in other comments already), try to get the initial mental model of the whole field right. That is important, since everything you read can then fit into the right place in that mental model. I will try to give one below.

- the absolute core raison d'etre of RL as a separate field: the quality of data you train on only improves as your algorithm improves. As opposed to other ML where you have all your data beforehand.

- first, basic Bellman equation solving (code-wise, this is just solving a system of linear equations)

- an algo you will come across called policy iteration (code-wise, a bunch of for loops)

- here you will be able to see how different parts of the algo become impossible in different setups, and what approximations can be done for each of them (and this is where the first neural network - called "function approximator" in RL literature - comes into play). Here you can recognise approximate versions of the bellman equation.

- here you learn the DDPG and SAC algos. Crucial. Called "actor-critic" in the parlance.

- you also notice the problems of this approach, which arise because a) you don't have much high-quality data and b) learning recursively with neural networks is very unstable; this motivates stuff like PPO.

- then you can take a step back, look at deep RL, and re-cast everything in normal ML terms. For example, techniques like TD learning (the term you would have used so far) can be re-cast as simply "data augmentation", which you do in ML all the time.

- at this point you should get into the weeds of actually engineering real RL algos at scale. Stuff like Atari benchmarks. You will find that in reality the algos as learnt are more or less a template, and you need lots of problem-specific detailing to actually make them work. You will also learn engineering tricks that are crucial. This is mostly computer science stuff (increasing throughput on GPU etc. - but correctly, without changing the model assumptions)

- learn goal conditioned RL, imitation learning, some model based RL like alphazero/dreamer after all of the above. You will be able to easily understand it in the overall context at this point. First two are used in robotics quite a bit. You can run a few small robotics benchmarks at this point.

- learn stuff like HRL, offline RL as extras since they are not that practically relevant yet.
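The first two steps above (exact Bellman solve, then policy iteration) can be sketched in a few lines of NumPy. Everything here is made up for illustration - a random 3-state, 2-action MDP, not any particular library's API:

```python
import numpy as np

# Toy MDP (illustrative only): 3 states, 2 actions.
# P[a, s, s'] = transition probability, R[a, s] = expected reward.
rng = np.random.default_rng(0)
P = rng.random((2, 3, 3))
P /= P.sum(axis=2, keepdims=True)   # make each row a distribution
R = rng.random((2, 3))
gamma = 0.9

def policy_evaluation(pi):
    """Exact Bellman solve for a fixed policy: V = (I - gamma P_pi)^-1 R_pi."""
    P_pi = P[pi, np.arange(3)]      # (3, 3) transitions under pi
    R_pi = R[pi, np.arange(3)]      # (3,)  rewards under pi
    return np.linalg.solve(np.eye(3) - gamma * P_pi, R_pi)

def policy_iteration():
    """Alternate exact evaluation with greedy improvement until stable."""
    pi = np.zeros(3, dtype=int)
    while True:
        V = policy_evaluation(pi)
        Q = R + gamma * P @ V       # Q[a, s] = R[a, s] + gamma * E[V(s')]
        new_pi = Q.argmax(axis=0)   # greedy improvement per state
        if np.array_equal(new_pi, pi):
            return pi, V
        pi = new_pi

pi, V = policy_iteration()
```

Note that the model (P, R) is used explicitly in every line - which is exactly why this form of the equations only applies when the model is known.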


> The Bellman equations (exactly as written above) are not found in ML libraries. This is because they work assuming you know a model of the data. Most real world RL is model-free RL.

Q-learning (the usual application of the Bellman equation) is generally model-free. It is also commonly found in reinforcement learning libraries.


Usually deep Q-learning is found in libraries, where you function-approximate Q with a NN, which I alluded to in one of my later paragraphs (the approximation one).

Model-free RL doesn't mean you aren't training a model. It means that you aren't explicitly building a model of the environment's f(s,a)=(s',r) transition function, which methods like Dreamer do.

Q-learning only approximates the Q-value function, not the full state transition, so it is model-free.
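To make the model-free point concrete, here is a tiny tabular Q-learning sketch. The chain environment and all constants are made up for illustration - the key point is that no transition model appears in the learning code, only sampled (s, a, r, s') tuples:

```python
import random

# Toy environment (illustrative only): a 1-D chain of states 0..4,
# actions {0: left, 1: right}, reward 1.0 for reaching state 4.
N, ALPHA, GAMMA, EPS = 5, 0.5, 0.9, 0.3
Q = [[0.0, 0.0] for _ in range(N)]

def step(s, a):
    """Environment dynamics - known to the simulator, never to the agent."""
    s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == N - 1 else 0.0)

random.seed(0)
for _ in range(500):
    s = random.randrange(N - 1)           # "exploring starts"
    for _ in range(200):                  # cap episode length
        if random.random() < EPS:
            a = random.randrange(2)       # explore
        else:
            a = 0 if Q[s][0] >= Q[s][1] else 1   # exploit
        s2, r = step(s, a)
        # Bellman-style backup built from one sampled transition only:
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2
        if s == N - 1:                    # terminal state reached
            break
```

After training, the right action dominates in every non-terminal state - the Q-table approximates the optimal action values without ever estimating P(s' | s, a).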


I know that... I didn't say Q learning is not model-free.

Do you seriously think a writer of that post would think model-free RL means "not training a model at all"?

> Dreamer

Especially since I later mentioned Dreamer myself, specifically as a model-based algorithm.

> Site guidelines: Please respond to the strongest plausible interpretation of what someone says


I would recommend that you start with one of the classics (not much of deep RL)

https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutto...

This will have a gentler learning curve. After this you can move on to more advanced material.

The other resource I will recommend is everything by Bertsekas - in this context, his books on dynamic programming and neurodynamic programming.

Happy reading.


Hahah, this made me laugh. Thanks, Claude


Was this written by a human?


I would love it if I didn't have to port my whole iOS app to Android manually. How exactly would this integration work if, say, business logic is handled by Swift? I'm guessing UI and SwiftUI would not be supported initially?

My app [0] uses a lot of metal shader code - I'm guessing there's no easy way to bring that across?

[0] https://apps.apple.com/app/apple-store/id1545223887


It'll take you thirty minutes to port the shaders with a modern LLM.

I am not joking. I have done this. Shaders are pretty simple. You'll have some weird artifacts, but that's more because of platform differences than translation errors.


Metal cannot be used on Android. Your business logic can be ported - if it's separated as a library. If you don't want to separate it, Skip can handle bridging a lot of Apple libraries including SwiftUI.


Thanks - I see, so Swift packages for everything.

What would be the equivalent shader / GPU language on Android? OpenGL?


OpenGL or Vulkan; at a cursory look, you might have some luck transpiling Metal shaders with SPIRV-Cross.


Sometimes it is easiest to have an agent like Codex rewrite the shader instead...


Vulkan with a GLSL-to-SPIR-V compiler would be the equivalent.


Yeah, I would also like to see SwiftUI, but it's Apple-ecosystem only.


https://skip.tools ported SwiftUI to Android.


I'm a doctor too and would love to hear more about the rationale and process for creating this.

It's quite interesting to have a binary distinction, 'concerned vs not concerned', which I guess would be more relevant for referring clinicians, rather than getting an actual diagnosis - whereas a multiple-choice 'BCC vs melanoma' would be more of a learning tool, useful for medical students.

Echoing the other comments, but it would be interesting to match the cards to the actual incidence in the population or in primary care - although it may be a lot more boring with the amount of harmless naevi!


Thanks for your comment. The main motivation for me in developing the app was that lots of my patients wanted me to guide them to a resource that could help them improve their ability to recognise skin cancer, and in my view a good way to learn is to be forced to make a decision and then receive feedback on that decision.

For the patient I think the decision actually is binary: either (i) I contact a doctor about this skin lesion now, or (ii) I wait for a bit to see what happens, or do nothing. In reality most skin cancers are very obvious even to a non-expert, and the reason they are missed is that patients are not checking their skin or have no idea what to look for.

I think you are right about the incidence - it would be better to have a more balanced distribution of benign versus malignant - but I don't think it would be good to just show 99% harmless moles and 1% cancers (which is probably the accurate representation of skin lesions in primary care), since it would take too long for patients to learn the appearance of skin cancer.


> most skin cancers are very obvious even to a non-expert and the reason they are missed are that patients are not checking their skin or have no idea what to look for

I am a skin cancer doctor in Queensland and all I do is find and remove skin cancers (I find between 10 and 30 every day). In my experience the vast majority of cancers I find are not obvious to other doctors (not even seen by them), let alone obvious to the patient. Most of what I find are BCCs, which are usually very subtle when they are small. Even when I point them out to the patient they still can't see them.

Also, almost all melanomas I find were not noticed by the patient and they're usually a little surprised about the one I point to.

In my experience the only skin cancers routinely noticed by patients are SCCs and Merkel cell carcinomas.

With respect, if "most skin cancers are very obvious even to a non-expert" I suggest the experts are missing them and letting them get larger than necessary.

I realise things will be different in other parts of the world and my location allows a lot more practice than most doctors would get.

Update: I like the quiz. Nice work! In case anyone is wondering, I only got 27/30. Distinguishing between naevus and melanoma without a dermatoscope on it is sometimes impossible. Get your skin checked.


Thanks for your kind words about the app, and well done for getting such a high score! I agree that BCC is often subtle. My practice is also largely focused on skin cancer. I would say that the majority of melanomas (and SCCs) that I diagnose would be obvious to a patient who underwent a short period of focused training and checked their skin regularly. A possible explanation for the difference in our experience is that the incidence of skin cancer (and also of atypical but benign moles) is a lot higher in Australia than in the UK.


There would be quite the difference in our patient demographics.

I have quite a few patients from the UK who have had several skin cancers. Invariably they went on holidays to Italy or Spain as a child and soaked up the sun.

Keep up the great work.


Classically, BCCs have a pearly surface and 'rolled' edges, which differentiates them from pimples.


This was my concern too - as a little project, it's interesting but if it's a replica of XP it has been done before and much more accurately.

As a portfolio, I think it doesn't work at all and is detrimental to what you're trying to do. I think now in design, it is more important than ever for your work to cut through the noise and show at least some attempt to create something original.

I think sometimes graphic design is seen as competence with certain programs, which I guess includes genAI now, or making something cool - but really it is visual communication that responds to a set of constraints - e.g. a brief, tailored to a target audience, communicating a product or emotion. There are no shortcuts - study what has been done, work on communicating what you want to say with colour, layout, typography and images. Draw and paint; avoid genAI until you are competent without it. Currently as a graphic design portfolio, I'm sorry to say it is memorably bad and there is a lot of work to do.

That said, well done on finishing something, and making it to the top of HN. I hope the attention leads somewhere and that you continue making things.


I highly disagree with the feedback above.

The reality is, it depends on the context of who is hiring. A startup values things like being resourceful and finishing stuff, vs a large firm where most projects get dumped anyway.


In either case a "real" portfolio will be more effective and less work than this Windows XP thing, which is the point.


As someone who has hired designers before, I'm far more impressed by this than by a portfolio.


To get the attention it works very well. It stands out and will be remembered.


> I think now in design, it is more important than ever for your work to cut through the noise and show at least some attempt to create something original.

From what I've seen, at least half of design work is "make it look like x" where x may be "glass", "CRT effect" or "BigCo's design language".

This project looks like some light-hearted fun and demonstrates an ability to achieve a desired look. You seem to be looking for someone doing greenfield design work for a large advertising agency.

I see nothing in your profile that indicates any expertise in design, so it's really bold of you to level this kind of criticism at someone's project.


You're reinforcing my point above about not really understanding what design is. It is not a surface coating or a look.


You're talking absolute nonsense and failing to read what I've written. Your scathing review is simply wrong.

Again, you have zero design credentials in your profile. You don't dictate what design is and is not.


Thanks for the support mate! Don't worry about people like that. If people want to be so rigid in their thinking, let 'em!


The rigidity is in creating derivative work. If you make something original, you will know - it will be very exciting.

All the best to you both.


I like the idea of a 'bullshit timebox' - an hour period of protected time for minor chores & slightly annoying tasks.

I wonder what the best way of arranging it is. I guess you want to schedule them or have set weekly times, otherwise there's a slight overhead of remembering and finding the best time to timebox. Or maybe you use the last timebox to schedule the next one..


Is this a paid placement? It seems kind of unusual for the NYT to name an app on the home page, and there doesn't seem to be anything unique about Opal vs other blocker apps.


could be. or it could just be a situation of picking something and moving on with their day, especially since, as you say, there doesn't seem to be anything unique vs many similar apps.

this is exactly the kind of thing i appreciate. if i'm taking the limited amount of time i have in my day and choosing to browse through something, i don't need decision fatigue on something like this. just recommend me the thing that you know works (and is pretty much the same as others that also work) so i can move on with my day.

someone recommending something doesn’t mean it’s somehow the only choice out there, it’s just curation. we could all use less decision fatigue, particularly if it’s one of those things where they’re all so similar to each other.


Yes it is a paid placement and a fictional story


I love that the design is based on a Cray-2 [0]! I recently saw the original supercomputer in Paris.

I can't quite see why this would need an Arduino Opta over a regular Arduino Nano (maybe with a multiplexer?) - is it because the solenoids are 24V?

https://en.wikipedia.org/wiki/Cray-2


Oh, thank you for pointing that out!

If you notice any more typos / terrible translations please send to hello [at] lungy [dot] app and I'll get them fixed. Thank you!

