
What are you talking about? Fine-tunes are basically just more of the same training, optionally restricted to selected layers for efficiency.
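For what it's worth, a minimal sketch of that in plain PyTorch, assuming a generic model and a made-up task head (nothing here comes from a specific codebase): freeze everything except the layers you care about, then run the same forward/backward/step loop as pretraining.

    import torch
    import torch.nn as nn

    # Hypothetical pretrained model with a small task head bolted on.
    model = nn.Sequential(
        nn.Linear(768, 768), nn.ReLU(),
        nn.Linear(768, 768), nn.ReLU(),
        nn.Linear(768, 10),  # task head (illustrative)
    )

    # "Selected layers for efficiency": freeze everything but the last layer.
    for p in model.parameters():
        p.requires_grad = False
    for p in model[-1].parameters():
        p.requires_grad = True

    opt = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4
    )
    loss_fn = nn.CrossEntropyLoss()

    def train_step(x, y):
        # Same forward/backward/step shape as ordinary training.
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        return loss.item()

    # Dummy batch just to show the call.
    train_step(torch.randn(8, 768), torch.randint(0, 10, (8,)))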


RLHF and DPO are definitely not just the basic torch training loop, hence my argument that they take many more lines of code.
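To illustrate the point, here's a hedged sketch of the DPO objective (following Rafailov et al., 2023), not any particular library's implementation; the variable names are mine. Even the loss alone needs a frozen reference model and paired chosen/rejected completions, which the plain supervised loop doesn't have, and that's before the log-prob gathering, reference-model plumbing, and preference-pair data handling around it.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Inputs are sequence-level log-probs of the preferred ("chosen") and
        # dispreferred ("rejected") completions under the trainable policy and
        # under a frozen reference model.
        policy_logratio = policy_chosen_logps - policy_rejected_logps
        ref_logratio = ref_chosen_logps - ref_rejected_logps
        # -log sigmoid(beta * (policy log-ratio - reference log-ratio))
        return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()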



