Would that be legal? I wonder who holds the copyright to user-contributed data in Duolingo (assuming it's copyrightable).
If some of the user-generated content isn't copyrightable, or was contributed by users willing and able to share it with a FOSS project, could just that data be scraped, or would it be too difficult to identify?
One way is to get Premium and download the course. I haven't looked at it, but I assume they haven't bothered to do any copy protection on those data packages. Not sure if they contain account-bound watermarks of any kind.