Hey, I wanted to share a cookiecutter template I made that we use at deepsense.ai, which was recently open-sourced.
Link:
https://github.com/deepsense-ai/ds-template/tree/main
Design, docs, tips:
https://deepsense-ai.github.io/ds-template/
Blog post:
https://deepsense.ai/machine-learning-project-template
Is it perfect for every project, and does it use shiny modern tools? Not really. But in practice it has turned out to be very useful, to cause relatively few problems, and to be quite easy to adapt to your own needs. (Personally, I'd suggest tweaks like switching from pylint to ruff where possible, adding jupytext, etc.)
Feel free to fork it or extract parts of the configuration; maybe it will inspire you to build your own. Bear in mind that, as a software house specializing in AI with diverse customers and project types, we built it to solve specific problems you might not have :)
We needed a solid foundation to propagate quality and good practices, especially among more junior team members. We also have to enforce clients' specific coding styles, SOC and security requirements, etc., which are unfortunately often missing from existing solutions because of the attitude "we are data scientists, experiment code is allowed to be bad, it's faster to ship!". (TBH I disagree with that sentiment: empirically, I've observed lower velocity in such projects and a lot of tears from SEinML.)
Another big issue for me personally, and I'm sure many of you can relate, is just how much I detest setting up new projects: spending countless hours fiddling with config files, setting up tools, troubleshooting strange issues, and so on. Often there just isn't enough time to handle all of this properly when building PoCs/MVPs.
The generated project consists of:
- Basic Python package structure:
  - setup.py: compatibility with pip install -e .
  - setup.cfg: package metadata and dependencies
  - pyproject.toml: configuration for all tools (where they support it)
  - very minimal Python code + an example test
- pre-commit hooks:
  - black, flake8: enforce code style
  - pycln: cleans up unused imports
  - mypy: checks for type errors
  - isort: sorts imports
  - pylint: static code analysis and enforcement of coding standards
  - pyupgrade: modernizes code for the given Python version
  - bandit: checks for security issues
- Sphinx documentation:
  - basic preconfigured documentation template
  - recommended extensions
  - a page with an autogenerated list of third-party Python packages and their licenses
- Basic script to create a venv
- Minimal README.md file
- Preconfigured semantic versioning with bump2version
- Dockerfile for the pre-commit image
- GitLab integration (default, optional):
  - linter stage (pre-commit run --all)
  - tests (pytest) + code coverage
  - license checks of installed packages
  - building and hosting documentation on GitLab Pages
  - building the package and uploading it to a private GitLab Package Registry
  - security: trivy
  - steps to rebuild the linter Docker image
- Other less important files (more configuration, .gitignore, etc.)
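The post doesn't show the template's "basic script to create venv" (it may well be a shell script). As a rough sketch of the same idea using only Python's standard-library venv module, something like the following would work; the helper name create_dev_venv, its parameters and the default .venv path are hypothetical, not taken from the template.

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_dev_venv(
    path: Path = Path(".venv"),
    with_pip: bool = True,
    install_editable: bool = False,
) -> Path:
    """Create (or recreate) a virtual environment and return its interpreter path.

    Hypothetical helper for illustration; not the template's actual script.
    """
    # clear=True wipes an existing environment at `path` before creating it.
    venv.create(path, clear=True, with_pip=with_pip)
    # The interpreter lives in Scripts\ on Windows and bin/ elsewhere.
    if sys.platform == "win32":
        python = path / "Scripts" / "python.exe"
    else:
        python = path / "bin" / "python"
    if install_editable:
        if not with_pip:
            raise ValueError("an editable install needs pip in the venv")
        # Equivalent to running `pip install -e .` from the project root.
        subprocess.run([str(python), "-m", "pip", "install", "-e", "."], check=True)
    return python
```

Calling create_dev_venv(install_editable=True) from the project root would create .venv and install the package in editable mode, which is what the setup.py compatibility shim is there for.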
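The post doesn't say which tool produces the license page in the docs or runs the license check in CI. As a stdlib-only sketch of the underlying idea (not the template's implementation), installed distributions and their declared licenses can be read with importlib.metadata and compared against an allow-list; installed_licenses and disallowed are hypothetical helper names.

```python
from __future__ import annotations

from importlib.metadata import distributions


def installed_licenses() -> dict[str, str]:
    """Map each installed distribution name to its declared license."""
    licenses: dict[str, str] = {}
    for dist in distributions():
        meta = dist.metadata
        if meta is None:
            continue  # defensive: skip distributions without readable metadata
        name = meta.get("Name")
        if not name:
            continue  # skip broken installs that lack a Name field
        # Newer metadata uses License-Expression (PEP 639); older uses License.
        declared = meta.get("License-Expression") or meta.get("License") or ""
        if declared in ("", "UNKNOWN"):
            # Fall back to trove classifiers such as
            # "License :: OSI Approved :: MIT License" -> "MIT License".
            classifiers = meta.get_all("Classifier") or []
            declared = "; ".join(
                c.split("::")[-1].strip()
                for c in classifiers
                if c.startswith("License ::")
            )
        licenses[str(name)] = str(declared) or "UNKNOWN"
    return licenses


def disallowed(licenses: dict[str, str], allowed: set[str]) -> dict[str, str]:
    """Return the packages whose license string is not in the allow-list."""
    return {name: lic for name, lic in licenses.items() if lic not in allowed}
```

A CI job could fail whenever disallowed(installed_licenses(), ALLOWED) is non-empty, with ALLOWED being whatever license strings a given client accepts. Declared license strings are free-form, so a real check needs normalization that this sketch skips.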
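bump2version is driven from the command line (e.g. bump2version minor) and rewrites the version string in the files listed in its configuration. The sketch below is not bump2version's code; it only illustrates the default MAJOR.MINOR.PATCH increment rule it applies, where bumping a part resets every lower-order part to zero. The function name bump is hypothetical.

```python
import re

# Plain MAJOR.MINOR.PATCH; pre-release/build suffixes are out of scope here.
_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def bump(version: str, part: str) -> str:
    """Return `version` with `part` ("major", "minor" or "patch") incremented."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(g) for g in match.groups())
    if part == "major":
        return f"{major + 1}.0.0"  # breaking change: reset minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new feature: reset patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # bug fix only
    raise ValueError(f"unknown part: {part!r}")
```

For example, bump("1.2.3", "minor") returns "1.3.0", which matches what bump2version minor would write for a project currently at 1.2.3 with the default configuration.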
TL;DR: an open-sourced cookiecutter template; I hope you find something useful in it.
I know many people have strong feelings about certain choices, but having it open source should help you build your own version in less time than starting from scratch.