A Python project checklist
When building a new project, it's a smart move to be very strict right from the start. It is much harder to add more linting/typing checks once you have 1000+ lines of code.
That's why I'm providing an opinionated list of libraries for your new Python project. I might write a more in-depth article on the best practices when building a web app with Python. For now, this is mostly a checklist with some obvious recommendations.
Why not a template repo instead of a checklist? Template repositories (e.g. built with cookiecutter) quickly go out of date and discourage learning about the ins and outs of all those best practices. They might make sense for your organization, but they're not the point of this article.
Summary
- Development tasks: Makefile
- Typechecking: mypy
- Dependency and virtualenv management: poetry
- Linting: flake8
- Code autoformatting: black and isort
- Tests: pytest with plugins such as pytest-cov, pytest-factoryboy
- Docstrings: pydocstyle
- Logging: structlog
- Configuration: Pydantic BaseSettings with dotenv support
- Error reporting: Sentry
- Documentation: Docusaurus or Sphinx
- Profiling and performance: profile, pyinstrument, sqltap
Other topics:
- Data validation: Pydantic Models
- DB ORM: sqlalchemy and alembic for migrations
- Web framework: fastapi
- FP utilities: toolz
- CLI framework: typer
Not detailed below:
Running development tasks: Makefile
Makefiles are well understood, work almost everywhere, and shorten the ramp-up time for fellow developers who might not have much experience with Python.
Here's an example:
SHELL := bash
.ONESHELL:
.SHELLFLAGS := -eu -o pipefail -c
.DELETE_ON_ERROR:
MAKEFLAGS += --warn-undefined-variables
MAKEFLAGS += --no-builtin-rules

install: ## Install the app locally
	poetry install
.PHONY: install

ci: typecheck lint test ## Run all checks (test, lint, typecheck)
.PHONY: ci

test: ## Run tests
	poetry run pytest .
.PHONY: test

lint: ## Run linting
	poetry run black --check .
	poetry run isort -c .
	poetry run flake8 .
	poetry run pydocstyle .
.PHONY: lint

lint-fix: ## Run autoformatters
	poetry run black .
	poetry run isort .
.PHONY: lint-fix

typecheck: ## Run typechecking
	poetry run mypy --show-error-codes --pretty .
.PHONY: typecheck

.DEFAULT_GOAL := help

help: Makefile
	@grep -E '(^[a-zA-Z_-]+:.*?##.*$$)|(^##)' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[32m%-30s\033[0m %s\n", $$1, $$2}' | sed -e 's/\[32m##/[33m/'
This is a great article about Makefiles: Your Makefiles are wrong
Typechecking: mypy
Type annotations in Python libraries are not yet pervasive, but getting better every day.
Championed by Python's creator, Guido van Rossum, mypy is the de facto standard. The cheat sheet is a very helpful resource.
You can configure mypy inside setup.cfg:
[mypy]
strict = true
# I prefer to be explicit about ignoring packages which do not yet have types:
[mypy-psycopg2.*]
ignore_missing_imports = True
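As an illustration of what strict mode buys you, here's a minimal sketch (the greet function is hypothetical): mypy would reject calling .upper() on an Optional value without a None check.

```python
from typing import Optional


def greet(name: Optional[str]) -> str:
    # Without this None check, strict mypy reports:
    # error: Item "None" of "Optional[str]" has no attribute "upper"  [union-attr]
    if name is None:
        return "Hello, stranger"
    return f"Hello, {name.upper()}"
```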
Dependency and virtualenv management: poetry
Unfortunately, because of the way pip installs dependencies, you have to deal with virtualenvs in most cases (although things might change rapidly with pdm).
Nowadays I'd recommend using poetry. It is not yet absolutely perfect, but it provides a very elegant CLI.
It's super easy to start a project:
poetry new service-name
cd service-name
$EDITOR pyproject.toml
rm -Rf tests
mv README.rst README.md
poetry add sqlalchemy # for instance
Then either go into a virtualenv-enabled shell with poetry shell, or prefix your commands with poetry run.
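For reference, the dependency sections poetry manages in pyproject.toml look roughly like this (package versions here are illustrative, not recommendations):

```toml
[tool.poetry.dependencies]
python = "^3.9"
sqlalchemy = "^1.4"

[tool.poetry.dev-dependencies]
pytest = "^6.2"
mypy = "^0.910"
```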
Linting: flake8
There are two main linters: pylint and flake8. I usually rely mostly on flake8 because it has fewer false positives. While pylint is super configurable, it includes too many checks for my taste.
I use the following configuration (in setup.cfg):
[flake8]
max-line-length = 99
extend-ignore =
# See https://github.com/PyCQA/pycodestyle/issues/373
E203,
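The E203 exception exists because black formats complex slices with spaces around the colon, which pycodestyle's E203 check would flag as "whitespace before ':'". A small sketch of the kind of code involved:

```python
# black may format a slice with a complex bound like this,
# which E203 would otherwise reject:
items = list(range(10))
middle = items[2 : len(items) - 2]
```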
Code autoformatting: black and isort
black autoformats your code so that you don't have to think about it.
isort is a nice addition to black: it sorts your imports to comply with PEP 8:
- Sort alphabetically
- Group into standard imports, third-party imports, app imports
Here's my isort config to comply with flake8 and black (in pyproject.toml):
[tool.isort]
multi_line_output = 3
include_trailing_comma = true
force_grid_wrap = 0
use_parentheses = true
ensure_newline_before_comments = true
line_length = 88
Tests: pytest with coverage
pytest has a lot of magical features, but it makes writing tests so efficient. The fixture system is brilliant and super powerful. Using plain assert instead of having to learn an assertEqual metalanguage makes your life more meaningful.
Here's the config I use (in pyproject.toml):
[tool.pytest.ini_options]
# Personal preference: I am too used to native traceback
addopts = "--tb=short"
[tool.coverage.report]
exclude_lines = [
"pragma: no cover",
"def __repr__",
"if __name__ == .__main__.:",
"nocov",
"if TYPE_CHECKING:",
]
[tool.coverage.run]
# Activating branch coverage is super important
branch = true
omit = [
# add your files to omit here
]
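To illustrate the fixture system and plain asserts, here's a minimal sketch (the user fixture and make_admin function are hypothetical):

```python
import pytest


@pytest.fixture
def user():
    # Fixtures are plain functions; pytest injects them by parameter name.
    return {"name": "ada", "admin": False}


def make_admin(user: dict) -> dict:
    return {**user, "admin": True}


def test_make_admin(user):
    # Plain assert, no assertEqual metalanguage needed.
    assert make_admin(user)["admin"] is True
```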
I usually use the following plugins and libs:
- pytest-factoryboy: use factories to create your fixtures. Super powerful, and avoids having a single file where all your reusable fixtures are defined a thousand times with different variations.
- pytest-mock: makes it easier to work with unittest.mock.
- pytest-cov: provides coverage reports for your tests.
- doubles: sadly not maintained anymore (but it still works!), it provides a much simpler and stricter mocking experience than unittest.mock.
- requests-mock: to check integration with HTTP services called with requests. It automatically integrates with pytest and provides a fixture named requests_mock.
Checking docstrings: pydocstyle
pydocstyle enforces PEP 257 for docstring styling.
Here's my config (in setup.cfg):
[pydocstyle]
# Do not require any docstring
ignore = D100,D101,D102,D103,D104,D105,D106,D107,D213,D203
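Since D213 is ignored here, multi-line summaries start on the first line. A sketch of a docstring this configuration accepts (the fetch_user function is hypothetical):

```python
def fetch_user(user_id: int) -> dict:
    """Return the user matching user_id.

    The summary sits on the first line, followed by a blank line,
    per PEP 257.
    """
    return {"id": user_id}
```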
Logging: structlog
structlog is a must-have for all your logging needs.
from structlog import get_logger
logger = get_logger(__name__)
def hello(name: str):
logger.info("saying hello", name=name)
# instead of :
# logger.info("saying hello to %s", name)
Structuring your logs has numerous advantages:
- Immediately parsable by automated tools (kibana, m/r jobs, etc.)
- Easier to write: you don't have to think about the order of your logging message
- Flexible: can be further manipulated since all log messages are dicts until they're displayed
You can create yourapp.lib.log:
import json
import logging
from typing import Any, Dict
from uuid import UUID
import structlog
from yourapp.config import config
def default(obj: Any) -> Any:
if isinstance(obj, UUID):
return str(obj)
raise TypeError(f"Can't serialize {type(obj)}")
def dumps(*args: Any, **kwargs: Any) -> str:
kwargs.pop("default", None)
return json.dumps(*args, **kwargs, default=default)
def add_version(
logger: logging.Logger, method_name: str, event_dict: Dict[str, Any]
) -> Dict[str, Any]:
"""Add version to log message."""
event_dict["version"] = config.git_commit_short
return event_dict
class ConsoleRenderer(structlog.dev.ConsoleRenderer):
def _repr(self, val: Any) -> str:
# Display shorter uuid
# https://www.structlog.org/en/stable/_modules/structlog/dev.html#ConsoleRenderer
if isinstance(val, UUID):
return str(val)
return super()._repr(val)
def configure_logger(level: str = "INFO", *, console: bool = False) -> None:
"""Configure logging.
console should be True for console (dev) environment.
"""
# see https://stackoverflow.com/questions/37703609/using-python-logging-with-aws-lambda
root = logging.getLogger()
if root.handlers:
for handler in root.handlers:
root.removeHandler(handler)
logging.basicConfig(format="%(message)s", level=level)
if not console:
processors = [
add_version,
structlog.stdlib.filter_by_level,
structlog.stdlib.add_logger_name,
structlog.stdlib.add_log_level,
structlog.processors.TimeStamper(fmt="%Y-%m-%d %H:%M.%S"),
structlog.processors.StackInfoRenderer(),
structlog.processors.format_exc_info,
structlog.processors.JSONRenderer(serializer=dumps),
]
else: # nocov
processors = [
structlog.stdlib.add_logger_name,
structlog.stdlib.add_log_level,
structlog.stdlib.PositionalArgumentsFormatter(),
structlog.processors.TimeStamper(fmt="%Y-%m-%d %H:%M.%S"),
structlog.processors.StackInfoRenderer(),
structlog.processors.format_exc_info,
ConsoleRenderer(),
]
structlog.configure(
processors=processors, # type: ignore
wrapper_class=structlog.stdlib.BoundLogger,
logger_factory=structlog.stdlib.LoggerFactory(),
cache_logger_on_first_use=True,
)
Configuration: Pydantic's BaseSettings
Like most people, I usually ended up building my own mechanism for handling configuration. Thanks to the web framework fastapi, I discovered that pydantic provides a very handy BaseSettings class that relies on environment variables for its configuration. BaseSettings provides many things that would be annoying to implement from scratch:
- Type hints
- Reading from environment variables
- Validating configuration values
- .env support with python-dotenv
- Secrets support
import os
from pathlib import Path
from typing import List, Optional
from dotenv import load_dotenv
from pydantic import BaseSettings
ENV_FILENAME = os.environ.get("DOTENV", ".env")
class MisconfiguredException(Exception):
pass
class Config(BaseSettings):
# Please use env_name ONLY for informational purpose (see docs)
env_name: str
git_commit_short: str = "unknown"
# Activate this to get profiling - see documentation.
is_db_enabled: bool = False
db_user: str = "unconfigured"
db_password: str = "unconfigured"
db_name: str = "unconfigured"
db_port: str = "5432"
db_host: str = "localhost"
sentry_dsn: Optional[str]
def get_config() -> Config:
"""Get the config."""
# We follow serverless's dotenv plugin's behavior here:
# https://www.npmjs.com/package/serverless-dotenv-plugin
# First load .env
load_dotenv(dotenv_path=".env")
if not Path(ENV_FILENAME).exists():
raise ValueError(f"Config file {ENV_FILENAME} does not exist.")
if ENV_FILENAME.endswith(".local"):
raise ValueError(
"Expected env filename like '.env.dev', "
f"got override ending with .local instead: {ENV_FILENAME!r}. "
f" Try with {ENV_FILENAME.replace('.local', '')!r}"
)
# Then load .env.{env}
load_dotenv(dotenv_path=ENV_FILENAME)
# Then load .env.{env}.local if it exists
override = ENV_FILENAME + ".local"
if Path(override).exists():
load_dotenv(dotenv_path=override)
return Config()
config = get_config()
Now you just have to run your commands like this:
DOTENV=.env.test poetry run pytest .
Error reporting: Sentry
Sentry is a service that provides exception monitoring. Its SDK is very simple to integrate.
Usually, I use the following pattern:
# app.py
from app.config import config
from app.lib.log import configure_logger
from app.lib.sentry import configure_sentry
configure_logger(config.log_level)
configure_sentry()
# app.lib.sentry
from typing import Any
import sentry_sdk
from structlog import get_logger
from app.config import config
logger = get_logger(__name__)
def configure_sentry(**kwargs: Any) -> None: # nocov
if not config.sentry_dsn:
logger.info("not configuring sentry")
return
sentry_sdk.init(
debug=False,
dsn=config.sentry_dsn,
environment=config.env_name,
traces_sample_rate=1.0,
release=config.git_commit_short,
**kwargs,
)
Documentation: Docusaurus
Sphinx is another great choice (especially if you want to get Python code auto-documentation), but at my current company Gens de Confiance we use Docusaurus, a powerful yet simple documentation management tool.
Domain models, data validation: Pydantic
Pydantic models are a flexible way to create your domain model objects:
- Type hinting
- Validators
- Export to json-schema
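A minimal sketch of such a domain model (the User model here is hypothetical):

```python
from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str = "anonymous"


# Values are validated and coerced on instantiation:
# the string "42" becomes the int 42.
user = User(id="42")
```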
You can also use Python's standard lib dataclass together with something like marshmallow.
ORM: sqlalchemy
If you need to interact with the DB, sqlalchemy is a very safe choice. It comes with loads of features and is the most used non-Django Python ORM, which means that you'll find Stack Overflow solutions for all your problems. Using alembic for DB migrations is the next logical move.
Both libraries are written by the insanely productive Mike Bayer.
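A minimal sketch of the declarative ORM style, using an in-memory SQLite database (the User model is hypothetical, and the session API assumes sqlalchemy 1.4+):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)


# In-memory SQLite keeps the example self-contained.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(User(name="ada"))
    session.commit()
    names = [u.name for u in session.query(User)]
```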
Web framework: fastapi?
You have a lot of excellent choices when it comes to Python web frameworks. I usually prefer microframeworks and am currently developing with fastapi, which is a lot of fun to work with.
I usually refrain from using any plugins that come with the framework, because they are usually too coupled to the framework and the context of an HTTP request. I might write an article about my preferred setup.
Utility functions for functional programming: toolz
While Python is not a strict functional programming language, it is possible to
write FP-styled code with it. toolz is
a great companion and provides many utility functions that make writing code
easier. It has a curried-by-default namespace (from toolz.curried import take
). Checkout its cheat
sheet.
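A small sketch of the curried namespace in action, composing a lazy pipeline with pipe (the pipeline itself is an arbitrary example):

```python
from toolz import pipe
from toolz.curried import filter, map, take

# Keep even numbers, square them, take the first three.
# Each curried function waits for the data argument.
result = pipe(
    range(100),
    filter(lambda n: n % 2 == 0),
    map(lambda n: n * n),
    take(3),
    list,
)
```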
CLI framework: typer
typer (same author as fastapi and pydantic) leverages type annotations to make it super easy to write powerful scripts with a command line interface:
#!/usr/bin/env python3
"""Say hello.
"""
import typer
def main(name: str) -> int:
typer.echo(f"Hello {name}")
password = typer.prompt("password")
assert len(password) > 8
return 0
if __name__ == "__main__":
typer.run(main)
Performance profiling: pyinstrument and sqltap
- pyinstrument is a recent Python profiler which can export to HTML.
- sqltap integrates with sqlalchemy to allow you to introspect SQL queries and also exports to HTML.
Some services (like Sentry) can also profile code thanks to their SDK. Otherwise, you can also rely on the standard library module profile.
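For the standard library route, cProfile (the faster C implementation of profile) can be driven programmatically; the slow_sum function below is just an arbitrary workload for the sketch:

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Print the five most expensive calls by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
```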
Wishlist
- Lots of inconsistencies for where to put configuration: setup.cfg, pyproject.toml, specific files, etc.
- The more complex the library, the more useful it would be to have type annotations and... the less probable it is to have those annotations. Libraries such as sqlalchemy (coming in 2.0) and toolz don't have official types for now.
- mypy encourages nominal subtyping (see this FAQ), which is a bit sad because it discourages using simple dicts. Fortunately, something like PEP 589's TypedDict will improve things.
- It would be so nice to avoid having to use virtualenvs (even through something like poetry or pipenv). First-time Python developers get so confused about them (compared to Node's simpler node_modules setup). I'm really looking forward to seeing PEP 582 deployed.
Missing something?
Drop me an email at charles at dein.fr if you think I'm missing something!
For more resources related to Python, check out my repo charlax/python-education.