Identifying python files with no coverage
I use pytest with coverage and I want to see the files that have no coverage.
It appears that pytest and pytest-cov do not list some of the files that live under namespace packages, although they work fine for files in regular packages (see PEP 420 on the topic of implicit namespace packages).
To fix this problem, one solution is to add __init__.py files in all of your directories in order to create regular packages.
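To find the directories that still need an __init__.py, a small script can walk the source tree. The following is a minimal sketch, not part of pytest or pytest-cov: the function name dirs_missing_init is hypothetical, and it assumes that any directory holding .py files should be a regular package (which may not be true for the project root or a scripts folder).

```python
import os


def dirs_missing_init(root):
    """Return directories under root that hold .py files but no __init__.py."""
    missing = []
    for current, dirnames, filenames in os.walk(root):
        # Do not descend into hidden directories or bytecode caches.
        dirnames[:] = [
            d for d in dirnames if not d.startswith(".") and d != "__pycache__"
        ]
        has_python = any(name.endswith(".py") for name in filenames)
        if has_python and "__init__.py" not in filenames:
            missing.append(current)
    return sorted(missing)


if __name__ == "__main__":
    # Assumption: the script is run from the directory you want to inspect.
    for directory in dirs_missing_init("."):
        print(directory)
```

Each printed directory is a candidate for a new, empty __init__.py file.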
If you are using PyCharm Professional, you can simply run your tests with coverage. Files that currently have no coverage will then appear with coverage = 0%.
My pytests take a while to complete, how can I speed up the process?
A fairly cheap solution is to use parallelization to run your tests on multiple CPUs instead of the single CPU used by default. To do so, you can install pytest-xdist. Once the extension is installed, all you need to do is add -n auto when you call pytest.
Another thing you should do, which requires more effort, is to investigate which of your tests take the most time to execute. To do so, use the --durations=0 flag when you call pytest. A report will be generated after your tests have run that lists how long setting up, running and tearing down each specific test took. The list is ordered from longest to shortest duration, meaning that the tests with the most potential for optimization will be at the top. You should focus on these tests because the longest one determines how long your test run takes even if you had an infinite number of CPU cores.
Investigate why certain tests take a while to execute.
- Are some tests computing something that takes a while and is computed exactly the same way by multiple tests? Precompute this result once and share it between the different tests (think of it as a fixture).
- Are some tests calling a slow external API? If you are not testing whether the remote API has changed, store example responses and emulate receiving them.
- Is there a loop in the test that runs hundreds of thousands of iterations while the same test could be executed with only a thousand iterations?
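As an illustration of the second point, a stored response can replace the slow call for the duration of a test. This is only a sketch using unittest.mock from the standard library: WeatherClient, describe_weather and the stored payload are hypothetical stand-ins for your own client code and recorded data.

```python
import json
from unittest import mock

# Hypothetical payload, e.g. saved once from a real call to the remote API.
STORED_RESPONSE = '{"temperature": 21.5, "unit": "C"}'


class WeatherClient:
    """Hypothetical client; fetch() stands in for a slow network call."""

    def fetch(self, city):
        raise RuntimeError("network access is not expected in tests")


def describe_weather(client, city):
    """Code under test: formats whatever the client returns."""
    data = client.fetch(city)
    return f"{city}: {data['temperature']} {data['unit']}"


def test_describe_weather_uses_stored_response():
    client = WeatherClient()
    stored = json.loads(STORED_RESPONSE)
    # Replace the slow call with the stored response for this test only.
    with mock.patch.object(client, "fetch", return_value=stored) as fake:
        assert describe_weather(client, "Montreal") == "Montreal: 21.5 C"
    fake.assert_called_once_with("Montreal")
```

The test never touches the network, so its duration no longer depends on the remote service.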
Introducing mypy in code with lots of issues
I want to include mypy as part of my CI pipeline but my existing code contains a lot (> 100, but < 500) of issues. How can I get started?
Create a minimal mypy configuration such that it lists the issues that need to be fixed and returns a non-zero exit code. Based on the problem definition, we assume that at this step more than 100 issues are listed and that fixing them would take many hours that you would rather invest in improving the code than in fixing typing issues.
Add a step in your CI pipeline that runs mypy and lists all those issues. Verify that it indeed breaks the build.
Once you've satisfied yourself that CI fails, we will "fix" the mypy issues by adding a # type: ignore comment at the end of each offending line (plus # noqa if your linter also flags that line; mypy itself only honors # type: ignore). This silences all the currently reported mypy issues, so mypy should now return a zero exit code. From this point forward, any new code that fails the mypy check will break the build, which lets you rely on mypy to check your types.
I suggest adding an additional comment such as # FIXME: TICKET-ID, where TICKET-ID refers to the id of a ticket in your issue tracking system that explains that you need to take care of this technical debt.
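With a few hundred issues, adding those comments by hand is tedious. The sketch below shows one way it could be automated; it is not a mypy feature. It assumes mypy's usual "path:line: error: message" output lines, the function names are hypothetical, and it simply appends to the reported line, so multi-line statements and lines that already end with another comment may still need manual attention.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches "path:line: error: ..." lines; an optional column number is skipped.
ERROR_LINE = re.compile(r"^(?P<path>[^:\n]+):(?P<line>\d+):(?:\d+:)? error:")


def lines_to_ignore(mypy_output):
    """Map each reported file to the set of line numbers with errors."""
    found = defaultdict(set)
    for raw in mypy_output.splitlines():
        match = ERROR_LINE.match(raw)
        if match:
            found[match["path"]].add(int(match["line"]))
    return dict(found)


def add_ignore_comments(mypy_output, ticket="TICKET-ID"):
    """Append '# type: ignore' and a FIXME marker to every reported line."""
    suffix = f"  # type: ignore  # FIXME: {ticket}"
    for path, numbers in lines_to_ignore(mypy_output).items():
        source = Path(path)
        lines = source.read_text().splitlines(keepends=True)
        for number in numbers:
            text = lines[number - 1]
            if "# type: ignore" in text:
                continue  # already silenced
            body = text.rstrip("\r\n")
            ending = text[len(body):]  # keep the original line ending
            lines[number - 1] = body + suffix + ending
        source.write_text("".join(lines))
```

A possible workflow is to save the mypy output to a file, pass its contents to add_ignore_comments, review the resulting diff, and run mypy again to confirm that it now exits with zero.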
Always prefer to fix the issues instead of ignoring them. However, also consider whether fixing those issues is an appropriate use of your time when you want to introduce mypy (which should be as soon as possible in my opinion).
I want to analyze a python script to extract something from it. How do I do that?
Like most programming languages, Python code can be represented as an abstract syntax tree.
You can use the ast module to parse a string that contains the code you want to analyze.
A simple example is as follows. It reads the file whose path is stored in the file variable and uses ast to parse it, returning a tree that can then be traversed using the visitor pattern. Defining several visitors lets you separate the responsibility of each of them, making the code that analyzes code easier to understand.
import ast


class ClassVisitor(ast.NodeVisitor):
    def visit_ClassDef(self, node):
        # Do some logic specific to classes
        self.generic_visit(node)


class FunctionVisitor(ast.NodeVisitor):
    def visit_FunctionDef(self, node):
        # Do some logic specific to functions
        self.generic_visit(node)


visitors = [
    ClassVisitor(),
    FunctionVisitor()
]

with open(file, "r") as f:
    code = f.read()

tree = ast.parse(code)

for visitor in visitors:
    visitor.visit(tree)
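To make the idea concrete, here is a small sketch of a visitor that actually extracts something: the names of all functions and methods in a piece of code. The class name FunctionNameCollector and the sample source are made up for illustration; only ast.parse and ast.NodeVisitor come from the standard library.

```python
import ast


class FunctionNameCollector(ast.NodeVisitor):
    """Collects the name of every function or method definition."""

    def __init__(self):
        self.names = []

    def visit_FunctionDef(self, node):
        self.names.append(node.name)
        # Keep walking so nested functions are found too.
        self.generic_visit(node)


# Hypothetical code to analyze; in practice this would be read from a file.
SAMPLE = '''
def outer():
    def inner():
        pass

class Greeter:
    def greet(self):
        pass
'''

collector = FunctionNameCollector()
collector.visit(ast.parse(SAMPLE))
print(collector.names)  # ['outer', 'inner', 'greet']
```

Note that functions declared with async def are AsyncFunctionDef nodes, so they would need their own visit_AsyncFunctionDef method.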
Run your program with python -m cProfile -o profile.cprofile my-script.py
Install snakeviz (pip install snakeviz) to visualize the generated profile.
snakeviz profile.cprofile
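If you would rather inspect the profile without a browser, the standard library's pstats module can read the same file. The sketch below is only an illustration: busy_work and profile_report are hypothetical names, and the profile is written to a temporary directory instead of next to your script.

```python
import cProfile
import io
import os
import pstats
import tempfile


def busy_work():
    """Stand-in workload; replace it with the code you want to measure."""
    return sum(i * i for i in range(100_000))


def profile_report(func, limit=5):
    """Profile func, save the stats to a file, and return a text summary."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "profile.cprofile")
        # Same file format as the one written by `python -m cProfile -o`.
        profiler.dump_stats(path)
        buffer = io.StringIO()
        stats = pstats.Stats(path, stream=buffer)
        # Show the entries with the largest cumulative time first.
        stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(busy_work))
```

Sorting by cumulative time puts the functions that account for most of the run, including their callees, at the top of the report.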
Alternative approach
Install pyprof2calltree to convert the cProfile output to a KCachegrind-compatible profile.
pyprof2calltree -i profile.cprofile -o callgrind.profile.cprofile