29 Feb 2020

Smarter people than you

Questions

How can you tell when you've reached the top of your job and that you won't find smarter people?

When people who challenge you bring arguments you have already considered. When the suggestions of others are stale or not intriguing. When you don't find excitement in your work. When you aren't challenged anymore. When the problems you are facing are not solved elegantly by others. When the problems you are facing have become so niche that few people on Earth may be able to discuss them or help you solve them.

My click CLI is slow, even just to show the help. How do I make it go faster?

In most cases, a click CLI is slow because the files that declare your commands import large packages at the top.

The typical pattern is as follows:

cli.py

import click

from predict import predict
from train import train

@click.group()
def cli():
    pass

cli.add_command(predict)
cli.add_command(train)

if __name__ == "__main__":
    cli()

train.py

import click
import pandas as pd
import torch

@click.command()
def train():
    pass

predict.py

import click
import pandas as pd
import torch

@click.command()
def predict():
    pass

Notice that both files import pandas and torch. Because cli.py imports train and predict at startup, every invocation of the CLI pays for those imports, even one that only shows the help, and importing them alone can account for a large share of the execution time. You can verify this by running python -X importtime train.py 2>tuna.log, then inspecting the results with tuna (run tuna tuna.log).
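As a quick complement to -X importtime, the cost of a single import can also be measured from Python itself. This is only a minimal sketch: the helper name time_import is hypothetical, and the standard-library modules below are stand-ins for pandas and torch so that the snippet runs anywhere.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Return the seconds spent importing `module_name` for the first time.

    Returns 0.0 when the module is already loaded, because a cached import
    costs almost nothing and would give a misleading measurement.
    """
    if module_name in sys.modules:
        return 0.0
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-ins: replace with the packages your commands import,
    # for example "pandas" and "torch".
    for name in ("json", "decimal", "csv"):
        print(f"{name}: {time_import(name) * 1000:.2f} ms")
```

Run it in a fresh interpreter each time, since a module is only expensive the first time it is imported in a process.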

The suggested pattern is to move the imports inside of the function itself, as such:

train.py

import click

@click.command()
def train():
    import pandas as pd
    import torch

    pass

predict.py

import click

@click.command()
def predict():
    import pandas as pd
    import torch

    pass

This will shave off a large amount of time spent importing those packages (pandas and torch). They will only be loaded when you need to run the command itself, not every time you invoke the CLI.
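To convince yourself that an import placed inside a function really is deferred, you can check sys.modules before and after calling the function. The sketch below assumes a throwaway module named heavy_dependency (a hypothetical stand-in for pandas or torch) that is written to a temporary directory, so it runs without any third-party package.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def command():
    # The import statement only runs when the command body executes.
    import heavy_dependency
    return heavy_dependency.VALUE


def demonstrate_deferred_import():
    """Return (loaded_before, value, loaded_after) for the stand-in module."""
    with tempfile.TemporaryDirectory() as tmp:
        # Create the stand-in module and make its directory importable.
        Path(tmp, "heavy_dependency.py").write_text("VALUE = 42\n")
        sys.path.insert(0, tmp)
        sys.modules.pop("heavy_dependency", None)  # start from a clean state
        importlib.invalidate_caches()
        try:
            loaded_before = "heavy_dependency" in sys.modules
            value = command()
            loaded_after = "heavy_dependency" in sys.modules
        finally:
            sys.path.remove(tmp)
    return loaded_before, value, loaded_after


if __name__ == "__main__":
    print(demonstrate_deferred_import())  # (False, 42, True)
```

Defining command() costs nothing; the module only appears in sys.modules once the function has been called.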

Another, more involved pattern is to move the logic of each command into a separate file. Over time, developers tend to add logic to the command files along with new imports at the top, which slows the CLI down again. When the complete implementation lives in its own file, that file can keep its imports at the top, and the command file has little reason to grow heavy imports of its own.

train.py

import click

@click.command()
def train():
    # Alias the import so it does not shadow this command's own name.
    from train_implementation import train as run_training
    run_training()

train_implementation.py

import pandas as pd
import torch

def train():
    # Implementation is now here
    pass
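Since the worry above is that heavy imports creep back in over time, one option is an automated check that importing the CLI module does not load them. The following is a rough sketch rather than an established recipe: the helper name modules_loaded_by is hypothetical, and it runs the import in a fresh interpreter so that modules already loaded by your test runner do not skew the result.

```python
import subprocess
import sys


def modules_loaded_by(import_statement, candidates):
    """Run `import_statement` in a fresh interpreter and return the subset
    of `candidates` that ended up in sys.modules."""
    code = (
        "import sys\n"
        f"{import_statement}\n"
        # The candidate names are passed as command-line arguments.
        "print(','.join(m for m in sys.argv[1:] if m in sys.modules))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code, *candidates],
        capture_output=True,
        text=True,
        check=True,
    )
    # Only the last line is ours; earlier lines may come from the import.
    lines = result.stdout.splitlines()
    last = lines[-1] if lines else ""
    return set(last.split(",")) if last else set()


if __name__ == "__main__":
    # Standard-library stand-ins so the example runs anywhere.
    print(modules_loaded_by("import json", ["json", "colorsys"]))  # {'json'}
```

In a test for the layout above, you would assert that modules_loaded_by("import cli", ["pandas", "torch"]) returns an empty set, assuming the test runs from the directory that contains cli.py.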

I have two test files with the same name and pytest complains. How do I make it work without changing the test filenames?

Example directory structure

/path/to/project/tests
├── a/
│   └── test_a.py
└── b/
    └── test_a.py

Error message:

import file mismatch:
imported module 'test_a' has this __file__ attribute:
  /path/to/project/tests/a/test_a.py
which is not the same as the test file we want to collect:
  /path/to/project/tests/b/test_a.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules

Add an __init__.py to each directory containing test files that share a name. Technically, you only need an __init__.py in one of the two directories, so that the two files are imported under different module names. Adding it to both simply prevents the issue from recurring if you later add a third test_a.py.

/path/to/project/tests
├── a/
│   ├── __init__.py
│   └── test_a.py
└── b/
    ├── __init__.py
    └── test_a.py
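The reason this works is that pytest derives a module name from each test file's location, and the __init__.py files change that name. The sketch below is a simplified approximation of the default "prepend" import mode, not pytest's actual code; the function names pytest_like_module_name and build_tree are hypothetical.

```python
import tempfile
from pathlib import Path


def pytest_like_module_name(test_file):
    """Approximate the module name given to a test file: walk upwards while
    directories contain an __init__.py, then join the package names with
    the file's stem."""
    path = Path(test_file)
    parts = [path.stem]
    directory = path.parent
    while (directory / "__init__.py").exists():
        parts.insert(0, directory.name)
        directory = directory.parent
    return ".".join(parts)


def build_tree(root, with_init):
    """Create tests/a/test_a.py and tests/b/test_a.py under `root`."""
    files = []
    for name in ("a", "b"):
        directory = Path(root, "tests", name)
        directory.mkdir(parents=True)
        if with_init:
            (directory / "__init__.py").touch()
        test_file = directory / "test_a.py"
        test_file.touch()
        files.append(test_file)
    return files


if __name__ == "__main__":
    for with_init in (False, True):
        with tempfile.TemporaryDirectory() as root:
            files = build_tree(root, with_init)
            print(with_init, [pytest_like_module_name(f) for f in files])
    # False ['test_a', 'test_a']      -> both files claim the name test_a
    # True ['a.test_a', 'b.test_a']   -> distinct names, no conflict
```

Without the __init__.py files both test files resolve to the same top-level name test_a, which is the mismatch reported in the error message.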

How can you tell if you are a low performer?

I always prefer to compare myself against my prior self rather than against others. Thus, I would consider myself a low performer if my throughput is lower than my past average. This can happen for many reasons. I may be learning something new, so a good chunk of my time goes to learning rather than executing. Or I may be trying different ideas to find the best one because I'm working on something I've never worked on before.

It's generally easy for a programmer to tell whether they have been more or less productive than the prior week. It is mostly based on feeling: you feel good when you are productive, and less so when you are not making progress or keep running into issues.

If you think and feel that you are performing poorly, start recording more thoroughly what you are working on. Identify when you start and finish working on a task, and when you get blocked, write down why. After a few weeks, look at what you wrote and assess what might cause you to feel that you are a low performer. Is it because you're working on a task you are not good at? Is it because of a lack of motivation on the task you've been assigned to?

With more information in hand to determine why you feel that you are a low performer, you will be able to devise a plan so that you can once again feel like a high performer.

What should be defined to make a user demo walkthrough successful?

You need to define what you want to learn from the demo walkthrough. Where does the user ask questions? Where do they get stuck? What is easy or hard for them to do? What do they think about as they go through the demo? What is or isn't working? What frustrates them? Where do they want more guidance?

The user doing the walkthrough should be as close as possible to the ideal user; otherwise, you may get feedback that is biased by their own experience. A user with too much knowledge compared to your target user will be able to do many things your target user may need help with, and may take for granted things the target user does not know. On the other hand, a user with too little knowledge will require help in many places where the target user is expected to have knowledge, which may make the demo walkthrough slower than desired.

The walkthrough should have a clear scenario. You may give the user only an initial setup and a desired goal, and let them figure everything out by themselves. You may also take a more directed approach, where you tell them what to do and see whether the instructions are clear enough to accomplish the steps. The first approach is interesting because it lets you observe the variety of ways users solve a problem.