How would you build an AI that could offer coaching for games like StarCraft or Dota/LoL?
I see coaching as similar to the loss function of a machine learning model. I also see coaching as trying to optimize a (program's) function by figuring out where the largest improvements can be made. To provide effective and useful feedback, a coach should focus on the areas where the player shows the most potential for improvement. In a game like StarCraft, that would mean pointing out macro-level mistakes first, then micro-level mistakes.
To coach, you need a model of which actions have an impact on the game and how much impact they have. A learning system like DeepMind's AlphaGo (or AlphaStar, for StarCraft II) combines two types of learning: learning from observation of human games and learning by playing against itself.
The most common approach to coaching is having learners observe players who are more successful than themselves. Learners may watch better players who comment on their own gameplay while playing, or they may watch someone review a replay and provide their analysis. Both cases can be seen as models (the players) trying to explain their internals (the logic behind their actions).
Coaching generally starts with trying to reproduce someone else's recipe. You may not understand why they do certain things, but you do them yourself and observe the results. As you practice repeating the same observation-to-action mappings, you try to reproduce as closely as possible what the better player would do.
In the first learning phase, the model simply observes what happens during gameplay. In competitive games such as MOBAs and RTS games, the only reward signal is victory or defeat at the end of a game. As human beings, we quickly learn that winning a fight or encounter is good and losing one is bad. Those give us intermediate reward signals that an AI agent may not be able to build right away, since it is conceptually difficult to determine when an encounter begins and ends. The agent could, however, learn a simple metric such as the sum of the health of all its units, where keeping this value high is generally a good thing.
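A minimal sketch of the health-sum signal described above. It assumes each game snapshot can be reduced to a list of units with an owner and current health. The `Unit` class and both functions are hypothetical and are not part of any game API. The reward is the change in the player's total unit health between two consecutive snapshots.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    owner: int      # player id (hypothetical snapshot format)
    health: float   # current hit points


def total_health(units: list[Unit], player: int) -> float:
    """Sum of the health of all units owned by `player` in one snapshot."""
    return sum(u.health for u in units if u.owner == player)


def health_reward(prev: list[Unit], curr: list[Unit], player: int) -> float:
    """Intermediate reward: change in the player's total health.

    Positive when the total grows (e.g. new units are built), negative when
    units take damage or die. An assumed shaping signal, not a learned one.
    """
    return total_health(curr, player) - total_health(prev, player)


# Player 1 loses a 50 HP unit and another unit takes 30 damage.
prev = [Unit(1, 100), Unit(1, 50), Unit(2, 80)]
curr = [Unit(1, 70), Unit(2, 80)]
print(health_reward(prev, curr, 1))  # -80
```

Rewarding only the change, instead of the raw total, keeps the signal near zero when nothing happens. That makes it easier to tell which moments of a game mattered.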
The model will at some point need to establish its own scoring system so it can give itself intermediate rewards during a game. It will also need to learn how to segment a sequence of actions into repeatable action units, such as constructing unit X, attacking player Y or defending zone Z. For example, it may deem that constructing unit X is worth 5 points of reward, attacking player Y is worth no reward and defending zone Z is worth 30 points of reward. The value of rewards may vary based on numerous factors, such as how much time has elapsed since the beginning of the game, the known enemy army composition, existing vision, etc.
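A sketch of such a scoring table, assuming action units have already been segmented and named. The base values reuse the example numbers above. The context rules are purely illustrative assumptions: building is worth double in the first 5 minutes, and defending scales with the known enemy army size. A real agent would learn these rules rather than hard-code them.

```python
# Hypothetical base rewards for segmented action units (example values from the text).
BASE_REWARD = {
    "construct_unit_x": 5.0,
    "attack_player_y": 0.0,
    "defend_zone_z": 30.0,
}


def action_reward(action: str, minutes_elapsed: float, enemy_army_size: int) -> float:
    """Scale an action unit's base reward by game context.

    The two scaling rules below are assumptions for illustration only.
    Unknown actions are worth nothing.
    """
    reward = BASE_REWARD.get(action, 0.0)
    if action == "construct_unit_x" and minutes_elapsed < 5:
        reward *= 2.0  # assumed: early production matters more
    if action == "defend_zone_z":
        reward *= 1.0 + enemy_army_size / 50  # assumed: defending matters more vs big armies
    return reward


print(action_reward("construct_unit_x", 3, 0))   # 10.0
print(action_reward("defend_zone_z", 10, 50))    # 60.0
```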
Actions such as "attack coordinate X, Y" are too low-level. Your model will have to learn hierarchically complex actions such as "attack player X", "attack the gatherers of player X", "attack the weak gatherers of player X", etc., which then translate down the hierarchy to "attack unit at coordinate X, Y".
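A sketch of how high-level actions could expand down the hierarchy into coordinate-level commands. The `Unit` fields, the string command format and the health cut-off that defines a "weak" gatherer are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    owner: int
    kind: str       # e.g. "gatherer", "soldier" (hypothetical unit categories)
    health: float
    x: int
    y: int


def attack_unit_at(x: int, y: int) -> str:
    """Lowest level of the hierarchy: a raw coordinate command."""
    return f"attack {x},{y}"


def attack_gatherers(units, player, max_health=None):
    """'Attack the (weak) gatherers of player X' -> coordinate attacks.

    `max_health` is an assumed definition of "weak"; None means all gatherers.
    """
    targets = [u for u in units if u.owner == player and u.kind == "gatherer"]
    if max_health is not None:
        targets = [u for u in targets if u.health <= max_health]
    return [attack_unit_at(u.x, u.y) for u in targets]


def attack_player(units, player):
    """'Attack player X' -> attack every unit player X owns."""
    return [attack_unit_at(u.x, u.y) for u in units if u.owner == player]


units = [Unit(2, "gatherer", 40, 10, 12), Unit(2, "gatherer", 90, 11, 12),
         Unit(2, "soldier", 100, 20, 20), Unit(1, "gatherer", 40, 0, 0)]
print(attack_gatherers(units, 2, max_health=50))  # ['attack 10,12']
```

Each level only filters targets and delegates to the level below. The learned part of a real agent would be choosing *which* high-level action to issue.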
An AI coach may look at hundreds or thousands of replays and observe the distribution of unit allocation after 1, 2, 3, 5, 10, 15 and 20 minutes (or every 5 seconds) into the game, and its correlation with whether the player won or lost. In a MOBA, it may look at the items purchased by the player, their timing and their correlation with the outcome. Doing the same by hand would take a human being a lot of time; most would probably write scripts to automate the collection of those details instead of going through the replays one by one.
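A sketch of that kind of script, assuming replays have already been parsed into a hypothetical dictionary format: `{"won": bool, "stats": {minute: {stat_name: value}}}`. It computes the Pearson correlation between one statistic at a given minute and the 0/1 outcome, which is the point-biserial correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 outcome this is the point-biserial correlation.

    Raises ZeroDivisionError if either variable is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def stat_vs_outcome(replays, stat, minute):
    """Correlate one statistic at a given minute with winning (1) or losing (0)."""
    xs = [r["stats"][minute][stat] for r in replays]
    ys = [1 if r["won"] else 0 for r in replays]
    return pearson(xs, ys)


replays = [
    {"won": True, "stats": {5: {"workers": 20}}},
    {"won": True, "stats": {5: {"workers": 22}}},
    {"won": False, "stats": {5: {"workers": 12}}},
    {"won": False, "stats": {5: {"workers": 14}}},
]
print(round(stat_vs_outcome(replays, "workers", 5), 2))  # 0.97
```

A high correlation only flags a statistic worth reviewing. It does not show that the statistic causes wins.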
Playing against yourself is more complicated. A perfect recording of your actions may not prove difficult to beat, because it replays the same commands regardless of what you do. It may send units to the wrong location on the map, be caught off guard while moving to a location where you have positioned units in the middle of the path, or react to an attack that its opponent sent to its base in the original game but that never happens in this one, etc. It is, however, a start: one example you can train against.
A lot of players who are invested in the game do theorycrafting, which is basically using logic and reasoning to assess what to do in specific situations. They simulate various potential cases in their heads and devise plans to defeat them. While a human being may be able to work through a few dozen simulations in an hour, a computer may be able to generate hundreds of thousands. It may also be able to test them in a more accurate simulation environment. When game patches are released, it could rerun all the simulations it had generated to determine the impact of the patch on its existing strategies.
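A sketch of rerunning simulations after a patch, assuming a deliberately crude combat model. Two squads of ten units trade damage once per second, and a squad's damage shrinks as its units die. Unit types "A" and "B" and their (hit points, damage per second) stats are hypothetical. A real simulator would also model range, movement, armor and targeting.

```python
import math


def fight(a, b, stats, squad=10):
    """Simulate two squads trading damage each second; return the winner or None (draw).

    stats maps unit type -> (hit points, damage per second); assumes dps > 0.
    """
    hp_a, dps_a = stats[a]
    hp_b, dps_b = stats[b]
    pool_a, pool_b = hp_a * squad, hp_b * squad
    while pool_a > 0 and pool_b > 0:
        alive_a = math.ceil(pool_a / hp_a)  # units still standing
        alive_b = math.ceil(pool_b / hp_b)
        pool_a -= alive_b * dps_b
        pool_b -= alive_a * dps_a
    if pool_a > 0:
        return a
    if pool_b > 0:
        return b
    return None


def matchup_table(stats):
    """Run every pairwise simulation and record the winner."""
    types = sorted(stats)
    return {(a, b): fight(a, b, stats) for a in types for b in types if a < b}


def patch_impact(old_stats, new_stats):
    """Matchups whose simulated outcome changes after a balance patch."""
    before, after = matchup_table(old_stats), matchup_table(new_stats)
    return {m: (before[m], after[m]) for m in before if before[m] != after[m]}


v1 = {"A": (100, 10), "B": (100, 12)}
v2 = {"A": (100, 15), "B": (100, 12)}  # hypothetical patch buffing A's damage
print(patch_impact(v1, v2))  # {('A', 'B'): ('B', 'A')}
```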
An AI coach can be given the game rules, specifically which units are weak or strong against other units, and watch the game while you are playing. If you attack your opponent and the AI observes a strong concentration of a specific unit type, and it notices you do not have any of the units that counter this type, it may suggest that you start building those as soon as possible. It may also notice that your unit composition is weak against your enemy's and suggest units to build to balance your army and be better prepared for the next encounter.
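A sketch of the counter-suggestion rule, assuming the game rules are encoded as a hypothetical `COUNTERS` table. The table maps each unit type to the types that counter it. The 30% "strong concentration" cut-off is also an assumption.

```python
from collections import Counter

# Hypothetical rules: unit type -> unit types that counter it.
COUNTERS = {
    "air": ["anti_air"],
    "infantry": ["splash"],
    "armored": ["anti_armor", "air"],
}


def suggest_counters(enemy_seen, own_army, threshold=0.3):
    """Suggest counters for enemy types that make up at least `threshold` of the observed army.

    enemy_seen / own_army: lists of unit type names.
    """
    enemy = Counter(enemy_seen)
    own = Counter(own_army)
    total = sum(enemy.values())
    if total == 0:
        return []
    suggestions = []
    for unit_type, count in enemy.most_common():
        if count / total < threshold:
            continue
        counters = COUNTERS.get(unit_type, [])
        # Only suggest when we own none of the units that counter this type.
        if counters and not any(own[c] > 0 for c in counters):
            suggestions.append(f"Build {counters[0]} to counter enemy {unit_type}")
    return suggestions


print(suggest_counters(["air"] * 6 + ["infantry"] * 4, ["infantry"] * 10))
# ['Build anti_air to counter enemy air', 'Build splash to counter enemy infantry']
```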
We can see this act of theorycrafting as the equivalent of knowing, at a high level, the strategies and counter-strategies one can employ at an early point in the game, the same way you can learn the different opening moves in chess.
In the case of learning by playing against yourself, what we want the AI coach to provide is an opponent that will challenge our current biggest weaknesses so we can address them. In many cases, certain specialized strategies will be extremely strong against a specific type of strategy, and we will want to know those cases so we can use those strategies when the time is right.
- Determine your weaknesses/areas of improvement
- Suggest potential approaches to solve recurring problems you have
- Suggest heuristics that are easy for human beings to understand and follow
- Simulate opponents that would exploit your current weaknesses so you can practice against them
- Collect various gameplay-related statistics and their associated success rates (number of units of type X after Y minutes, number of creeps killed after X minutes, item purchase order, build order, etc.)
- If you were in an environment where you had access to very few replays, how would you learn the most out of those available?
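A sketch of the first and fourth items: finding your weaknesses and choosing opponents to practice against. It assumes your game history is available as hypothetical `(opponent_strategy, won)` pairs. Strategies are ranked by your win rate against them, lowest first. The `min_games` cut-off is an assumption meant to ignore win rates based on too few games.

```python
def weakest_matchups(history, min_games=5, top_n=3):
    """Rank opponent strategies by your win rate against them, lowest first.

    history: list of (opponent_strategy, won) tuples from past games.
    """
    games, wins = {}, {}
    for strategy, won in history:
        games[strategy] = games.get(strategy, 0) + 1
        wins[strategy] = wins.get(strategy, 0) + int(won)
    rates = {s: wins[s] / games[s] for s in games if games[s] >= min_games}
    return sorted(rates.items(), key=lambda item: item[1])[:top_n]


history = ([("rush", False)] * 4 + [("rush", True)]
           + [("macro", True)] * 4 + [("macro", False)] * 2
           + [("cheese", False)] * 3)  # only 3 games: skipped as too noisy
print(weakest_matchups(history))  # [('rush', 0.2), ('macro', 0.6666666666666666)]
```

The top entries are the opponent strategies a practice partner, human or simulated, should play.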
How do you prioritize things when there are so many of them competing against one another?
My approach to task prioritization, assuming that you have a list of tasks with no other information, involves adding information that will help you prioritize them.
Assuming you are using an issue tracking system like Redmine, Notion.so or JIRA, I would add two new properties to each task: important and urgent. This approach is based on the Eisenhower method of time management. You will want to go through all the tasks and determine whether they are important and whether they are urgent.
With this first pass of information added to your tasks, you should be able to prioritize them in the following order:
- Important/Urgent
- Important/Not urgent
- Not important/Urgent
- Not important/Not urgent
You should avoid as much as possible spending your time on tasks that are not important.
This is the first step to prioritizing your tasks. It is a lightweight approach to prioritization that also makes you consider whether a task has any importance. That question is often overlooked, which leads people to work on low-importance tasks.
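A sketch of this first pass as a sort key. It assumes the two properties have been exported from your tracker as booleans. The `Task` class is hypothetical and is not the data model of Redmine, Notion or JIRA. Important/urgent tasks come first, as in the Eisenhower method.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    important: bool
    urgent: bool


def eisenhower_rank(task: Task) -> int:
    """0 = important/urgent, 1 = important/not urgent,
    2 = not important/urgent, 3 = not important/not urgent."""
    if task.important:
        return 0 if task.urgent else 1
    return 2 if task.urgent else 3


def prioritize(tasks: list[Task]) -> list[Task]:
    """Stable sort: tasks with the same rank keep their original order."""
    return sorted(tasks, key=eisenhower_rank)


tasks = [Task("tidy wiki", False, False), Task("reply to ping", False, True),
         Task("fix outage", True, True), Task("write roadmap", True, False)]
print([t.name for t in prioritize(tasks)])
# ['fix outage', 'write roadmap', 'reply to ping', 'tidy wiki']
```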
At this point, assume that you have a lot of tasks that are important but not urgent. So many, in fact, that it is hard to decide which ones to do. It may feel like you're back to square one, but the important/urgent properties are no longer useful to order these tasks since they all have the same values. This is where the return on investment (ROI) metric is useful.
Similar to the important/urgent properties, we will add two new properties: estimated value (in dollars) and estimated effort (in hours). A third field, the ROI, is automatically derived from the previous two by computing estimated value divided by estimated effort.
For each task, you will want to estimate, to the best of your knowledge, how much effort would be required to complete it, as well as how much value you expect it to bring. Once you have those numbers for all your tasks, you can order them by descending ROI. This gives you the list of tasks to work on, in order, as they are the ones likely to provide the highest ROI, that is, the most value for the time spent.
If at this point you still have many tasks with the same ROI, you can first tackle the ones requiring the least effort, to get them out of the way and score quick wins.
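A sketch of the ROI ordering, including the tie-break on least effort. The `Task` class stands in for the tracker fields described above and is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    value: float   # estimated value, in dollars
    effort: float  # estimated effort, in hours

    @property
    def roi(self) -> float:
        """Return on investment in $/h: estimated value / estimated effort."""
        return self.value / self.effort


def order_by_roi(tasks):
    """Highest ROI first; among equal ROI, least effort first (quick wins)."""
    return sorted(tasks, key=lambda t: (-t.roi, t.effort))


tasks = [Task("a", 1000, 10), Task("b", 500, 5), Task("c", 900, 3), Task("d", 100, 10)]
print([(t.name, t.roi) for t in order_by_roi(tasks)])
# [('c', 300.0), ('b', 100.0), ('a', 100.0), ('d', 10.0)]
```

Since estimates are rough, you may prefer to round ROI (e.g. to the nearest 10 $/h) before sorting so that near-equal tasks are treated as ties.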
If you still have too many top-ROI tasks with the same ROI value, my current approach cannot deal with this situation. It becomes a question of defining a higher-level roadmap, determining which collections of features may have more impact than others, etc. It may also be possible to look at additional properties of the tasks, such as how long ago they were created and how long they have been left incomplete.
Given that you define a return on investment (ROI) for a task, when should you stop working on a task and abandon it given its cost?
Let's assume that your task has an estimate of effort (in hours) and an estimate of value (in dollars). The ROI is defined as the value divided by the effort ($/h).
Before starting any task, you should define the minimal ROI a task must have to be considered (i.e., your ROI threshold). If you value your time at 50 $/h and a task has an estimated ROI of 25 $/h, then it is probably not worth your time to work on it. Any task with an ROI above 50 $/h is considered worthwhile.
A task should be reviewed once half of its estimated effort has been spent. If you estimated that the task would take 8h, then at 4h you should reevaluate whether you think you will finish it in the remaining 4 hours. If 8h still stands, proceed. If you believe you can accomplish the task in less time, update your effort estimate; this will also increase its ROI. If, on the other hand, you think the task requires more time, update the effort estimate as well. If you have increased your effort estimate, I suggest reevaluating the task at the midpoint of the remaining effort (e.g., an 8h task updated to 12h would first be evaluated at 4h, then at 8h).
Since you are updating your ROI estimate, you can also update your estimate of the task's value. If it is a large task, you may realize after prototyping the idea that it will not provide as much value as anticipated. You can then revise its value downward. If, on the other hand, you get excited by what it brings to the table, by all means, increase its estimated value.
Once you have adjusted the estimated effort and value, take a look at your estimated ROI. If it has gone below your threshold, consider dropping the task, since investing additional time in it may provide less value than working on a new, more valuable task. If, on the other hand, the ROI is still above your threshold, continue working on the task.
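A sketch of the review rule and the continue/drop decision. It keeps the ROI definition used above (estimated value divided by total estimated effort) and a strict "above the threshold" comparison. Both function names are hypothetical.

```python
def next_review(elapsed_h: float, effort_h: float) -> float:
    """Next review point: the midpoint of the remaining estimated effort."""
    return elapsed_h + (effort_h - elapsed_h) / 2


def keep_working(value: float, effort_h: float, threshold: float) -> bool:
    """Continue only while the re-estimated ROI ($/h) stays above the threshold."""
    return value / effort_h > threshold


print(next_review(0, 8))           # 4.0, first review of an 8h task
print(next_review(4, 12))          # 8.0, after re-estimating it to 12h at the 4h mark
print(keep_working(800, 12, 50))   # True: 800 $ / 12 h is about 66.7 $/h
print(keep_working(800, 20, 50))   # False: 40 $/h
```

Some people prefer to divide the value by the *remaining* effort instead, which ignores hours already spent. That variant is a judgment call and is not what the text above describes.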
How do you determine whether you have a useful model?
In machine learning, one way to determine whether you have a useful model is to compare it against baseline models. In a field such as time series forecasting, one can create baseline models based on previous values. One example is a naive (persistence) model, which predicts that the next value will equal the current value. Another is a moving average, which averages the last X values and returns this average as the next predicted value. In this field, we expect that a model that predicts more accurately than those baseline models may prove useful.
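A sketch of both baselines, evaluated with one-step-ahead mean absolute error over the same range of the series so the comparison is fair. The series values are made up for illustration. A candidate model would need a lower MAE than both on the same range.

```python
def naive(history):
    """Persistence baseline: the next value equals the current (last) value."""
    return history[-1]


def moving_average(history, window=3):
    """Baseline: the next value is the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def backtest_mae(series, predict, start):
    """One-step-ahead MAE: predict series[t] from series[:t] for every t >= start."""
    errors = [abs(series[t] - predict(series[:t])) for t in range(start, len(series))]
    return sum(errors) / len(errors)


series = [10, 12, 11, 13, 15, 14]  # illustrative data
print(backtest_mae(series, naive, 3))           # about 1.667
print(backtest_mae(series, moving_average, 3))  # 2.0
```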
In other cases, we may already have an existing model from which we can generate predictions. This model may also serve as a baseline which other models will have to beat in order to replace it.
There is, however, a case where the answer isn't clear: what happens when, out of the various models, one of the baseline models is the best? Then it becomes a question of whether the prediction interval produced by that model satisfies the needs of your problem.
Note that even when you beat the baseline models, if the best model still does not satisfy the error requirements defined on the metrics you care about, the model may still not be useful. For example, an OCR model with 95% accuracy at the character level will still produce, on average, 5 errors per 100 characters, which may be too high for the requirements of the system being built.
Another factor to keep in mind when comparing models is whether the improvements on the metric you are optimizing for are significant. The mean absolute error (MAE) may be lower for one model than for another, but if its confidence interval is wider and the two intervals overlap, then you cannot confidently claim that one model is really better than the other. There may even be cases where you prefer a model with a higher MAE simply because its confidence interval is narrower than that of a model with a lower MAE but a wider confidence interval.
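A sketch of one way to obtain such confidence intervals: a percentile bootstrap over a model's per-sample test errors, followed by the overlap check described above. The number of resamples, the 95% level and the fixed seed are assumptions. Non-overlapping intervals are a conservative criterion. When both models are scored on the same samples, a paired comparison of their errors is usually more sensitive.

```python
import random


def mae(errors):
    """Mean absolute error from per-sample errors (actual - predicted)."""
    return sum(abs(e) for e in errors) / len(errors)


def bootstrap_mae_ci(errors, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for one model's MAE."""
    rng = random.Random(seed)
    stats = sorted(
        mae(rng.choices(errors, k=len(errors))) for _ in range(n_resamples)
    )
    lo = stats[int(n_resamples * alpha / 2)]
    hi = stats[int(n_resamples * (1 - alpha / 2)) - 1]
    return lo, hi


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


rng = random.Random(42)
errors_a = [rng.gauss(0, 1.0) for _ in range(200)]  # illustrative errors
errors_b = [rng.gauss(0, 1.1) for _ in range(200)]
ci_a, ci_b = bootstrap_mae_ci(errors_a), bootstrap_mae_ci(errors_b)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```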
There might be other attributes of the model that you need to take into account. A model that takes days to train may be useless when you need it to be updated every hour. In this case, a less accurate model that is faster to train may be more useful.
- Can the model under evaluation beat baseline models?
- Does the model under evaluation satisfy error requirements?
- Is the model under evaluation significantly better than existing models?
- Does the model under evaluation satisfy requirements such as time to train (e.g., less than 6 hours) or necessary resources (CPU, GPU, RAM)?
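A sketch of the checklist as a function that returns the checks a candidate fails. It assumes MAE is the metric and that "significantly better" means the candidate's confidence interval lies entirely below the current model's. Resource requirements are left out for brevity. The `Evaluation` class and `check_model` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evaluation:
    mae: float                     # candidate's error on the test set
    mae_ci: tuple[float, float]    # (low, high) confidence interval for that error
    train_hours: float


def check_model(candidate: Evaluation, baseline_maes: list[float],
                current: Optional[Evaluation], max_error: float,
                max_train_hours: float) -> list[str]:
    """Return the list of failed checks; an empty list means the model passes."""
    failures = []
    if candidate.mae >= min(baseline_maes):
        failures.append("does not beat baselines")
    if candidate.mae > max_error:
        failures.append("error requirement not met")
    # Significant only if the candidate's upper bound is below the current model's lower bound.
    if current is not None and candidate.mae_ci[1] >= current.mae_ci[0]:
        failures.append("not significantly better than current model")
    if candidate.train_hours > max_train_hours:
        failures.append("training takes too long")
    return failures


current = Evaluation(1.6, (1.4, 1.8), 1.0)
slow = Evaluation(1.0, (0.8, 1.5), 10.0)
print(check_model(slow, [2.0, 2.5], current, max_error=1.5, max_train_hours=6))
# ['not significantly better than current model', 'training takes too long']
```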
Is reducing our size a viable alternative/approach to space exploration?
I don't have enough knowledge of physics to determine all the impacts that scaling ourselves down would have. There are a few things I would consider, such as:
- the impact on being launched in space
- do we need less energy?
- are there other means to get us in space if we're smaller?
- our ability to accelerate/decelerate in space
- what kind of technology can we use?
- our ability to send communication signals back to Earth
- given our small size, how can we produce enough power for the signal to be received?
- are there physical phenomena that we need to take into account due to our reduced size?
When I asked myself this question, I was reading Accelerando, in which a crew is sent far into space in a "can". I always pictured it as being the size of a soft drink can, and I thought that was an interesting idea. In the book they are not shipping human beings, but rather simulations of their consciousness. In other words, they're sending an AI in a can™. I've also thought of this as a possible explanation for why we've never been visited by aliens. Maybe one of the steps in evolution or progress is being able to scale ourselves down so that we are not as visible as we currently are. Imagine if we were an ant colony instead of a human society: our footprint would be a lot less noticeable from outer space. Furthermore, if you were to travel in a can, it would be a lot more difficult to detect you than if you were travelling in an extremely large ship.
The benefit I could see in being able to reduce the size of our satellites or vessels is that less mass needs to be turned into a spacecraft, which could allow us to build many more spacecraft with the same amount of material.