Should I solve my problem with AI?

Start with a non-AI solution, then look into AI only if what you have is unsatisfactory.

In this day and age, we want to make AI solve all kinds of problems, even those that don't require AI.

When you build a product, you generally have a problem you are trying to solve. With AI, companies are looking at existing problems and trying to find ways to turn them into AI problems. Attempting the exercise is not a problem in itself, but it is a mistake to implement an AI solution where a non-AI solution would have been more than adequate. There is still a lot of work to be done in the field of automation that does not require AI, only simpler statistical approaches.

A lot of research in machine learning and deep learning likes to point to Occam's razor as a reason to create simple models, but it sometimes seems to forget to apply the same principle to the whole solution: do I need AI for this, or would something simpler be as good?

A problem with the current wave of AI companies is their relation to AI itself. They corner themselves into doing AI only, while AI still relies heavily on programming and IT, which are just as technical but a lot less glamorous. It is definitely exciting to sell products that have AI in them, but starting with the tool and not the problem that needs to be solved is like trying to find all the problems a hammer can solve instead of knowing that you need a hammer when you want to drive a nail.

How can an agent efficiently store terabytes of data, with hundreds of gigabytes updated daily?

This question comes from the idea that if we want to implement an artificial intelligence, it will have to process a large amount of data daily, similar to how we actively process a stream of sensory inputs (sight, hearing, taste, touch, smell) for more than 12 hours per day.

In human beings, even though we perceive a large amount of incoming data, a lot of it is compressed through differencing, that is, comparing the previous input with the new input and only storing the difference. This is similar to how video is currently encoded and compressed. To accomplish this, however, we need two things: a temporary buffer to store the previous input (or sequence of inputs), and a mechanism to differentiate between the previous and the current input.

That differentiation mechanism can be highly complex depending on the degree of compression desired. For example, if you shift all the pixels in an image by 1 on the x-axis, your differentiation mechanism may simply tell you that all the pixels have changed, return a delta between their previous and new values, and be done. In some cases you may be lucky and a large number of pixels will have kept the same value. However, a much better differentiation mechanism would realize that everything has moved by one pixel on the x-axis and instead report that it detected an x += 1 transform, which compresses the change far more than a pixel-by-pixel difference does. One benefit the brain has is that it can correlate multiple input channels to make sense of what is happening and better compress the information. In the previous case, the eyes may perceive that the signal at each receptor is now different. The brain, however, also receives information from the inner ear, telling it that the head moved by a certain amount, which most likely explains the transform that was applied to the eyes' input.
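As a rough illustration of the two mechanisms described above, here is a minimal Python sketch. The function names (pixel_delta, detect_x_shift), the max_shift bound, and the list-of-rows frame format are all assumptions made for the example. A real codec would also need to store the pixels newly revealed at the edge after a shift.

```python
def pixel_delta(prev, curr):
    """Naive differencing: per-pixel difference between two same-sized frames."""
    return [[c - p for p, c in zip(prow, crow)] for prow, crow in zip(prev, curr)]


def detect_x_shift(prev, curr, max_shift=3):
    """Return dx if curr equals prev moved by dx columns (on the overlap), else None."""
    width = len(prev[0])
    limit = min(max_shift, width - 1)  # keep at least one overlapping column
    # Try small shifts first so that an unchanged frame reports 0.
    for dx in sorted(range(-limit, limit + 1), key=abs):
        if all(
            crow[x] == prow[x - dx]
            for prow, crow in zip(prev, curr)
            for x in range(width)
            if 0 <= x - dx < width
        ):
            return dx
    return None


def encode_frame(prev, curr):
    """Prefer the compact transform; fall back to the full pixel delta."""
    dx = detect_x_shift(prev, curr)
    if dx is not None:
        return ("shift_x", dx)
    return ("delta", pixel_delta(prev, curr))


prev = [[1, 2, 3, 4], [5, 6, 7, 8]]
curr = [[0, 1, 2, 3], [0, 5, 6, 7]]  # every pixel moved one column to the right
print(encode_frame(prev, curr))
```

In this example every value in the naive delta is non-zero, while the transform is described by a single integer.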

In the brain we make use of the fact that the sensory inputs are different modes. Each is compressed somewhat independently from the others. As such, we would expect information that is similar in format to be compressed together (text with text, video with video, audio with audio, etc.) as it is likely to lead to the highest compression. Furthermore, making use of the structure within the data will lead to better compression than blindly compressing a collection of inputs as an opaque blob.

I would expect such a compression system to make use of two types of compression: offline and online. Offline compression would occur during periods of low activity and would offer higher levels of compression at the cost of less responsiveness during recompression. Online compression would occur when the system is actively being used and would rely mostly on fast encoding techniques to keep responsiveness high.

Online compression would rely on a lookup dictionary, with a retention policy that keeps the most recently used entries, to compress blocks of data that have already been seen numerous times. The quality of the online compression depends heavily on the assumption that what will be observed in the future is likely to resemble what has been observed in the past. During the day, we spend most of our time in the same environment. As such, we experience and observe the same things for an extended amount of time. Being able to determine what is similar and what is different is what will lead to the highest amount of compression.
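A minimal sketch of such a dictionary, assuming fixed blocks of bytes and a tiny capacity for readability. The class name RecentBlockEncoder, the token format ("raw" / "ref") and the decode helper are hypothetical names chosen for the example. The decoder mirrors the encoder's eviction order, so both sides agree on which blocks are still known.

```python
from collections import OrderedDict


class RecentBlockEncoder:
    """Replace recently seen blocks with short references; evict the least recently used."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.ids = OrderedDict()  # block -> id, oldest entry first
        self.next_id = 0

    def encode(self, block):
        if block in self.ids:
            self.ids.move_to_end(block)  # mark as recently used
            return ("ref", self.ids[block])
        if len(self.ids) >= self.capacity:
            self.ids.popitem(last=False)  # drop the least recently used block
        self.ids[block] = self.next_id
        self.next_id += 1
        return ("raw", self.ids[block], block)


def decode(tokens, capacity=2):
    """Rebuild the block stream, applying the same eviction rule as the encoder."""
    blocks = OrderedDict()  # id -> block, oldest entry first
    out = []
    for token in tokens:
        if token[0] == "raw":
            _, block_id, block = token
            if len(blocks) >= capacity:
                blocks.popitem(last=False)
            blocks[block_id] = block
        else:
            block_id = token[1]
            blocks.move_to_end(block_id)
        out.append(blocks[block_id])
    return out


stream = [b"desk", b"lamp", b"desk", b"desk", b"door", b"lamp"]
encoder = RecentBlockEncoder(capacity=2)
tokens = [encoder.encode(block) for block in stream]
print([t[0] for t in tokens])
```

With a capacity of 2, the repeated "desk" blocks become references, while "lamp" has to be sent raw again because it was evicted in the meantime. This is the cost of the assumption that the near future resembles the recent past.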

Offline compression would rely on the ability to make the most efficient use of the compute and memory available as this process would be time-constrained. It might be possible that the online and offline systems share information such that the online compressor can let the offline compressor know regions of data that might be ripe for recompression. In the case that both systems do not communicate, the offline system would likely benefit from knowing which regions have already been compressed to the fullest so that it spends most of its time processing data that was recently added. When it is done with this step, it can then attempt to increase the compression efficiency of all the data stored. Here again it should be able to make use of the differencing approach given that days will likely be highly similar. As such, we would expect the amount of space necessary to store a day to decrease drastically as more and more days of data are observed, possibly to the point where new days of data can be expressed as segments of previous days entirely.
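The idea that a new day can eventually be expressed as segments of previous days can be sketched with a simple chunk store. This is only an illustration under stated assumptions: fixed-size chunks, SHA-256 digests as chunk identifiers, and the hypothetical names ChunkStore, add_day and restore. Content-defined chunking would likely cope better with data that shifts position between days.

```python
import hashlib


class ChunkStore:
    """Store each distinct chunk once; a day is a list of chunk digests (its recipe)."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes

    def add_day(self, data):
        """Return the day's recipe and the number of bytes newly stored."""
        recipe = []
        new_bytes = 0
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            recipe.append(digest)
        return recipe, new_bytes

    def restore(self, recipe):
        """Rebuild a day from its recipe."""
        return b"".join(self.chunks[digest] for digest in recipe)


store = ChunkStore(chunk_size=4)
for day in (b"wakeworkmealrest", b"wakeworkmealplay", b"wakeplaymealrest"):
    recipe, new_bytes = store.add_day(day)
    print(new_bytes)  # bytes that had to be stored for this day
```

In this toy example the third day adds no new chunks at all: it is stored purely as references to segments seen on earlier days.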

Are passive or active agents more intelligent?

A passive agent is an agent that simply does its thing but does not interact with the environment.

An active agent is an agent that actively interacts with the environment.

Given those two definitions, we expect the active agent to appear more intelligent because it behaves according to its environment and interacts with it. A passive agent may however also behave according to its environment; it just doesn't try to alter it.

Is an agent that never says anything necessarily dumb? Such an agent could be hiding all the information of the world within itself, and could potentially solve any problem thrown at it, but it simply does not offer such answers because it doesn't interact with the world. The relationship between the agent and the world is one-sided, from the environment to the agent. From the outside, the agent looks like an inanimate object that knows nothing and can do nothing. But if you are able to peek inside, you can observe the most complex processes occurring. I would suggest that such an agent is highly intelligent.

In the stock market, we say that an investor is active if they regularly manage their portfolio, while a passive investor is one who manages their portfolio less frequently, depends on indexes instead of individual stocks, and prefers to rely on the overall trend of stocks to make a profit. Passive management is often used as the benchmark for active management, that is to say, you should not get involved in active management if your strategy cannot beat a simpler passive strategy. Active investors are often seen as foolish and more likely to lose money than passive investors.

Is there a general characteristic of simple programs that are able to learn complex behaviors, such as neural networks or RL-based algorithms that can be implemented in fewer than 100 to 250 lines?

I don't know yet.

This question came from thinking about DNA as the code of human beings. DNA is also found in other animals, and even in viruses. Organisms use nucleotides to store the programs that are necessary to their existence. DNA is used to produce proteins within the body that accomplish various functions such as regulating our body, controlling our mood, our attention, our hunger, etc.

This code has evolved from non-biological beginnings. From a large amount of randomness (chemical elements), nucleotides were formed, which then somehow led to the formation of DNA itself after a likely long process. If through randomness we moved from a chaotic world to one with order and structure, where a chain of DNA could finally emerge, it would be interesting to investigate the process in further detail to determine whether it could give us clues about how to create a program that could evolve the same way DNA did.

Cellular automata are also interesting to study in that respect. By defining a small set of rules, it is possible to generate and observe complex behaviors.
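As a concrete example of a small rule set producing complex behavior, here is a sketch of a one-dimensional (elementary) cellular automaton. Rule 30 is my choice for the illustration; the wrap-around edges and the function name step are assumptions of the sketch.

```python
def step(cells, rule=30):
    """Apply one step of an elementary cellular automaton with wrap-around edges."""
    n = len(cells)
    out = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right  # neighborhood as a 3-bit number
        out.append((rule >> index) & 1)  # bit `index` of the rule number is the new state
    return out


cells = [0] * 15 + [1] + [0] * 15  # a single live cell in the middle
for _ in range(12):
    print("".join("#" if c else "." for c in cells))
    cells = step(cells)
```

The entire "program" is the single number 30, yet the printed triangle quickly becomes irregular on one side.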

One common behavior of cells is that they reproduce. As such, I would expect a program that can learn complex behaviors to have some reproductive function. Reproduction is considered one of the traits of a living entity. My idea here is that exploring how we were able to massively populate the Earth may provide us with ideas on how a bit of code learned to lengthen itself, in the process increasing the size of its host as well as the complexity and variety of the cells that compose it.

How would you build an AI that could offer coaching for games like StarCraft or Dota/LoL?

I see coaching as similar to the loss function of a machine learning model. I also see coaching as trying to optimize a (program's) function by figuring out where the largest improvements can be made. In order to provide effective and useful feedback, a coach should focus on the areas where the player shows the most potential for improvement. In a game like StarCraft, that would mean first pointing out the macro-level mistakes, then the micro-level mistakes.

To coach, you need a model of which actions have an impact on the game and how much impact they have. A learning system like AlphaGo or AlphaStar generally exhibits two types of learning: learning from observation (human games or replays) and learning by playing against itself. AlphaZero, by contrast, learns from self-play alone.

The most common approach to coaching is for learners to observe players more successful than themselves. They may watch better players play the game and comment on their own gameplay, or they may watch someone review a replay and provide their own analysis. Both cases can be seen as models (the players) trying to explain their internals (the logic behind their actions).

Coaching generally starts by trying to reproduce the recipe of someone else. You may not understand why they are doing certain things, but you do it yourself and you observe the results. As you practice repeating those same observations => actions, you try to reproduce as closely as possible what the better player would do.

In the first learning phase, the model simply observes what happens during gameplay. In competitive games such as MOBAs and RTS games, the only reward signal is the victory or loss at the end of a game. As human beings, we quickly learn that winning a fight or encounter is good and losing it is bad. Those give us intermediate reward signals that an AI agent may not be able to build right away, since it is conceptually difficult to determine when an encounter begins and ends. The agent could however learn a simple metric such as the sum of the health of all its units, where keeping this value high is generally a good thing.
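The health-sum metric can be turned into an intermediate reward by looking at how it changes between snapshots of the game. This is a minimal sketch: the snapshot format (a list of units, each a dict with a "health" key) and the function names are assumptions for the example, not something a particular game API provides.

```python
def total_health(units):
    """Sum of the health of all units in one snapshot."""
    return sum(unit["health"] for unit in units)


def health_rewards(snapshots):
    """Intermediate rewards: change in total health between consecutive snapshots."""
    totals = [total_health(units) for units in snapshots]
    return [after - before for before, after in zip(totals, totals[1:])]


snapshots = [
    [{"health": 100}, {"health": 50}],                   # start
    [{"health": 80}, {"health": 50}],                    # one unit took damage
    [{"health": 80}, {"health": 50}, {"health": 40}],    # a new unit was built
]
print(health_rewards(snapshots))  # [-20, 40]
```

Note that this metric rewards building units as much as surviving fights, which is one reason it is only a starting point.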

At some point the model will need to establish its own scoring system so it can give itself intermediate rewards during a game. It will also need to learn how to segment a sequence of actions into repeatable action units such as constructing unit X, attacking player Y, or defending zone Z. It may then deem that constructing unit X is worth 5 points of reward, attacking player Y is worth nothing, and defending zone Z is worth 30 points. The value of rewards may vary based on numerous factors, such as how much time has elapsed since the beginning of the game, the known enemy army composition, existing vision, etc.

Actions such as "attack coordinate X, Y" are too low-level. Your model will have to learn a hierarchy of complex actions such as "attack player X", "attack the gatherers of player X", "attack the weak gatherers of player X", etc., which will then translate down the hierarchy to "attack unit at coordinate X, Y".
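One way to picture this hierarchy is as a set of increasingly specific filters that all end up producing the same low-level command. This is a hand-written sketch, not a learned model: the Unit fields, the WEAK_HEALTH threshold, the ORDERS table and the expand function are all hypothetical.

```python
from dataclasses import dataclass

WEAK_HEALTH = 20  # assumed threshold for calling a unit "weak"


@dataclass
class Unit:
    owner: str
    kind: str
    health: int
    x: int
    y: int


# High-level orders, from most general to most specific.
ORDERS = {
    "attack player": lambda u: True,
    "attack gatherers": lambda u: u.kind == "gatherer",
    "attack weak gatherers": lambda u: u.kind == "gatherer" and u.health <= WEAK_HEALTH,
}


def expand(order, player, units):
    """Translate a high-level order into low-level 'attack unit at x, y' commands."""
    keep = ORDERS[order]
    return [("attack", u.x, u.y) for u in units if u.owner == player and keep(u)]


units = [
    Unit("X", "gatherer", 10, 3, 4),
    Unit("X", "gatherer", 40, 5, 6),
    Unit("X", "soldier", 80, 7, 8),
    Unit("Y", "gatherer", 10, 1, 1),
]
print(expand("attack weak gatherers", "X", units))  # [('attack', 3, 4)]
```

In a learned system the filters themselves would have to be discovered, but the translation step down to coordinates would look much the same.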

An AI coach may look at hundreds or thousands of replays and observe the distribution of unit allocation 1, 2, 3, 5, 10, 15, 20 minutes into the game (or every 5 seconds) and its correlation with whether the player won or lost. It may look at the items purchased by the player in a MOBA game, their timing, and their correlation with whether the player won or lost. For a human being, doing something similar would require a lot of time. Most would probably write scripts to automate the process of collecting those details instead of manually going through the replays one by one.
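A script of that kind could be as small as the following sketch. It assumes replays have already been parsed into dicts with a "won" flag and a "units_at" mapping from minute to unit counts; that format and the function name win_rate_by_unit_count are made up for the example, and real replay parsing is game-specific.

```python
from collections import defaultdict


def win_rate_by_unit_count(replays, unit, minute):
    """Win rate grouped by how many `unit` the player had at `minute`."""
    tally = defaultdict(lambda: [0, 0])  # unit count -> [wins, games]
    for replay in replays:
        count = replay["units_at"].get(minute, {}).get(unit, 0)
        tally[count][1] += 1
        if replay["won"]:
            tally[count][0] += 1
    return {count: wins / games for count, (wins, games) in sorted(tally.items())}


replays = [
    {"won": True, "units_at": {5: {"worker": 30}}},
    {"won": False, "units_at": {5: {"worker": 30}}},
    {"won": False, "units_at": {5: {"worker": 20}}},
]
print(win_rate_by_unit_count(replays, "worker", 5))  # {20: 0.0, 30: 0.5}
```

With only a handful of replays these rates mean very little; the value comes from running the same tally over thousands of games, and even then it shows correlation rather than cause.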

Playing against yourself is more complicated. A perfect recording of your actions may not prove difficult to beat. The recording may send units to the wrong location on the map, be caught off guard moving to a location while you have positioned units in the middle of the path, or react to an attack that the "replay" opponent sent to its base at one point in the original game. It is however a start, one example you can train against.

A lot of players who are invested in the game will do theorycrafting, which basically means using logic and reasoning to assess what to do in specific situations. They simulate various potential cases in their head and devise plans to defeat them. While a human being may be able to devise a few dozen simulations over an hour, a computer may be able to generate hundreds of thousands. It may also be able to test them in a more accurate simulation environment. When game patches are released, it could rerun all the simulations it had generated to determine the impact of the patch on its existing strategies.

An AI coach can be given the game rules, specifically which units are weak or strong against other units, and watch the game while you are playing. If you attack your opponent and the AI observes a strong concentration of a specific type of unit, and it notices you do not have any of the units that counter this unit type, it may suggest that you start building those as soon as possible. It may also notice that your unit composition is weak against the unit composition of your enemy and suggest units to build to balance your army and be better prepared for the next encounter.
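The rule-based part of such a coach can be sketched in a few lines. The counter table below is invented for the illustration and does not correspond to any specific game; the 40% concentration threshold and the function name suggest_counters are likewise assumptions.

```python
# Hypothetical rules: enemy unit type -> unit type that counters it.
COUNTERS = {"air": "anti_air", "armored": "siege", "swarm": "splash"}


def suggest_counters(enemy_army, own_army, threshold=0.4):
    """Suggest counter units for enemy unit types that dominate the observed army."""
    total = sum(enemy_army.values())
    if total == 0:
        return []
    suggestions = []
    for unit, count in enemy_army.items():
        counter = COUNTERS.get(unit)
        concentrated = count / total >= threshold
        if counter and concentrated and own_army.get(counter, 0) == 0:
            suggestions.append(counter)
    return suggestions


print(suggest_counters({"air": 8, "swarm": 2}, {"splash": 3}))  # ['anti_air']
```

Here the coach only reacts when a unit type makes up a large share of the enemy army and you have no counter at all; a more useful version would weigh how many counters you have against how many you need.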

We can see this act of theorycrafting as the equivalent of knowing, at a high level, the strategies and counter-strategies one can employ at an early point in the game, the same way you can learn the different opening moves in chess.

In the case of learning by playing against yourself, what we want the AI coach to provide us is an opponent that will challenge our current biggest weaknesses so we can address them. In many cases certain specialized strategies will be extremely strong against a specific type of strategy and we will want to know those cases so we can use those strategies when the time is right.

An AI coach should therefore be able to:

  • Determine your weaknesses/areas of improvement
  • Suggest potential approaches to solve the recurrent problems you have
  • Suggest heuristics that are easy for human beings to understand and follow
  • Simulate opponents that would exploit your current weaknesses so you can practice against them
  • Collect various gameplay-related statistics and their associated success rates (number of units of type X after Y minutes, number of creeps killed after X minutes, item purchase order, build order, etc.)

  • If you were in an environment where you had access to very few replays, how would you learn the most out of those available?