Tuesday, January 31, 2017

GTO and Exploitative Play


Today I’m going to expand on a dichotomy that has been largely subtextual but would be more useful if made explicit.

GTO
In poker there is a concept known as GTO, or Game Theory Optimal. In a GTO model every decision is optimal in the sense that it cannot be exploited by an opposing player, even one with full knowledge of your gameplan, because the risk and reward of every option are entirely and mathematically accounted for.

The best illustration of GTO is the Prisoner’s Dilemma, which is easily solvable. A payoff matrix reveals that, without knowledge of the opponent’s decision, you should always betray them: whatever the opponent chooses, betraying leaves you better off than cooperating would. This is the best decision precisely because it has the highest reward attached to an outcome that cannot be made worse (punished) by the opponent. Interestingly, humans have a cognitive bias toward cooperative behavior even though cooperation is, in this one-shot case, a mathematically losing strategy. This demonstrates the importance of actually building the matrix to determine the GTO, even in a scenario this simple.
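The dominance argument can be checked mechanically. This sketch assumes textbook sentences (1 year each if both stay silent, 2 each if both betray, 0 for a lone betrayer and 3 for the one betrayed); the numbers are illustrative, and any payoffs with the same ordering give the same answer.

```python
from typing import Optional

# Years in prison for me, keyed by (my choice, opponent's choice). Lower is better.
# Illustrative textbook numbers (an assumption); any values with the same
# ordering lead to the same conclusion.
YEARS = {
    ("cooperate", "cooperate"): 1,
    ("cooperate", "betray"): 3,
    ("betray", "cooperate"): 0,
    ("betray", "betray"): 2,
}
CHOICES = ("cooperate", "betray")


def best_response(their_choice: str) -> str:
    """My choice that minimizes my sentence if I knew the opponent's choice."""
    return min(CHOICES, key=lambda mine: YEARS[(mine, their_choice)])


def dominant_choice() -> Optional[str]:
    """A choice that is the best response to every opponent choice, if one exists."""
    responses = {best_response(theirs) for theirs in CHOICES}
    return responses.pop() if len(responses) == 1 else None


if __name__ == "__main__":
    for theirs in CHOICES:
        print(f"If they {theirs}, my best move is to {best_response(theirs)}.")
    print("Dominant choice:", dominant_choice())
```

Betraying is the best response to both of the opponent's choices, which is why it is correct even with no knowledge of what they will do.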

Now, what if we turn to Rock Paper Scissors?
GTO for RPS is to throw rock 1/3 of the time, paper 1/3 of the time, and scissors 1/3 of the time, in a random order. This has the highest reward attached to an outcome that cannot be punished. But if your opponent deviates from GTO, this becomes a little problematic. Let’s say that you face an opponent who abandons randomization and always throws rock. GTO demands that you ignore him and continue to randomize your throws. Once you have some knowledge of the opponent, GTO is in practice suboptimal, depending on how you define optimal. This is of course what makes a theoretical GTO so interesting in poker, a game in which results are measured in profit over time. Game Theory Optimal carries the highest profit with the least amount of risk, but this does not necessarily equal Most Profitable. In fact, in RPS, GTO only breaks even: uniform randomization wins, ties, and loses exactly one third of the time each, no matter what the opponent throws. Thus, if maximum profit is the goal, GTO is suboptimal in any case in which the opponent is not also playing GTO! By refusing to open yourself up to exploitation, you cannot exploit an opponent.
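A small sketch makes the break-even claim concrete. It assumes the usual RPS scoring (+1 for a win, -1 for a loss, 0 for a tie) and computes exact expected points per throw with Python's `fractions.Fraction`; the mix names `GTO`, `ROCK_ONLY`, and `PAPER_ONLY` are just illustrative labels.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine: str, theirs: str) -> int:
    """Points for one throw: +1 win, -1 loss, 0 tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def expected_value(my_mix: dict, their_mix: dict) -> Fraction:
    """Exact expected points per throw when both players sample independently."""
    return sum(
        (Fraction(my_mix.get(m, 0)) * Fraction(their_mix.get(t, 0)) * payoff(m, t)
         for m in MOVES for t in MOVES),
        Fraction(0),
    )


GTO = {m: Fraction(1, 3) for m in MOVES}
ROCK_ONLY = {"rock": 1}
PAPER_ONLY = {"paper": 1}

if __name__ == "__main__":
    print(expected_value(GTO, ROCK_ONLY))         # 0: GTO ignores the leak
    print(expected_value(PAPER_ONLY, ROCK_ONLY))  # 1: a point every throw
```

The uniform mix scores 0 against rock-only (and against any other mix), while paper-only takes a full point per throw from the same opponent.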

Exploitative Play
Exploitative play is, in a nutshell, recognizing risk in an opponent’s gameplan and compensating for it. Let’s say I recognize that my opponent throws rock every hand. Even though rock-only is exploitable, GTO cannot exploit it. In order to exploit rock-only, I have to abandon GTO and adopt a more paper-heavy strategy. Once I do, and provided that my opponent does not deviate from rock-only, my expected profit per throw is exactly my paper frequency minus my scissors frequency, so it grows the more heavily I favor paper, up to one point per throw at paper-only. HOWEVER, in abandoning GTO to exploit my opponent’s strategy, I have adopted a new strategy that is equally exploitable. It is entirely possible for my opponent to counter-adjust and switch to scissors-only. That is the risk attached to abandoning GTO in search of profit: your opponent may punish you at least as severely as you sought to punish them.
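To see the trade-off in numbers, this sketch uses a hypothetical one-parameter family of paper-heavy strategies: shift weight `x` onto paper, taken equally from rock and scissors, so `x = 0` is GTO and `x = 2/3` is paper-only. The scoring is the same assumed +1/-1/0.

```python
from fractions import Fraction

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def expected_value(my_mix: dict, their_mix: dict) -> Fraction:
    """Exact expected points per throw (+1 win, -1 loss, 0 tie)."""
    total = Fraction(0)
    for mine, p in my_mix.items():
        for theirs, q in their_mix.items():
            if mine != theirs:
                sign = 1 if BEATS[mine] == theirs else -1
                total += Fraction(p) * Fraction(q) * sign
    return total


def paper_heavy(x: Fraction) -> dict:
    """Hypothetical family of strategies: shift weight x onto paper, taken
    equally from rock and scissors. x = 0 is GTO, x = 2/3 is paper-only."""
    third = Fraction(1, 3)
    return {"rock": third - x / 2, "paper": third + x, "scissors": third - x / 2}


if __name__ == "__main__":
    for x in (Fraction(0), Fraction(1, 6), Fraction(1, 3), Fraction(2, 3)):
        mix = paper_heavy(x)
        print(f"x={x}: vs rock-only {expected_value(mix, {'rock': 1})}, "
              f"vs scissors-only {expected_value(mix, {'scissors': 1})}")
```

Against rock-only this family gains exactly 3x/2 per throw, and a scissors-only counter takes back exactly the same 3x/2. Every bit of extra profit is matched by an equal amount of exposure.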

In summary:
GTO is maximizing profit by eliminating risk.
Exploitative play is further maximizing profit while inviting risk.


So what does this mean for Melee?


Potentially a lot. As I’ve repeatedly discussed, mixups are closely related to RPS, so every mixup has an inherent GTO. Adhering to it or abandoning it for a more exploitative strategy is a judgement call that we always make, whether deliberately, intuitively, or out of ignorance. It might be appropriate; it might not be. It’s a matter for individual assessment.

Here is what we should remember:

* GTO breaks even unless you gain an unfair advantage, at which point it will always win over time precisely because it eliminates risk. It's specifically designed not to lose.

* Similarly, an optimized GTO model is more profitable than an underdeveloped one.
If you are playing with rock (1 pt), paper (1 pt), and scissors (1 pt) but your opponent is playing with rock (1 pt), paper (1 pt), and nail-clippers (0.25 pts), then you win over the long term even without any exploitative play, because you're using better options.

* Exploitative play requires that you understand your opponent’s strategy. You might consider it Attacking your Opponent’s Understanding. Maybe your opponent honestly believes that rock-only is optimal. Or maybe he’s just leading with rock to try and bait a paper switch. In a fighting game, where prepared reactions can trump a mixup scenario altogether, that difference is huge. In order to be successful, exploitative play requires 1) information and 2) acumen; otherwise it is not strategy, it’s just blind hope and high-risk variance.
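The nail-clippers example can be worked out exactly under two assumptions that are not spelled out in the text: nail-clippers takes scissors' place in the cycle (beats paper, loses to rock), and the winner of a throw scores the point value of the option they won with. The equilibrium mixes below were found by hand by making each player indifferent between their options; the code only verifies them.

```python
from fractions import Fraction as F

# Assumed scoring (hypothetical): the winner of a throw scores the value of the
# option they won with; ties score nothing. Nail-clippers takes scissors' place
# in the cycle (beats paper, loses to rock) but is only worth 0.25 points.
MY_VALUE = {"rock": F(1), "paper": F(1), "scissors": F(1)}
THEIR_VALUE = {"rock": F(1), "paper": F(1), "clippers": F(1, 4)}
BEATS = {
    "rock": {"scissors", "clippers"},
    "paper": {"rock"},
    "scissors": {"paper"},
    "clippers": {"paper"},
}


def payoff(mine: str, theirs: str) -> F:
    """My net points for one throw (negative means the opponent scored)."""
    if theirs in BEATS[mine]:
        return MY_VALUE[mine]
    if mine in BEATS[theirs]:
        return -THEIR_VALUE[theirs]
    return F(0)


# Equilibrium (GTO) mixes for this modified game, derived by hand.
MY_MIX = {"rock": F(2, 9), "paper": F(4, 9), "scissors": F(3, 9)}
THEIR_MIX = {"rock": F(2, 9), "paper": F(3, 9), "clippers": F(4, 9)}


def my_ev_against(theirs: str) -> F:
    """My expected points per throw if I play MY_MIX and they always throw `theirs`."""
    return sum((p * payoff(mine, theirs) for mine, p in MY_MIX.items()), F(0))


def ev_of_my_option(mine: str) -> F:
    """My expected points per throw if they play THEIR_MIX and I always throw `mine`."""
    return sum((q * payoff(mine, theirs) for theirs, q in THEIR_MIX.items()), F(0))


if __name__ == "__main__":
    print([str(my_ev_against(t)) for t in THEIR_MIX])   # ['1/9', '1/9', '1/9']
    print([str(ev_of_my_option(m)) for m in MY_MIX])    # ['1/9', '1/9', '1/9']
```

My mix earns 1/9 point per throw against every option they have, and their mix holds me to 1/9 with every option I have. So even with both sides playing GTO, the player with the better options wins 1/9 of a point per throw over the long term, without any exploitation.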

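The information half of that requirement can be sketched as naive frequency counting: estimate the opponent's habits from the throws you have seen and pick the best response to them. The helper `naive_exploit` is a hypothetical name, and this is a deliberately simple sketch under the same +1/-1/0 scoring assumption, not a recommended method.

```python
from collections import Counter
from typing import Iterable

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
# The throw that beats each move, e.g. BEATEN_BY["rock"] == "paper".
BEATEN_BY = {loser: winner for winner, loser in BEATS.items()}


def naive_exploit(history: Iterable[str]) -> str:
    """Hypothetical helper: best response to the opponent's observed frequencies.

    Against the empirical mix, throwing `move` is worth
    (times they threw what `move` beats) - (times they threw what beats `move`),
    up to a constant factor. Ties go to the first move in BEATS.
    """
    counts = Counter(history)

    def value(move: str) -> int:
        return counts[BEATS[move]] - counts[BEATEN_BY[move]]

    return max(BEATS, key=value)


if __name__ == "__main__":
    print(naive_exploit(["rock"] * 10))               # paper
    print(naive_exploit(["rock", "rock", "paper"]))   # paper
```

Note what it cannot do: it answers ten rocks the same way whether the opponent honestly believes in rock-only or is leading with rock to bait a paper switch. Telling those apart is the acumen half, and no frequency count supplies it.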


Further reading:
https://arxiv.org/pdf/1404.5199v1.pdf and
http://poker.cs.ualberta.ca/publications/IJCAI03.pdf

Tuesday, January 10, 2017

Faster Improvement


Before you start, you have to accept a few base assumptions.
* In a game, “skill” is just your ability to execute winning tactics/strategy.
* In this context, “improvement” is synonymous with “learning skills.”
* Skill acquisition is a function of the accumulation of focus-intensive work, not directly of time.

Remember the Four Stages of Competence? It works nicely with these.
Integrated into the cyclical model from Improved Drastic Improvement, it is the best methodology I know of. At its heart, this model simply asks that you:

* Identify a problem.
* Identify the solution.
* Practice the solution until it’s in your unconscious.
* Repeat.

Over time these correct solutions accumulate to form your unconscious gameplan. Each skill, learned individually, contributes measurably to your results, and ideally the skills build on one another to create a juggernaut. With enough curated skills worked to the unconscious level, winning is inevitable.

But what does this process actually look like?

As of now, I think it’s best manifested in the following manner.

1) Record a (netplay/tournament/seriouslies) set.
2) Immediately break and identify the ONE most important lapse in your execution/strategy.
3) Identify the best tournament-viable solution to that lapse.
4) Practice executing this solution until it’s locked into your unconscious.
5) Repeat.

I’m sure that sounds a bit repetitive at this point, but in the past year or two the community’s growth in popularity and infrastructure has created a thriving netplay scene. Because you can now record the equivalent of a tournament set against a worthy human opponent at will, it is PRACTICAL to use the above model as your primary way to play, not just to color or elaborate on infinite friendlies. A like-minded friend who is able and willing to go through this process with you is obviously still a godlike asset, but it is a boon that such a rare kind of person is no longer a requirement.

Before closing, I would like to make a few points.
* Working on exactly one issue at a time allows you to focus harder on it, increasing the efficacy of your practice.
* This issue could be technical/strategical/mental/health/attentional/etc. Anything that is an issue is an issue.
* Some issues might be completely solved in a matter of minutes, hours, or weeks. It might take some studying with debug mode, Google, another smasher, or even a book to find the best possible solution for you. Who knows?
* The most effective practice with the smallest time commitment is two or three half-hour sessions spread throughout the day. Remember, intensity of focus is more important than time spent.
* What you choose to prioritize and what you honestly believe is the best solution will culminate in your personal style. It’s silly to worry about that because it will happen naturally.

Thursday, January 5, 2017

Godpuff Reaction Techchase

I've frequently wondered if Puff has a valid reaction techchase similar to Sheik's and Falcon's.
Shortly after researching spikestun rest, I posited one that I've since judged flawed and deleted, but now I'm revisiting the idea.

There are a few recurring situations (most frequently upsmash at ~20-30%, pound, and some AC bairs on spacies) where Puff can position herself at the tech location as the tech occurs. This opens up the possibility of a reaction techchase. However, due to Puff's poor speed, it is not obvious how she can cover all four tech options. The following sequence can cover all four within a practiced reaction time.

1) dash toward the tech location
2) if MISSED TECH, pivot rest
3) SH
4) if TECH IN PLACE, rest
5) if TECH ROLL OUT/IN, immediately drift toward it, then use pound's boost to catch it.
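Written as a decision tree, the sequence has only two branch points. The sketch below is purely illustrative: `TechOption` and `techchase_inputs` are hypothetical names, and the strings stand in for the steps rather than modeling any frame data.

```python
from enum import Enum


class TechOption(Enum):
    MISSED_TECH = "missed tech"
    TECH_IN_PLACE = "tech in place"
    TECH_ROLL_OUT = "tech roll out"
    TECH_ROLL_IN = "tech roll in"


def techchase_inputs(option: TechOption) -> list[str]:
    """Puff's actions for a given tech option, following the five steps above.

    The dash is committed before the option is known; missed tech is the first
    branch, and after the short hop the second branch separates tech in place
    from the rolls.
    """
    steps = ["dash toward the tech location"]
    if option is TechOption.MISSED_TECH:
        return steps + ["pivot rest"]
    steps.append("short hop")
    if option is TechOption.TECH_IN_PLACE:
        return steps + ["rest"]
    # Tech roll out or in: drift toward the roll, then use pound's boost to catch it.
    return steps + ["drift toward the roll", "pound"]


if __name__ == "__main__":
    for option in TechOption:
        print(f"{option.value}: {' -> '.join(techchase_inputs(option))}")
```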

This techchase is difficult and unintuitive, but feasible. Because the situations that set it up are so infrequent, practicing it violates the 80/20 rule, and I do not advocate it. But it is pretty funny/cool.