Here are the links to my previous posts on blackjack. I used a modified version of my old blackjack simulator (discussed in detail in the linked posts). If you want to know more about how I coded it up or just need a refresher on basic blackjack strategy, you can read these first:
One thing that perplexed me last year when I trained a neural net to play blackjack was why my neural net didn’t do better when I provided card counting information to it. Doesn’t card counting confer a significant advantage to the player?
For those unfamiliar with the concept, counting cards is a way of tracking the number of high cards (10, Jack, Queen, King, Ace) left in the deck. The idea is that when there are a lot of high cards left in the deck, it’s advantageous to the player (because we are more likely to be dealt high cards and the dealer is also more likely to bust). So if our card counting tips us off that the remaining cards are more likely to be high, we should take advantage by making larger than normal bets.
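As a minimal sketch of the idea, here is the well-known Hi-Lo scheme (illustrative only — not necessarily the exact scheme used in my simulator). Low cards seen raise the count and high cards seen lower it, so a positive count means the remaining deck is rich in high cards:

```python
def hi_lo_value(card):
    """card: 2-10 for number cards (face cards count as 10), 11 for ace."""
    if 2 <= card <= 6:
        return 1       # low cards leaving the deck help the player later
    if 7 <= card <= 9:
        return 0       # neutral cards
    return -1          # 10, J, Q, K (all valued 10) and ace (11)

def running_count(cards_seen):
    """Sum the Hi-Lo values of every card dealt since the last shuffle."""
    return sum(hi_lo_value(c) for c in cards_seen)

# Mostly low cards dealt so far -> positive count -> bet bigger.
print(running_count([2, 3, 4, 10, 6]))  # 3
```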
When I tried to integrate card counting into my blackjack code last year, it failed to add anything because I did it incorrectly. The model that I built previously took in information like the cards in the player’s hand and the card that the dealer is showing and decided whether to keep hitting or stay.
I attempted to augment my model by providing it with a detailed count of all the cards that had been seen thus far as part of its training data (the count is refreshed every time that the deck is reshuffled). I figured that the count would allow the neural net to predict when the dealer was likely to bust — and this additional information would improve its hit/stay decision making.
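The count features can be sketched as a per-card-type tally (the function name and exact encoding here are illustrative, not the original simulator’s code):

```python
from collections import Counter

def make_count_features(cards_seen):
    """Tally of each card type seen since the last shuffle (2-11, with all
    10-valued cards recorded as 10 and aces as 11), as a fixed-length
    feature vector for the neural net."""
    tally = Counter(cards_seen)
    return [tally.get(card, 0) for card in range(2, 12)]

# The counts start over from an empty list whenever the deck is reshuffled.
print(make_count_features([10, 10, 11, 4, 7]))  # [0, 0, 1, 0, 0, 1, 0, 0, 2, 1]
```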
Surprisingly, this was not what happened. Performance actually got worse! The win probability dropped from 42% without card counting data to 38% with it. The tie probability also dropped from 9% to 5%, hinting at a more aggressive playing style.
Sometimes Machines Learn The Wrong Thing
As you can see below, when given card count data, the neural net is able to do ever so slightly better in terms of probability of a win or tie when the dealer is showing a low card but significantly underperforms its simpler (no card count data) predecessor when the dealer is showing a 10 or an ace (11 denotes ace). The probabilities in the following plots were generated by simulating approximately 250,000 blackjack games with trained neural nets deciding whether to hit or stay.
If we isolate just the games where the dealer is showing a 10, a face card, or an ace, we can see the cause of the underperformance. Our supposedly smarter neural net (the one with the card count data) is doing some pretty dumb things. Looking at the blue bars in the plot below, we can see that it chooses to hit frequently even when it already holds a high hand total of 17 or more (it even hits occasionally when it’s already at 20). Contrast that with our simpler model, which correctly knows to stay on hands totaling 17 or more.
It’s not all bad. Earlier we saw that adding card count data improves the winning probability when the dealer is showing a low card (6 or less). Intuitively, the reason for this is that when the dealer is showing a low card, the dealer will definitely have to keep hitting until their hand totals at least 17 (or they bust). So having knowledge of when the dealer is more likely to bust should prove somewhat helpful in these situations. And as we can see, the behavior of our card counting neural net is different. It is much more likely to stay on hands totaling 13, 14, and 15, and based on the higher win probabilities, it seems on average to be making the right decision in these cases.
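The dealer’s fixed behavior is what makes this signal usable; as a sketch (ignoring soft-17 house rule variations):

```python
def dealer_plays(hand_total, draw_card):
    """The dealer has no choices: they must hit until the hand totals at
    least 17, then stay. A result over 21 is a bust."""
    while hand_total < 17:
        hand_total += draw_card()
    return hand_total

# Dealer shows a 6 with a 10 underneath (16 total), must draw, and busts.
cards = iter([9])
print(dealer_plays(16, lambda: next(cards)))  # 25
```

Because the dealer is forced to draw from a 16, a deck rich in high cards makes a bust much more likely — which is exactly the situation a count can flag.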
Still, the overall decline in win probability is disappointing. Choosing to hit when our hand is already at 19 or 20 is a big facepalm. Based on our examination of how the neural net’s behavior changed once we added the card count data, it looks like while the additional data embeds some signal, it also caused our model to overfit and make some foolish decisions. So let’s attempt to fix this.
Trying To Fix Our Model
When I went back and reviewed how counting cards actually works, I realized I had been thinking about it wrong. The additional features offer limited assistance in helping our model decide whether to hit or stay. In fact, as we saw, the added features were more likely to confuse the model than help it. But where the card count data might be able to help us is in deciding how much to bet. If we know that there are a lot of high cards left in the deck, we should bet more because the probability of blackjack (a two-card hand consisting of an ace and a 10 or a face card) is higher. Conversely, if we know that the remaining cards consist primarily of low cards, we should just make the minimum bet.
In order to do this, instead of shoving everything into a single model, we can split up the responsibilities. We could have our old model, which already worked pretty well, handle the hit or stay decision, and build a new model that uses the card count data to decide how much to bet. It would look something like what’s pictured below. A note on the card count features: I keep a count of how many of each card type I’ve seen so far, and whenever the dealer reshuffles the deck (or shoe), I reset all the counts to 0.
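A sketch of the split (the objects below are stand-ins to show the division of responsibilities, not the trained networks; the feature vector is a tally of cards 2 through 11 seen since the last shuffle):

```python
class SimpleBetModel:
    """Stand-in for neural net 2: sizes the bet from the count features."""
    def size_bet(self, count_features, min_bet):
        low_seen = sum(count_features[:5])    # 2-6 seen so far
        high_seen = sum(count_features[-2:])  # 10-valued cards and aces seen
        # More low cards gone -> remaining deck is high-card rich -> bet more.
        return min_bet * 2 if low_seen > high_seen else min_bet

def play_round(hit_stay_model, bet_model, count_features, min_bet=10):
    """Neural net 2 sizes the bet before the deal; neural net 1 then plays
    the hand (hit/stay) exactly as before."""
    bet = bet_model.size_bet(count_features, min_bet)
    # ... deal the hand and let hit_stay_model drive hit/stay decisions ...
    return bet

print(play_round(None, SimpleBetModel(), [3, 2, 1, 0, 0, 1, 0, 0, 2, 1]))  # 20
```

The key design point is that the hit/stay model never sees the count, so it cannot be confused by it the way the combined model was.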
Of course all this depends on whether the card count data truly does help us predict blackjacks. Let’s check. We can use a ROC curve to check the efficacy of neural net 2 (if you need a refresher on what ROC curves are, check out the following link).
Judging by the higher area under its ROC curve (blue line), neural net 2 does seem to add value (relative to deciding randomly):
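Checking this with scikit-learn is a one-liner (toy numbers below; in the real check, `y_true` marks which simulated hands were blackjacks and `probs` holds neural net 2’s predicted probabilities):

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 0, 1, 0, 0, 1]                        # 1 = blackjack
probs  = [0.10, 0.65, 0.70, 0.20, 0.90, 0.40, 0.15, 0.60]

auc = roc_auc_score(y_true, probs)
print(round(auc, 3))  # 0.933 -- well above the 0.5 of random guessing

# The curve itself, for plotting true-positive vs false-positive rates:
fpr, tpr, thresholds = roc_curve(y_true, probs)
```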
So let’s use it to size our bets and see if our performance improves. I simulated 100 blackjack decks (like a long night at the casino) 100 times and compared the average results and distributions between the following two betting strategies:
- Making the minimum bet of $10 every time.
- When the count is advantageous, bet more based on how confident we are (formula below). Otherwise bet $10.
Here’s how I decided to size the bets (full disclosure — I didn’t think super hard about this decision rule):
Using the training data, calculate the mean and standard deviation of the probabilities (of a blackjack) generated by neural net 2. Then Z-score neural net 2’s output prediction:

Z_prob = (prediction - mean)/std_dev

if Z_prob > 0:
    bet = 10*(1 + Z_prob)
else:
    bet = 10
Basically if the predicted probability of getting a blackjack is higher than average, I bet more — and the excess amount that I bet is determined by how much higher than average the predicted probability is.
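As a runnable function (a direct transcription of the rule, with the mean and standard deviation taken from neural net 2’s predictions on the training data):

```python
def size_bet(prediction, mean, std_dev, min_bet=10):
    """Scale the bet above the minimum only when neural net 2's predicted
    blackjack probability exceeds its training-set average."""
    z_prob = (prediction - mean) / std_dev
    if z_prob > 0:
        return min_bet * (1 + z_prob)
    return min_bet

# One standard deviation above average roughly doubles the bet.
print(size_bet(prediction=0.08, mean=0.05, std_dev=0.03))  # ~20.0
print(size_bet(prediction=0.03, mean=0.05, std_dev=0.03))  # 10
```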
Cool, let’s see if we can actually improve our gambling results. The following plot compares gambling performance with and without neural net 2. While it looks like we still aren’t able to consistently make money, the dynamic bet sizing does improve our performance. The mean ending bankroll with dynamic bet sizing is 12% higher than without it.
Finally, let’s compare the distributions of our ending bankrolls to make sure our dynamic (and larger) bets are not causing excessive volatility in our betting results. The shapes of the distributions seem reasonably similar, with the dynamic bet sizing one (blue) slightly shifted to the right, which is good.
I was hoping for a bigger boost, but oh well. It looks like more tinkering is necessary, but at least it looks like we are onto something. It also goes to show that as cool and versatile as deep learning models can be, it still pays to be thoughtful about and to sanity check (as much as we can) the way our model works. Finally, winning when the odds are stacked against you is hard! Cheers!