This article introduces the basics of game, known as Prisoner’s dilemma. We consider simple agent-based program “Prisoners”, which simulates the game(to download program click reference at the end of article). Besides we describe different player-strategies and compare them using “Prisoners”.
Prisoner's Dilemma description:
The Prisoner's Dilemma is a game-theory pro blem for two players.
For eample, Alice and Bob are held by police in different rooms. They are being held on suspicion of a joint crime. They are told that:
We refer to confessing as defecting (from the gang) and not confessing as cooperating (with the gang). Payoffs are the utility derived from jail sentence, where utility = 5 – years in jail.
The Pridoner's Dilemma is widely studied because it applies to many real-word situations like nuclear disarmament and getting your kids to clean up their room. Many people object to the fact that defection is the equilibrium and claim that real people do not act like that.
But more interesting for us Iterated Prisoner’s Dilemma.
The Iterated Prisoner’s Dilemma is a version of the game in which the choice is repeated over and over again and in which the players can remember their previous moves, allowing them to evolve a cooperative strategy.
- If one of them confess to the crime and the other does not the confessor will be freed and the other guy will get 3 years.
- If both confess to the crime then each will jailed for two years.
- If neither confess then they will each be jailed for one year.
We try to repeat Axelrod’s experiment, but don’t ask people. We ask agents, which can realize one of the strategies described above.
- We let two players play the same game some number of times.
- Say there are 200 games. During the last game I gain nothing by cooperating so I should defect. So will he. Therefore, for game 199 I should also defect since I know he will be defecting for the next game. So will he. And so on…
- For any finite number of games defection is still the equilibrium strategy.
- However, practically we find that if there is a long time to go that people are more
willing to cooperate.
- A cooperative equilibrium can also be proven if instead of a fixed known number of
interactions there is always a small probability that this will be the last interaction.
- Robert Axelrod performed the now famous experiments on an iterated version of this
- He sent out an email asking people to submit fortran programs that will play the PD against each other for 200 rounds. The winner was the one that accumulated more points.
- Some of the strategies sent in were:
- ALL-D – always defect
- RANDOM – pick randomly.
- TIT-FOR-TAT – cooperate in the first round, then do whatever the other player did last time. The strategy is similar to the one nuclear powers adopted during the Cold War, each promising not to use its weaponry so long as other side refrained
from doing so as well.
- TESTER-defect first. If other player defects then play tit-for-tat. If he cooperated then cooperate for two rounds then defect.
- JOSS- play tit-for-tat but 10% of the time defect instead of cooperating.
- Which one won? Tit-for-tat won.
It still made less then ALL-D when playing against it but, overall, it won more than any other strategy. Its was successful because it had the opportunity to play against other
programs that were inclined to cooperate.
About: "Prisoner" is written on JADE. Program version 1.0.
Start program: To start program you can write in command line:
"java jade.Boot [RefName]:Referee". But you must have on your computer j2sdk1.4.1or later version, and JADE. Also you must correctly set variables %PATH%, %SETPATH%. Otherwise, you probably cannot use this program.
“Prisoner” is agent-based program. There are agents of 2 types. The first is Referee. The second is Player.
- Getting player's list
- Load players
- Construct games schedule
- Game organization
- Getting results
- Accumulation statistic
- Output results
So, at the beginning we start Referee. Then Referee get player’s list and strategy of every player from file. Next step is creating agent-players. After that he send game situation to agent-players, get their answers and save results. At the end Referee output results in the table and rank players according their points.
Players can use 4 strategies: ALL-D, RANDOM, TIT-FOR-TET, TESTER.
We organize game with different players. There are 50 rounds. And we start game five times. What did we get? The results are in table below:
- Getting game situation
- Make decision according to his strategy
- Getting game results
So, we see that RANDOM is the worst strategy. TIT-FOR_TET and TESTER in this case get practically the same result.
At the end of the article we note that some players can combine to each other and it allows find more difficult and more useful strategies.
In October 2004 the Southampton group, whose primary research areas is software agents, said its strategy involved a series of moves allowing players to recognize each other and act cooperatively.
Teams could submit multiple strategies, or players, and the Southampton team submitted 60 programs. These were all slight variations on a theme and were designed to execute a known series of five to 10 moves by which they could recognize each other. Once two Southampton players recognized each other, they were designed to immediately assume "master and slave" roles one would sacrifice itself so the other could win repeatedly.
If the program recognized that another player was not a Southampton entry, it would immediately defect to act as a spoiler for the non-Southampton player. The result is that Southampton had the top three performers but also a load of utter failures at the bottom of the table who sacrificed themselves for the good of the team.
-- AndrewKistenev? - 18 Dec 2004
| || 1 || 2 || 3 || 4 || 5 |
| ALL-D || 494 || 508 || 492 || 502 || 504 |
| RANDOM || 474 || 470 || 479 || 483 || 480 |
| TIT-FOR-TET || 507 || 507 || 509 || 509 || 511 |
| TESTER || 506 || 506 || 511 || 514 || 513 |