Prisoner's Dilemma
Introduction:
This article introduces the basics of game, known as Prisoner’s dilemma. We consider simple agentbased program “Prisoners”, which simulates the game(to download program click reference at the end of article). Besides we describe different playerstrategies and compare them using “Prisoners”.
Prisoner's Dilemma description:
The Prisoner's Dilemma is a gametheory pro blem for two players.
For eample, Alice and Bob are held by police in different rooms. They are being held on suspicion of a joint crime. They are told that:
 If one of them confess to the crime and the other does not the confessor will be freed and the other guy will get 3 years.
 If both confess to the crime then each will jailed for two years.
 If neither confess then they will each be jailed for one year.
We refer to confessing as defecting (from the gang) and not confessing as cooperating (with the gang). Payoffs are the utility derived from jail sentence, where utility = 5 – years in jail.
The Pridoner's Dilemma is widely studied because it applies to many realword situations like nuclear disarmament and getting your kids to clean up their room. Many people object to the fact that defection is the equilibrium and claim that real people do not act like that.
But more interesting for us Iterated Prisoner’s Dilemma.
The Iterated Prisoner’s Dilemma is a version of the game in which the choice is repeated over and over again and in which the players can remember their previous moves, allowing them to evolve a cooperative strategy.
 We let two players play the same game some number of times.
 Say there are 200 games. During the last game I gain nothing by cooperating so I should defect. So will he. Therefore, for game 199 I should also defect since I know he will be defecting for the next game. So will he. And so on…
 For any finite number of games defection is still the equilibrium strategy.
 However, practically we find that if there is a long time to go that people are more
willing to cooperate.
 A cooperative equilibrium can also be proven if instead of a fixed known number of
interactions there is always a small probability that this will be the last interaction.
 Robert Axelrod performed the now famous experiments on an iterated version of this
problem.
 He sent out an email asking people to submit fortran programs that will play the PD against each other for 200 rounds. The winner was the one that accumulated more points.
 Some of the strategies sent in were:
 ALLD – always defect
 RANDOM – pick randomly.
 TITFORTAT – cooperate in the first round, then do whatever the other player did last time. The strategy is similar to the one nuclear powers adopted during the Cold War, each promising not to use its weaponry so long as other side refrained
from doing so as well.
 TESTERdefect first. If other player defects then play titfortat. If he cooperated then cooperate for two rounds then defect.
 JOSS play titfortat but 10% of the time defect instead of cooperating.
 Which one won? Titfortat won.
It still made less then ALLD when playing against it but, overall, it won more than any other strategy. Its was successful because it had the opportunity to play against other
programs that were inclined to cooperate.
We try to repeat Axelrod’s experiment, but don’t ask people. We ask agents, which can realize one of the strategies described above.
Program Description
About: "Prisoner" is written on JADE. Program version 1.0.
Start program: To start program you can write in command line:
"java jade.Boot [RefName]:Referee". But you must have on your computer j2sdk1.4.1or later version, and JADE. Also you must correctly set variables %PATH%, %SETPATH%. Otherwise, you probably cannot use this program.
Program realization:
“Prisoner” is agentbased program. There are agents of 2 types. The first is Referee. The second is Player.
Referee:
 Getting player's list
 Load players
 Construct games schedule
 Game organization
 Getting results
 Accumulation statistic
 Output results
Player:
 Getting game situation
 Make decision according to his strategy
 Getting game results
So, at the beginning we start Referee. Then Referee get player’s list and strategy of every player from file. Next step is creating agentplayers. After that he send game situation to agentplayers, get their answers and save results. At the end Referee output results in the table and rank players according their points.
Players can use 4 strategies: ALLD, RANDOM, TITFORTET, TESTER.
Program results:
We organize game with different players. There are 50 rounds. And we start game five times. What did we get? The results are in table below:
 1  2  3  4  5 
ALLD  494  508  492  502  504 
RANDOM  474  470  479  483  480 
TITFORTET  507  507  509  509  511 
TESTER  506  506  511  514  513 
So, we see that RANDOM is the worst strategy. TITFOR_TET and TESTER in this case get practically the same result.
At the end of the article we note that some players can combine to each other and it allows find more difficult and more useful strategies.
In October 2004 the Southampton group, whose primary research areas is software agents, said its strategy involved a series of moves allowing players to recognize each other and act cooperatively.
Teams could submit multiple strategies, or players, and the Southampton team submitted 60 programs. These were all slight variations on a theme and were designed to execute a known series of five to 10 moves by which they could recognize each other. Once two Southampton players recognized each other, they were designed to immediately assume "master and slave" roles — one would sacrifice itself so the other could win repeatedly.
If the program recognized that another player was not a Southampton entry, it would immediately defect to act as a spoiler for the nonSouthampton player. The result is that Southampton had the top three performers — but also a load of utter failures at the bottom of the table who sacrificed themselves for the good of the team.
 AndrewKistenev?  18 Dec 2004
