Computer Science Changes    |    Index    |    Search
::: PrisonersDilemma :::

 
  ACM . Agent . PrisonersDilemma # Edit # Attach # Diffs # Printable # More :::

Main
• Register
• Users
• Site Map

Curriculum

Agent Web
• projects

Algorithms

Web Learn

Image Kit

ProgTech

Publishing

Prisoner's Dilemma

Introduction:

This article introduces the basics of game, known as Prisoner’s dilemma. We consider simple agent-based program “Prisoners”, which simulates the game(to download program click reference at the end of article). Besides we describe different player-strategies and compare them using “Prisoners”.

Prisoner's Dilemma description:

The Prisoner's Dilemma is a game-theory pro blem for two players. For eample, Alice and Bob are held by police in different rooms. They are being held on suspicion of a joint crime. They are told that:
  • If one of them confess to the crime and the other does not the confessor will be freed and the other guy will get 3 years.
  • If both confess to the crime then each will jailed for two years.
  • If neither confess then they will each be jailed for one year.
We refer to confessing as defecting (from the gang) and not confessing as cooperating (with the gang). Payoffs are the utility derived from jail sentence, where utility = 5 – years in jail.

Alice Defect Alice cooperate
Bob Defect 3,3 5,2
Bob Cooperate 2,5 4,4

The Pridoner's Dilemma is widely studied because it applies to many real-word situations like nuclear disarmament and getting your kids to clean up their room. Many people object to the fact that defection is the equilibrium and claim that real people do not act like that.

But more interesting for us Iterated Prisoner’s Dilemma. The Iterated Prisoner’s Dilemma is a version of the game in which the choice is repeated over and over again and in which the players can remember their previous moves, allowing them to evolve a cooperative strategy.

  • We let two players play the same game some number of times.
  • Say there are 200 games. During the last game I gain nothing by cooperating so I should defect. So will he. Therefore, for game 199 I should also defect since I know he will be defecting for the next game. So will he. And so on…
  • For any finite number of games defection is still the equilibrium strategy.
  • However, practically we find that if there is a long time to go that people are more willing to cooperate.
  • A cooperative equilibrium can also be proven if instead of a fixed known number of interactions there is always a small probability that this will be the last interaction.
  • Robert Axelrod performed the now famous experiments on an iterated version of this problem.
  • He sent out an email asking people to submit fortran programs that will play the PD against each other for 200 rounds. The winner was the one that accumulated more points.
  • Some of the strategies sent in were:
    • ALL-D – always defect
    • RANDOM – pick randomly.
    • TIT-FOR-TAT – cooperate in the first round, then do whatever the other player did last time. The strategy is similar to the one nuclear powers adopted during the Cold War, each promising not to use its weaponry so long as other side refrained from doing so as well.
    • TESTER-defect first. If other player defects then play tit-for-tat. If he cooperated then cooperate for two rounds then defect.
    • JOSS- play tit-for-tat but 10% of the time defect instead of cooperating.
  • Which one won? Tit-for-tat won. It still made less then ALL-D when playing against it but, overall, it won more than any other strategy. Its was successful because it had the opportunity to play against other programs that were inclined to cooperate.

We try to repeat Axelrod’s experiment, but don’t ask people. We ask agents, which can realize one of the strategies described above.

Program Description

About: "Prisoner" is written on JADE. Program version 1.0.

Start program: To start program you can write in command line: "java jade.Boot [RefName]:Referee". But you must have on your computer j2sdk1.4.1or later version, and JADE. Also you must correctly set variables %PATH%, %SETPATH%. Otherwise, you probably cannot use this program.

Program realization: “Prisoner” is agent-based program. There are agents of 2 types. The first is Referee. The second is Player.

Referee:

  • Getting player's list
  • Load players
  • Construct games schedule
  • Game organization
  • Getting results
  • Accumulation statistic
  • Output results

Player:

  • Getting game situation
  • Make decision according to his strategy
  • Getting game results

So, at the beginning we start Referee. Then Referee get player’s list and strategy of every player from file. Next step is creating agent-players. After that he send game situation to agent-players, get their answers and save results. At the end Referee output results in the table and rank players according their points. Players can use 4 strategies: ALL-D, RANDOM, TIT-FOR-TET, TESTER.

Program results: We organize game with different players. There are 50 rounds. And we start game five times. What did we get? The results are in table below:

1 2 3 4 5
ALL-D 494 508 492 502 504
RANDOM 474 470 479 483 480
TIT-FOR-TET 507 507 509 509 511
TESTER 506 506 511 514 513

So, we see that RANDOM is the worst strategy. TIT-FOR_TET and TESTER in this case get practically the same result.

At the end of the article we note that some players can combine to each other and it allows find more difficult and more useful strategies. In October 2004 the Southampton group, whose primary research areas is software agents, said its strategy involved a series of moves allowing players to recognize each other and act cooperatively. Teams could submit multiple strategies, or players, and the Southampton team submitted 60 programs. These were all slight variations on a theme and were designed to execute a known series of five to 10 moves by which they could recognize each other. Once two Southampton players recognized each other, they were designed to immediately assume "master and slave" roles — one would sacrifice itself so the other could win repeatedly. If the program recognized that another player was not a Southampton entry, it would immediately defect to act as a spoiler for the non-Southampton player. The result is that Southampton had the top three performers — but also a load of utter failures at the bottom of the table who sacrificed themselves for the good of the team.

-- AndrewKistenev? - 18 Dec 2004

Attachment sort Action Size Date Who Comment
Prisoner.zip manage 19.4 K 18 Dec 2004 - 14:02 AndrewKistenev?  

Rambler's Top100 Rambler's Top100


# Edit menu  

Topic revision r1.1 - 18 Dec 2004 - 13:32 GMT - AndrewKistenev? Copyright © 2003-2017 by the contributing authors.