NegotiateBench

What is NegotiateBench?

NegotiateBench evaluates the ability of large language models (LLMs) to design haggling negotiation algorithms and adapt them based on observed outcomes.

Why?

The goal is to identify which LLMs perform best in environments where no correct solution can be known in advance (ex: during training time).

Where

The current leaderboard is at https://mihaiii-negotiatebench.hf.space/

How does it work?

Negotiation algorithms generated by LLMs compete against each other. On each session, every LLM is prompted with its current algorithm and samples from the negotiation results, and asked to improve its algorithm. This process repeats indefinitely, allowing continuous adaptation and improvement.

In order to ensure fairness, the negotiation data is generated randomly, and each LLM competes against all others on the same data, including with roles swapped.

All updates are automated, and the history of LLM-generated negotiation code is tracked in Git (see example).

How do the LLMs improve their solutions?

I provide the LLMs with their solution from the previous session, the leaderboard from the latest session, and a set of negotiation samples against the top-performing model from that session. I explicitly encourage them to rethink the entire algorithm (not just apply small tweaks) if their position on the leaderboard is too low. For the samples, I also reveal how the opponent valued the object in that specific negotiation round - information that is not available during the actual negotiation.

Problem statement

Note: The problem statement for LLMs to solve was originally designed by Alex Feldgendler for Hola's JS Challenge Summer 2018 (I am not affiliated with them). Minor adaptations have been made, mainly to use Python instead of JavaScript.

Here is the original problem statement.

Below you can find the problem statement as presented to the models to solve:

Let's say there are a book, two hats, and three balls. You and a partner have to decide how to split these objects between the two of you. To you, the book is worth $4, a ball $2, and the hats are worthless. The partner might value the same objects differently; you don't know their valuation, but you know that the total worth of all objects is the same as for you, in this case, $10.

You and your partner take turns making offers to each other about how to split the goods. On each turn, you can either accept the partner's offer (except on the very first turn) or make a counter-offer. Negotiations are limited to a specified number of rounds. To determine the total number of offers that can be made, multiply the number of rounds by 2. If an agreement is reached, each of you receives the value of your share according to your own valuation. If there is no agreement after the last turn (i.e., the last word is a counter-offer rather than acceptance), neither partner receives anything. The same happens if either partner walks away from the negotiations.

Here is how the negotiations might go:

You: I want the book and two balls; you get a ball and both hats.

Other: I don't accept. I want all the balls and a hat; you get the book and a hat.

You: I don't accept. I want the book and a ball; you get two balls and both hats.

Other: I accept. Unknown to you, the partner's valuation was: $2 for a ball, $2 for a hat, and nothing for a book. Your agreement brought $6 to you and $8 to your partner.

In general, there are two or more types of objects, and a positive number of objects of each type. Each type of object is worth some nonnegative integer to each partner. The total value of all objects for each partner is the same, although the particular valuations are generally different. A proposed split must distribute all objects between partners; individual objects cannot be divided.

Your goal is to write a script that tries to maximize the value of your deal.

Solutions

A solution is a Python file with no dependencies. It must have an Agent class with a constructor and a single method:

```python

class Agent: def init(self, me: int, counts: list[int], values: list[int], max_rounds: int): pass
def offer(self, o: list[int] | None) -> list[int] | None:
    pass
```

An instance of this class is created once for a negotiation session. The constructor receives the following arguments:

me is 0 if your turn is first, and 1 if your turn is second.

counts is a list of integers, describing how many of each type of object there is. This list is between 2 and 10 elements long.

values is a list of integers the same length as counts, describing how much every object is worth to you.

max_rounds is the limit on the number of rounds in the negotiations; a round is two turns, one by each partner.

The offer method is called each time it's your turn. Its argument o is a list of integers the same size as counts, describing how many of each type of object the partner offers to you. If your turn is first and this is the first round, o is None.

The offer method should return None if you accept the offer (except when o is None). Otherwise, it should return a list of integers the same size as counts, describing how many of each type of object you want for yourself. Note that both the argument and the return value of offer describe the partition from your perspective—that is, what you get.

There is a timeout of 5 seconds per turn. If the code times out, throws an exception, or returns an invalid value, it is regarded as walking away from the negotiations, and neither partner receives anything.

The module is not allowed to learn, i.e., to keep any data persistent between sessions.

See below for a very simple example of a negotiation script. It will accept any offer that allows it to receive at least half of the total value; otherwise, it simply demands all items with nonzero value.

Code example

```python

class Agent: def init(self, me: int, counts: list[int], values: list[int], max_rounds: int): self.counts = counts self.values = values self.rounds = max_rounds self.total = 0 for i in range(len(counts)): self.total += counts[i] * values[i]

def offer(self, o: list[int] | None) -> list[int] | None: self.rounds -= 1 if o: sum_val = 0 for i in range(len(o)): sum_val += self.values[i] * o[i] if sum_val >= self.total / 2: return None o = self.counts.copy() for i in range(len(o)): if self.values[i] == 0: o[i] = 0 return o

```

Credits / Thanks / Acknowledgments

The haggling negotiation problem statement is taken from Hola's JS Challenge Summer 2018 (I am not affiliated with them). See other solutions written by humans. As far as I know, the problem statement was designed by Alex Feldgendler.
Thanks to Jeremy Howard and the Answer.AI team for FastHTML, which enabled rapid web app development.
OpenRouter for model access.
Huggingface for providing a platform to display the benchmark.