Figure 2: Plot of optimal price curves for seller 1 vs.
seller 2 in the price-quality model at various lookahead depths.
The numbers in parentheses in the lower left-hand corner
of each plot indicate search depths (
,
)
used by seller 1 and seller 2 respectively.
Recall that in two-player zero-sum games such as chess,
minimax search to a fixed depth d works as follows: for a given
starting position, one constructs a game tree of all possible
moves, replies, counter-replies, etc., out to some fixed depth d.
One then applies a heuristic evaluation function to the leaves
of the tree, and one then does a minimax back-up of the values
of the leaf nodes. This is done progressively starting from the
bottom of the tree and working upwards, at each step applying
either a min operation or a max operation depending on which side
is moving. Another way of viewing this is that at depth 1 away
from the leaves, the moves are selected based on a 1-ply search;
at depth 2 away from the leaves, the selection is based on a
2-ply search; and so on, until finally at the root of the tree,
the move that is selected is based on a d-ply search.
Our price-setting algorithm is analogous to this and works by
building up a succession of optimal price lookup tables for
successively greater depths. We begin by constructing a depth 1
optimal price table: for every possible starting price of the other
seller, this table represents the best possible price that the
seller can set to maximize immediate utility at the current
time step. This corresponds to the myopic optimal
pricing algorithm mentioned in the previous section,
which was studied in detail in (Kephart et al., 1998). Since the
state space is discretized and small, and since the consumer
behavior is assumed to be a deterministic function of the prices
of the two sellers, the optimal prices can be computed once
for every state in the state space and stored in a table.
The next step is to construct a depth 2 optimal price table,
which maximizes total utility summed over 2 time steps, assuming
that the opponent will reply with a depth 1 optimal price.
Next we construct a depth 3 table, which maximizes total utility
summed over 3 time steps, assuming that the opponent will
reply with a depth 2 price, and that the player will reply to
that with a depth 1 price. We can continue in this way to
generate optimal price tables of arbitrary depth. The optimal
price table at depth n maximizes utility summed over n time
steps, assuming that the opponent first responds with a
(n - 1)-step optimal price, the player then responds to that
with a (n - 2)-step optimal price, etc.. Optimal price
tables can also be generated assuming discounting of future
utilities governed by a discount parameter
lying
between 0 and 1. In this case the total discounted utility
is maximized, where the predicted utility at m time steps
in the future is weighted by a multiplicative factor of
.
Of course, we don't
expect the n-step trajectory predicted by this procedure to
match the actual trajectory at every step. The later steps of
the predicted trajectory in particular are based on smaller
lookahead and therefore ought to be less accurate in matching
agents using deep lookahead. However, the hope is that the
first predicted step ought to give something reasonable. This
is what has been found in domains such as chess -- Deep Blue's
predicted principal variations of 20-30 moves are generally
not matched in exact detail, but its top level move decisions
are nonetheless extremely accurate and strong.
Our basic finding is that when agents use any amount of lookahead
at all, it can be sufficient to substantially curtail or eliminate
the price war dynamics that result from myopic optimal pricing.
An example of this, taken from the price-quality model,
is shown in figure 2. In this figure,
we have computed the optimal prices at lookahead depths 1, 2 and 3
for seller 1 (
)
and seller 2 (
)
and have systemically plotted each curve for seller 1 against
each curve for seller 2. The lookahead depth is indicated in
parentheses in the lower left-hand corner of each plot.
In each of these plots, the system dynamics for the state
can be obtained by alternately applying the two optimal price
curves. This can be done by a simple iterative graphical
construction, in which for any given starting point, one first
holds
constant and moves horizontally to the
curve, and then one
holds
constant and moves vertically to the
curve. For example, one can see
in the upper-left plot that when both sellers behave myopically,
i.e. lookahead depths (1,1),
the iterative graphical construction leads to a never-ending
cyclic price war, whose trajectory is indicated by the dashed line.
In contrast, when both sellers use either 2-step or 3-step
lookahead, we can see that the price war is eliminated, and that
the system dynamics instead iterates to a fixed point. The case
when one of the sellers is myopic and the other seller uses
greater lookahead is interesting. This is shown in the topmost
three plots, where seller 1 is myopic, and in the leftmost three
plots, where seller 2 is myopic. In the latter case, we see
that when seller 2 is myopic, the price war cycle still exists
but has a diminished amplitude. In contrast, when seller 1 is
myopic, we see that if seller 2 uses 2-step lookahead, i.e. seller 2
has a perfect model of seller 1, the price war is eliminated.
However, if seller 2 uses 3-step lookahead, the price war re-emerges,
but at a diminished amplitude. The re-emergence
of the price war comes about because, even though seller 2 is
looking ahead further, it is now using an incorrect model of seller 1.
The locations of the fixed points shown in figure 2 are
also of some interest, and can be simply explained. We note that in
every case where a fixed point is obtained, the price for seller 1
is given by
, and that for the middle column of
figure 2
(corresponding to seller 2 using 2-step lookahead) the price for
seller 2 is
, whereas in the rightmost column (where
seller 2 uses 3-step lookahead), seller 2's price is instead
. This can be simply understood in terms of seller 2's
modeling of seller 1's behavior. When seller 2 uses 2-step
lookahead, it models seller 1 as being myopic. The choice of
represents the highest possible price at which
seller 1 will not respond myopically by undercutting. This can
be seen by inspection of the profit function given in equation 1;
we can see that the profit in a single time step for seller 1
is 0.07 regardless of whether
or
.
On the other hand, when seller 2 uses 3-step lookahead, it is
modeling seller 1 as using 2-step lookahead. The resulting
price for seller 2,
, corresponds to the point of
indifference to undercutting for seller 1 based on a two-step
optimization. If seller 1 does not undercut, he receives a
profit of 0.07 for the next two time steps, for a total profit
of 0.14. If seller 1 undercuts, he receives a profit of 0.12
on the first time step, followed on the next time step by a
profit only 0.02 (after seller 2 retaliates with a further
undercut), again resulting in a total return over two time
steps of exactly 0.14. Thus the calculated fixed-point price
for seller 2 has shifted to a higher value with greater
lookahead. We note that these fixed points are at a lower
price for seller 2 than the equilibrium point calculated
in (Sairamesh and Kephart, 1998) for the one-shot
(non-iterated) game. For the specific parameter settings
used here, the calculated equilibrium values are
and
. (This point corresponds
to the open circle in the upper-left plot of figure 2.
In the iterated game, this point is not a stable equilibrium
when the players are myopic, because seller 1 can obtain
a higher short-term reward by undercutting seller 2, and
will therefore initiate a price war.)
Figure 3: Plot of myopic optimal price curves for
seller 1 vs. seller 2 in a sample run of the information-filtering
model. The utilities for each seller were generated numerically
using a simulated population of 250,000 consumers. Repeated
iteration of the optimal price rules leads to a cyclic price war,
as indicated by the dashed trajectory.
Figure 4: Plot of optimal price curves for
seller 1 vs. seller 2, each using 3-step lookahead, based on
the same information-filtering model used in figure
3. These optimal price curves still lead to
price-war behavior, but with diminished amplitude.
It is also of interest to examine lookahead depths greater
than 3. One might hope that the optimal price curves
converge to a unique invariant pair in the limit of large
depth, and that the fixed point of the system would
approach a Nash equilibrium. While we have no general proof
that this will always happen, we have found empirically
that, in the price-quality model, the optimal price curves
for depths 4 and above turn out to be identical to
the depth 3 curves.
However, we did not find this to be the case
in the information-filtering model. Instead, it was found that
there is a set of optimal price curves that periodically repeat
with increasing depth, and that the period appears to be
somewhat arbitrary, and can vary depending on the exact values
of various cost parameters, etc., in the simulation.
These findings were obtained regardless of whether
or not discounting is used in the optimized utilities.
An example of results obtained in the information-filtering
model is illustrated in figures 3 and 4.
These results were obtained for a specific
set of seller utilities that were numerically generated using
a model population of 250,000 consumers. Figure 3
shows the myopic optimal price curves, while figure 4
shows the optimal price curves using 3-step lookahead.
We have generally found in the information-filtering model that,
for lookaheads depths of 2 or more, application of the calculated
optimal curves resulted in a system dynamics in which the
price-war cycles were either eliminated completely, or
reduced in amplitude. This can be seen in figure 4,
where the amplitude of the price-war cycle is reduced compared
to that of figure 3.
In summary, we have shown that in two different models, the
use of pricing algorithms based on n-step lookahead either
eliminated or diminished the impact of price-war behavior
which is obtained when both sellers behave myopically.
Important issues for ongoing and future research include:
(1) establishing under what conditions the n-step
lookahead procedure converges in the limit of large n
to a stationary set of optimal price curves; (2) understanding
when the implied dynamics for a given pair of optimal
price curves iterates to a fixed point, and when it yields
limit-cycle behavior; (3) in those cases in which a fixed
point is obtained, whether it corresponds to a Nash equilibrium
of the system.