Generalized DP algorithms

While generalized minimax search is an efficient procedure for successively generating optimal price tables of greater and greater depth, it suffers from some theoretical inconsistencies. First, the generation of an n-step optimal price table for a given seller is based on the assumption that the opponent will be using an (n-1)-step policy. This is problematic if one believes that the opponent is using a search algorithm of equal depth. Furthermore, there is no way that both sellers could be correct in assuming that the opponent is using lesser-depth search. Second, there is the problem that successive steps in the predicted trajectory are based on progressively shallower depth search, until finally at the end of the trajectory, the predicted pricing is myopic. Such predicted trajectories are unlikely to match actual trajectories, as it seems reasonable to assume that agents will use a given fixed depth search every time they are called upon to set a price.

As a way of overcoming these potential problems, we suggest that generalizing the formalism of Dynamic Programming from single-agent Markov decision problems to two-player arbitrary-sum games is a promising area of research. Such an approach could lead to the development of algorithms that yield optimal policies for both players that are fully accurate and self-consistent. Furthermore, DP-like approaches can be extended to large state spaces in which it is not feasible to use lookup tables for all possible states in the state space. This has been shown by numerous works in the field of Reinforcement Learning, in which the lookup tables of DP are replaced by compact state representations and nonlinear function approximators, and the full sweeps through the state space of DP are replaced by following actual trajectories generated by agent policies.

Within the context of the price-quality and information-filtering models mentioned previously, we have investigated a number of heuristic generalizations of DP, which attempt to do simultaneous, self-consistent optimization of the optimal price curves for both sellers. So far we have only examined the case of two-step lookahead, i.e., each seller optimizes two-step utility and also models the other seller as optimizing two-step utility. Within this context, we have examined a simple alterating policy iteration approach in which we start with initial curves tex2html_wrap_inline462 and tex2html_wrap_inline464 and alternately optimize tex2html_wrap_inline304 and tex2html_wrap_inline306 assuming that the other one is held fixed. Empirically, this approach was not found to converge to a unique, self-consistent solution. We have also looked at an asynchronous stochastic version of the above algorithm, in which one element at a time is chosen at random from each optimal price table, and only that element is optimized. This approach was not found to converge, either.

However, we have found empirical convergence with an approach based on an incremental version of DP, in which the prices are moved towards the computed optimal prices by small amounts. (This somewhat resembles a gradient-style minimization of the amount of inconsistency between the two optimal price curves at each time step.) When the asynchronous, stochastic algorithm mentioned above was combined with incremental price adjustments, it quickly converged in the price-quality model to the exact same curves (apart from very small random fluctuations of less than 1%) as obtained by 3 or more lookahead steps in the generalized minimax procedure. A plot of the DP-generated optimal price curves is shown below in figure 5. We note that visually this plot appears identical to the lower right-hand plot in figure 2, and leads to fixed-point dynamics rather than cyclic price wars.

   figure138
Figure 5: Two-step optimal prices for seller 1 vs. seller 2 calculated by stochastic, asynchronous, incremental DP in price-quality model.

The same stochastic, asynchronous, incremental DP algorithm was tested in the information-filtering model, and when the number of consumers in the simulation was large (10,000 to 250,000) it gave similarly good approximate convergence within small fluctuations to stationary optimal price curves. However, with only 1000 consumers in the model, the fluctuations were larger and it was not entirely clear whether the algorithm was converging to a stationary solution. We conjecture that this behavior may be due to differing degrees of smoothness in the utility landscapes. The analytically-defined landscapes in the price-quality model are very smooth, and are also smooth in the filtering model as the number of consumers becomes infinitely large. However, the landscapes become much more ragged with a small finite-size consumer population, and such raggedness may limit the ability of the DP-like approach to converge to a stationary solution.

An example of results using the DP-like approach in the information-filtering model is shown in figure 6. This figure plots the two-step optimal price curves for seller 1 and seller 2 for the same numerically generated utilities as were used in figures 3 and 4. We do find good approximate convergence to stationary curves in this model. The small random fluctuations are somewhat larger than those seen in figure 5; most likely this is due to a coarser resolution in the optimal price tables (.005 vs. .002). In this particular model, we can see that the DP-generated optimal price curves still lead to a price war that actually has a slightly larger amplitude than in the myopic case shown previously in figure 3. Hence there are at least some cases in which the use of lookahead in the sellers' pricing strategy can actually exacerbate the price-war dynamics. Determining under what conditions price wars are amplified by lookahead, and under what conditions they are diminished or eliminated, is an important topic for further research.

   figure152
Figure 6: Two-step optimal prices for seller 1 vs. seller 2 calculated by stochastic, asynchronous, incremental DP in the same information-filtering model as shown in figures 3 and 4.




BACK TO INDEX PAGE | PREVIOUS SECTION | NEXT SECTION