PolyGrad

The paper “World Models via Policy-Guided Trajectory Diffusion” introduces a novel world modelling approach, Policy-Guided Trajectory Diffusion (PolyGrad). PolyGrad is not autoregressive: it generates entire on-policy trajectories in a single pass through a diffusion model.

Drawback of Autoregressive World Models

Autoregressive world models interleave predicting the next state with sampling the next action from the policy. Each prediction is therefore conditioned on earlier predictions, so the prediction error inevitably compounds as the trajectory length grows.
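The compounding effect can be illustrated with a toy example. The sketch below is not from the paper: the 1-D dynamics, the slightly wrong "learned" model, and the deterministic policy are all hypothetical stand-ins, chosen only to show how a small one-step error grows over an autoregressive rollout.

```python
import math

def true_step(s, a):
    # Hypothetical ground-truth dynamics of a toy 1-D system.
    return 0.9 * s + a

def model_step(s, a):
    # Hypothetical learned model with a small error in the coefficient.
    return 0.95 * s + a

def policy(s):
    # Hypothetical deterministic policy, standing in for sampling a ~ pi(.|s).
    return 0.1 * math.cos(s)

def rollout(step_fn, s0, horizon):
    """Autoregressive rollout: alternate policy action and next-state prediction."""
    states = [s0]
    s = s0
    for _ in range(horizon):
        a = policy(s)      # the action depends on the (possibly wrong) current state
        s = step_fn(s, a)  # the next state is predicted from the previous prediction
        states.append(s)
    return states

real = rollout(true_step, s0=1.0, horizon=10)
imagined = rollout(model_step, s0=1.0, horizon=10)

# Absolute state error at each step of the imagined trajectory.
errors = [abs(i - r) for i, r in zip(imagined, real)]
for t, e in enumerate(errors):
    print(f"step {t:2d}: |error| = {e:.4f}")
```

In this toy setup the error is zero at the shared start state and is larger at the end of the rollout than after the first step, because each model prediction and each policy action is computed from an already drifted state.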

Examples of On-Policy and Off-Policy RL Algorithms?

SARSA (on-policy) and Q-learning (off-policy), respectively. See also: On-Policy vs Off-Policy RL
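The difference between the two shows up in the bootstrap target of the tabular update. The following is a minimal sketch using the standard textbook targets; the function names and the example Q-values are made up for illustration.

```python
def sarsa_target(reward, q_next, next_action, gamma):
    # On-policy: bootstrap from the action the behaviour policy actually takes next.
    return reward + gamma * q_next[next_action]

def q_learning_target(reward, q_next, gamma):
    # Off-policy: bootstrap from the greedy action, whatever action is taken next.
    return reward + gamma * max(q_next)

# Hypothetical Q(s', a) values for two actions in the next state.
q_next = [1.0, 3.0]

# Suppose the behaviour policy explores and picks action 0 in s'.
sarsa = sarsa_target(reward=0.5, q_next=q_next, next_action=0, gamma=0.9)
qlearn = q_learning_target(reward=0.5, q_next=q_next, gamma=0.9)
print(sarsa, qlearn)
```

When the behaviour policy picks a non-greedy action, the SARSA target follows that choice while the Q-learning target ignores it, which is why SARSA evaluates the policy it is following and Q-learning evaluates the greedy policy.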

Model

  • TBA

Method

  • TBA

Techniques

  • TBA

Generalisability

  • TBA

Limitations

  • TBA

Extended Research Direction

  • TBA