PolyGrad
The paper “World Models via Policy-Guided Trajectory Diffusion” introduces a novel world-modelling approach, Policy-Guided Trajectory Diffusion (PolyGrad). PolyGrad is not autoregressive: it generates entire on-policy trajectories in a single pass through a diffusion model.
Drawback of Autoregressive World Models
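Autoregressive world models interleave two steps: predicting the next state and sampling the next action from the policy. Each predicted state is then fed back as input for the next action and the next prediction, so prediction error inevitably compounds as the trajectory length grows.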
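The compounding can be illustrated with a minimal Python sketch on a hypothetical 1-D toy system (not from the paper). The true dynamics are s' = s + a. A hypothetical learned model adds a constant one-step error of 0.01. A hypothetical linear policy a = 0.1 * s acts on whatever state the model has predicted. The names `rollout`, `true_dynamics`, `learned_model` and `policy` are illustrative only.

```python
def rollout(step, policy, s0, horizon):
    """Autoregressive rollout: alternate sampling an action and predicting the next state."""
    states = [s0]
    s = s0
    for _ in range(horizon):
        a = policy(s)   # action chosen from the current (possibly predicted) state
        s = step(s, a)  # one-step prediction, fed back as input on the next iteration
        states.append(s)
    return states

def true_dynamics(s, a):
    return s + a

def learned_model(s, a):
    # Hypothetical learned model: exact except for a constant one-step error of 0.01
    return s + a + 0.01

def policy(s):
    # Hypothetical deterministic linear policy
    return 0.1 * s

horizon = 10
true_traj = rollout(true_dynamics, policy, 1.0, horizon)
model_traj = rollout(learned_model, policy, 1.0, horizon)
errors = [abs(m - t) for m, t in zip(model_traj, true_traj)]

for t, e in enumerate(errors):
    print(f"step {t:2d}: |error| = {e:.4f}")
```

The model is never more than 0.01 wrong on any single step. However, the policy acts on the model's own, already erroneous, predictions, so each step inherits and amplifies the earlier error. In this toy, the gap grows to roughly 0.16 after 10 steps, about 16 times the one-step error.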
Examples of on-policy and off-policy RL algorithms?
SARSA (on-policy) and Q-learning (off-policy). See also: On-Policy vs Off-Policy RL
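The two algorithms differ only in their bootstrap target. SARSA updates towards the value of the next action a' that the behaviour policy actually takes. Q-learning updates towards the greedy value max_a Q(s', a), regardless of which action is taken next. The following is a minimal tabular sketch in Python. It assumes a dict-of-dicts Q-table, and the function names are hypothetical.

```python
import copy

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    # On-policy target: bootstrap from the action the policy actually chose in s_next
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])

def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    # Off-policy target: bootstrap from the greedy action in s_next,
    # independent of which action the behaviour policy takes next
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

# Toy Q-table: in state 1, action "R" is clearly better than "L"
Q0 = {0: {"L": 0.0, "R": 0.0}, 1: {"L": 1.0, "R": 5.0}}

# One transition (s=0, a="R", r=0, s'=1); an exploratory policy then picks a'="L"
q_sarsa = copy.deepcopy(Q0)
sarsa_update(q_sarsa, 0, "R", 0.0, 1, "L", alpha=0.5, gamma=0.9)

q_ql = copy.deepcopy(Q0)
q_learning_update(q_ql, 0, "R", 0.0, 1, alpha=0.5, gamma=0.9)

print(q_sarsa[0]["R"])  # 0.45: target uses Q(1, "L") = 1.0
print(q_ql[0]["R"])     # 2.25: target uses max_a Q(1, a) = 5.0
```

On the same transition, SARSA's estimate reflects the exploratory action the policy actually took. Q-learning's estimate reflects the greedy policy it is learning about.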
Model
- TBA
Method
- TBA
Techniques
- TBA
Generalisability
- TBA
Limitations
- TBA
Extended Research Direction
- TBA