PolyGrad

The paper “World Models via Policy-Guided Trajectory Diffusion” introduces Policy-Guided Trajectory Diffusion (PolyGrad), a novel world modelling approach that is not autoregressive: it generates entire on-policy trajectories in a single pass through a diffusion model.

Drawback of Autoregressive World Models

Autoregressive world models interleave predicting the next state with sampling the next action from the policy. Each prediction is conditioned on earlier predictions, so the prediction error inevitably compounds as the trajectory length grows.
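A minimal sketch of this compounding effect, under toy assumptions that are not from the paper: a 1-D state, a hand-written linear `policy`, and a hypothetical learned model `model_step` whose state coefficient is off by 2%. The rollout interleaves model prediction and action sampling, and the gap to the real trajectory widens at every step of the horizon shown.

```python
# Toy illustration only: all dynamics, names, and constants here are assumptions.

def policy(state):
    # Hypothetical deterministic policy: pushes the state towards a set point.
    return 0.5 - 0.1 * state

def true_step(state, action):
    # Ground-truth environment dynamics (assumed for this example).
    return state + action

def model_step(state, action):
    # Imperfect learned world model: the state coefficient is 2% too large.
    return 1.02 * state + action

def rollout_errors(horizon, start=0.0):
    """Roll out the real environment and the model side by side.

    The imagined rollout is autoregressive: the policy acts on the model's
    own predicted state, and that prediction feeds the next prediction.
    """
    real, imagined = start, start
    errors = [0.0]
    for _ in range(horizon):
        real = true_step(real, policy(real))
        imagined = model_step(imagined, policy(imagined))
        errors.append(abs(imagined - real))
    return errors

errors = rollout_errors(20)
for t in (1, 5, 10, 20):
    print(f"step {t:2d}: |imagined - real| = {errors[t]:.4f}")
```

In this toy setting the one-step model error is small, yet the rollout error keeps increasing with the horizon because each imagined state is built on the previous imagined state.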

Examples of On-policy and Off-Policy RL algorithms?

SARSA is on-policy and Q-learning is off-policy. See also: On-Policy vs Off-Policy RL
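The difference shows up in the bootstrap target of the standard tabular update rules. The sketch below is illustrative only; the table layout (`q[state][action]` as nested lists) and the function names are assumptions made for this example. SARSA bootstraps from the action the current policy actually takes next, while Q-learning bootstraps from the greedy action regardless of what the behaviour policy does.

```python
# Tabular TD targets; q[state][action] is a nested list (an assumed layout).

def sarsa_target(q, reward, next_state, next_action, gamma):
    # On-policy: uses the next action actually chosen by the current policy.
    return reward + gamma * q[next_state][next_action]

def q_learning_target(q, reward, next_state, gamma):
    # Off-policy: uses the greedy (max-value) action in the next state.
    return reward + gamma * max(q[next_state])

def td_update(q, state, action, target, alpha):
    # Shared update step: move the estimate a fraction alpha towards the target.
    q[state][action] += alpha * (target - q[state][action])

q = [[0.0, 0.0], [1.0, 3.0]]
# Transition: state 0, action 0, reward 1.0, next state 1.
# The behaviour policy happens to pick the non-greedy action 0 next.
on_policy = sarsa_target(q, reward=1.0, next_state=1, next_action=0, gamma=0.5)
off_policy = q_learning_target(q, reward=1.0, next_state=1, gamma=0.5)
print(on_policy, off_policy)  # 1.5 2.5

td_update(q, state=0, action=0, target=off_policy, alpha=0.5)
print(q[0][0])  # 1.25
```

The two targets coincide only when the next action chosen by the policy is also the greedy one.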


PolyGrad

Model

Method

Techniques

Generalisability:

Limitations:

Extended Research Direction:
