Loss of control in flight (LOC-I) kills more people in commercial aviation than any other single cause. The pattern is depressingly consistent: an aircraft enters a stall or unusual attitude, the crew has seconds to do exactly the right thing, and the right thing is counterintuitive. When the wing exceeds its critical angle of attack and stalls, you have to push the yoke forward and point the nose at the ground to reduce the angle of attack and reattach the airflow. Air France 447 and Colgan Air 3407 are the cases everyone cites for what happens when this goes wrong.
I spent a week building a reinforcement learning agent that recovers a Boeing 737-300 from stalls and upsets. The agent uses DreamerV3, a world-model algorithm: instead of optimizing a policy by trial and error against the reward signal directly, it learns a compressed internal model of how the aircraft responds to controls and then trains its policy on imagined rollouts inside that model. The training environment is a high-fidelity flight dynamics simulator flying the 737-300; the agent only sees state vectors and reward, not visuals.
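To make that interface concrete, here’s a minimal sketch of the environment shape as I’d approximate it, assuming the Gymnasium API. The state layout, the reward shaping, and the `FlightDynamics` backend are all illustrative assumptions, not the project’s actual code:

```python
import gymnasium as gym
import numpy as np

class StallRecoveryEnv(gym.Env):
    """Illustrative sketch of the agent's interface: state-vector
    observations, the four primary flight controls, and a reward shaped
    toward recovery. `FlightDynamics` stands in for whatever backend
    actually propagates the 737-300 physics."""

    def __init__(self, dynamics, dt=0.05):
        self.dyn = dynamics  # hypothetical flight-dynamics backend
        self.dt = dt
        # e.g. [airspeed, alpha, beta, pitch, roll, pitch_rate, roll_rate,
        #       yaw_rate, altitude]; the exact layout is an assumption
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(9,))
        # elevator, aileron, rudder in [-1, 1]; throttle in [0, 1]
        self.action_space = gym.spaces.Box(
            low=np.array([-1.0, -1.0, -1.0, 0.0], dtype=np.float32),
            high=np.array([1.0, 1.0, 1.0, 1.0], dtype=np.float32))

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        state = self.dyn.reset_to_upset(self.np_random)  # randomized stall/upset
        return state.astype(np.float32), {}

    def step(self, action):
        state = self.dyn.step(action, self.dt)
        crashed = state[8] <= 0.0                  # hit the altitude floor
        recovered = self.dyn.in_normal_envelope()  # wings level, alpha below stall
        # Dense shaping toward low angle of attack, big terminal bonus/penalty.
        reward = -abs(state[1]) + 100.0 * recovered - 100.0 * crashed
        terminated = bool(crashed or recovered)
        info = {"recovered": recovered, "crashed": crashed}
        return state.astype(np.float32), float(reward), terminated, False, info
```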
The thing that genuinely surprised me was watching it figure out the forward-stick recovery on its own. Pushing the nose down at high pitch with low airspeed feels wrong. The ground is the thing you’re trying to avoid. But it’s the only input that brings the wing back to flying, which is why pilots get drilled on it in simulators: the reflex has to override panic. The agent never had a teacher. It found the same recovery anyway, just from the dynamics. That was the moment the research stopped feeling academic.
Existing RL work on stall recovery has used model-free algorithms (PPO and SAC). As far as I could find in the literature, this is the first time anyone has tried a world model on the problem. The strongest checkpoints land at 97–100% success across the stall and upset scenarios with zero crashes. That kind of consistency is uncommon in RL; most agents you train will eventually do something embarrassing if you run enough rollouts.
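Numbers like that come from repeated evaluation rollouts. A sketch of the harness shape, assuming the environment above and a trained `policy` callable (both hypothetical names):

```python
def evaluate(env, policy, episodes=500):
    """Roll out a trained policy and tally recoveries vs. crashes."""
    successes = crashes = 0
    for _ in range(episodes):
        obs, info = env.reset()
        terminated = truncated = False
        while not (terminated or truncated):
            action = policy(obs)  # deterministic action from the trained agent
            obs, _, terminated, truncated, info = env.step(action)
        successes += int(info.get("recovered", False))
        crashes += int(info.get("crashed", False))
    return successes / episodes, crashes
```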
The interesting question is what this generalizes to. The framework isn’t tied to this particular simulator. You could swap in proprietary Boeing physics, or apply it to other tight-window aviation problems where the crew is suddenly on their own: autopilot disconnects, envelope exceedances, sudden control surface failures. Anywhere the recovery is mechanical and the time budget is small.
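That swap stays cheap if the dynamics backend sits behind a narrow interface. One way to express it, with method names that are illustrative rather than from the project:

```python
from typing import Protocol
import numpy as np

class FlightDynamics(Protocol):
    """Narrow interface the agent-facing environment depends on.
    Any backend that implements it can slot in: an open-source flight
    model, proprietary manufacturer physics, or hardware-in-the-loop."""

    def reset_to_upset(self, rng: np.random.Generator) -> np.ndarray:
        """Initialize to a randomized stall or upset condition."""
        ...

    def step(self, action: np.ndarray, dt: float) -> np.ndarray:
        """Advance the aircraft state by dt seconds under the given controls."""
        ...

    def in_normal_envelope(self) -> bool:
        """True once the aircraft is back inside the normal flight envelope."""
        ...
```

The agent never touches the backend directly, so swapping physics means implementing three methods; nothing else in the training stack has to change.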
The limit is the one any pilot also faces. The agent acts on sensor inputs, and if those sensors lie, it gets fooled the same way a person would. Frozen pitot tubes triggered Air France 447; they’d trigger this too. That failure mode is at least easy to probe in simulation, by corrupting the observation stream the way a failed sensor would (sketch below).
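A minimal fault-injection wrapper, assuming the Gymnasium API; `AIRSPEED_IDX` and the fault model are hypothetical, since the real state layout depends on the simulator wrapper:

```python
import gymnasium as gym
import numpy as np

class FrozenPitotWrapper(gym.ObservationWrapper):
    """Simulate a frozen pitot tube: after the fault fires, the airspeed
    element of the observation stops updating and repeats its last value."""

    AIRSPEED_IDX = 0  # assumed position of indicated airspeed in the state vector

    def __init__(self, env, fault_step=100):
        super().__init__(env)
        self.fault_step = fault_step  # timestep at which the sensor freezes
        self._t = 0
        self._frozen = None

    def reset(self, **kwargs):
        self._t = 0
        self._frozen = None
        return super().reset(**kwargs)

    def observation(self, obs):
        obs = np.asarray(obs, dtype=np.float32).copy()
        if self._t >= self.fault_step:
            if self._frozen is None:
                self._frozen = obs[self.AIRSPEED_IDX]
            obs[self.AIRSPEED_IDX] = self._frozen  # the sensor lies from here on
        self._t += 1
        return obs
```

Aviation-grade reliability is a long way off, but as evidence that a world-model agent can learn real aircraft dynamics from rollouts alone, this is a good result.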