Hao Zou

Reliable generation, masked diffusion language models, and long-horizon LLM agents.

I am a Columbia CS researcher working with Prof. Kathleen McKeown and Prof. Zhou Yu, and closely collaborating with Zachary Horvitz and Xiao Yu. My research asks how language models can revise, reason, and act reliably when evidence changes or tasks unfold over many steps.

My recent work connects two threads: (1) masked diffusion language models for controllable editing, faithful summarization, and inference-time reasoning; and (2) LLM agents for realistic computer-use workflows, environment scaling, and harness-based training. Starting Summer 2026, I am continuing in the Columbia CS Department as a Research Staff Associate on an IARPA-funded project on faithful summarization.

Previously, I worked with Prof. Jordan Boyd-Graber at the UMD CLIP Lab, and with Prof. Dongyeop Kang at the University of Minnesota NLP group. I received my B.S. in Computer Science from the University of Minnesota, Twin Cities, and my M.S. in Computer Science from Columbia University in May 2026.

Research

My long-term goal is to build AI systems that remain faithful and useful under uncertainty: systems that can detect unsupported claims, localize what should be changed, allocate computation to uncertain parts of a reasoning trace, and act safely in long-horizon environments. I approach this by connecting diffusion-based generation and editing with agent evaluation and training.

In diffusion language modeling, I study how non-autoregressive models can use their ability to infill and revise arbitrary spans to support faithful summarization, reasoning templates, and efficient inference. In agent research, I study how realistic environments, verifiers, and harnesses can make training and evaluation match the way agents are actually deployed.

Selected Papers & Projects

Detect, Remask, Repair: Diffusion Editing for Faithful Summarization of Evolving Contexts
Hao Zou, Zachary Horvitz, Chandhru Karthick, Zhou Yu, Kathleen McKeown
Under review, EMNLP 2026
arXiv / bibtex

A diffusion-based framework for localized faithfulness repair: detect unsupported spans in an existing summary, remask them, and repair only the unreliable content instead of fully regenerating the summary.

OSWorld 2.0: Benchmarking Computer Use Agents on Long-Horizon Real-World Tasks
Large collaborative project
Under review, NeurIPS 2026
project page / arXiv coming soon

A benchmark of 102 realistic computer-use tasks and 30 self-hosted mock websites for evaluating long-horizon GUI agents, including dynamic workflows, tutorial following, simulated-user interaction, multimodal editing, and safety-oriented checks.

Scalable Environments and Training Infrastructure for Harness-Based Agents
Ongoing collaboration with Xiao Yu
In preparation, 2026

Building agent-curated task environments and training infrastructure that connects real inference harnesses to SFT/RL pipelines, reducing the mismatch between how agents are trained and how they are deployed.

No Compute Left Behind: Rethinking Reasoning and Sampling with Masked Diffusion Models
Zachary Horvitz, Raghav Singhal, Hao Zou, Carles Domingo-Enrich, Zhou Yu, Rajesh Ranganath, Kathleen McKeown
Under review, COLM 2026
arXiv / bibtex

Studies how masked diffusion language models use inference-time compute, including reasoning-as-infilling, answer-conditioned posterior sampling, uncertainty-aware early exit, and reasoning validation.

You Make me Feel like a Natural Question: Training QA Systems on Transformed Trivia Questions
Tasnim Kabir, Yoo Yeon Sung, Saptarashmi Bandyopadhyay, Hao Zou, Abhranil Chandra, Jordan Boyd-Graber
EMNLP 2024
paper / bibtex

Transforms trivia-style questions into more natural information-seeking questions to improve retrieval alignment and cross-domain question answering.

A Survey of Diffusion Models in Natural Language Processing
Hao Zou, Zae Myung Kim, Dongyeop Kang
arXiv, 2023
arXiv / bibtex

Surveys diffusion formulations for NLP, including non-autoregressive generation, text editing, token-level control, robustness, and scaling directions for diffusion language models.

Debiasing Language Models for In-Context Learning Using a Causal Inference-Inspired Method
Hao Zou, Karin de Langis, Dongyeop Kang, Yohan Jo
Manuscript, 2023
bibtex

Uses intervention-inspired estimates of the causal effect of input text to reduce spurious prompt-label bias in in-context learning.

Experience

2025 - Present NLP Lab, Columbia University
Graduate researcher / Staff Associate I. Working on faithful summarization, masked diffusion language models, inference-time algorithms, and LLM agents with Kathleen McKeown, Zhou Yu, Zachary Horvitz, and Xiao Yu.
2024 - 2025 IBM
AI Algorithms Intern. Worked on LLM inference efficiency, KV-cache quantization, and attention benchmarking.
2024 - 2025 Duke University
Graduate research with Enmao Diao on diffusion/flow-matching training objectives and benchmarking.
2022 - 2023 Sony Research
Research intern on controllable generation and reinforcement learning from human preferences.
2021 - 2023 University of Minnesota NLP Group
Undergraduate research with Dongyeop Kang on diffusion models for NLP, causal analysis, and controllable generation.
2021 - 2022 UMD CLIP Lab
Undergraduate research with Jordan Boyd-Graber on question rewriting and retrieval for open-domain QA.