Format Preference Is Noise

In the archived delimiter-only format test, XML, Markdown, and plain text produced indistinguishable aggregate pass rates: all three fall between 0.80 and 0.83.

Key Numbers

Format        Aggregate pass rate
XML           0.80
Markdown      0.80
Plain text    0.83

Data

  • Derived summary: data/public/findings.json

The published summary is derived from a local/private archived run, not from a checked-in raw experiment dump.

Public Runner

cd harness
uv run python validate.py e7 --model-name qwen2.5-coder:1.5b --k 3

The public e7 command runs a comparable delimiter-only format sweep for the same finding class. Use it to test your own model or archive, not as a byte-for-byte replay of the archived local run behind the published chart.

Sample Counts

  • 1 archived E7 run
  • 4 local models
  • 3 delimiter formats
  • 96 scored calls
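
As a rough check on why a 0.03 gap reads as noise at this sample size, the sketch below computes a two-proportion z statistic for the largest gap in the table (plain text at 0.83 versus XML or Markdown at 0.80). It assumes the 96 scored calls split evenly into 32 per format and that each aggregate pass rate can be treated as a simple proportion of independent calls. Neither assumption is confirmed by the published summary, and the function name is illustrative rather than part of the harness.

```python
import math


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> tuple[float, float]:
    """Return (standard error of p1 - p2, z statistic).

    Uses the unpooled normal approximation for two independent proportions.
    """
    se = math.sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    return se, (p1 - p2) / se


# ASSUMPTION: 96 scored calls divided evenly across 3 delimiter formats.
CALLS_PER_FORMAT = 96 // 3

# Largest gap in the table: plain text (0.83) versus XML or Markdown (0.80).
se, z = two_proportion_z(0.83, CALLS_PER_FORMAT, 0.80, CALLS_PER_FORMAT)

print(f"standard error of the gap: {se:.3f}")
print(f"z statistic: {z:.2f}")
```

Under these assumptions the gap is roughly a third of one standard error, far below the conventional 1.96 threshold. Calls that share tasks or models are not truly independent, so treat this as an order-of-magnitude check, not a formal test.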

Uncertainty Notes

This zero-signal result applies to delimiter-only coding prompts in one archived run. It does not claim that XML, Markdown, and plain text remain equivalent once prompts include multi-block context, examples, or tool metadata.

Limitations

This finding tests delimiter-only formatting on coding tasks. It does not test multi-block prompts where XML or another container format separates examples, context, and instructions.
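
To make "delimiter-only" concrete, the sketch below renders one coding task in three delimiter styles. The content is identical in each variant and only the markers around it change. The function name, section labels, and exact delimiters are hypothetical illustrations and are not taken from the harness or the archived run.

```python
def render_prompt(task: str, code: str, fmt: str) -> str:
    """Wrap the same task and code in one of three delimiter styles.

    HYPOTHETICAL: the labels and delimiters here are illustrative only.
    """
    if fmt == "xml":
        return f"<task>\n{task}\n</task>\n<code>\n{code}\n</code>"
    if fmt == "markdown":
        return f"## Task\n{task}\n\n## Code\n{code}"
    if fmt == "plain":
        return f"Task: {task}\n\nCode:\n{code}"
    raise ValueError(f"unknown format: {fmt}")


task = "Fix the off-by-one error in this loop."
code = "for i in range(1, len(items)):\n    print(items[i])"

# Same task and code in every variant; only the delimiters differ.
prompts = {fmt: render_prompt(task, code, fmt) for fmt in ("xml", "markdown", "plain")}

for fmt, prompt in prompts.items():
    print(f"--- {fmt} ---")
    print(prompt)
```

A multi-block prompt, by contrast, would carry several distinct sections such as examples, context, and instructions. That is the case this finding does not cover.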