Format Preference Is Noise

In the archived delimiter-only format test, XML, Markdown, and plain text produced nearly identical aggregate pass rates: 0.80, 0.80, and 0.83.

Key Numbers

Format	Aggregate pass rate
XML	0.80
Markdown	0.80
Plain text	0.83

Data

Derived summary: data/public/findings.json

The published summary is derived from a private, locally archived run. The raw experiment dump is not checked in.

Public Runner

cd harness
uv run python validate.py e7 --model-name qwen2.5-coder:1.5b --k 3

The public e7 command runs a comparable delimiter-only format sweep that tests the same kind of finding. Use it to evaluate your own model or archive; it is not a byte-for-byte replay of the archived local run behind the published chart.
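To run the same sweep against several models, the command above can be wrapped in a small script. The following is a minimal sketch, not part of the harness: `build_e7_command` and `run_sweep` are hypothetical helper names, and only the `e7` subcommand and the `--model-name` and `--k` flags come from the command shown above. It defaults to a dry run that prints the commands without executing them.

```python
import shlex
import subprocess


def build_e7_command(model_name: str, k: int = 3) -> list[str]:
    """Build the argv for the public e7 runner (flags copied from the page)."""
    return [
        "uv", "run", "python", "validate.py", "e7",
        "--model-name", model_name,
        "--k", str(k),
    ]


def run_sweep(model_names, k=3, dry_run=True, harness_dir="harness"):
    """Print (and optionally execute) one e7 command per model.

    harness_dir mirrors the `cd harness` step; adjust it if your checkout differs.
    """
    commands = [build_e7_command(name, k) for name in model_names]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the sweep on the first failing run.
            subprocess.run(cmd, cwd=harness_dir, check=True)
    return commands


# Dry run with the model from the documented command; add your own model names here.
commands = run_sweep(["qwen2.5-coder:1.5b"])
```

Pass `dry_run=False` only after confirming that the printed commands match what you would type by hand.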

Sample Counts

1 archived E7 run
4 local models
3 delimiter formats
96 scored calls
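These counts give a rough sense of how much sampling noise to expect. The sketch below is a back-of-the-envelope check rather than the analysis behind the published summary. It assumes the 96 scored calls are split evenly across the three formats and that each call is an independent pass/fail trial; neither assumption is stated on this page, and the published rates may be aggregated differently (for example, per model).

```python
import math

# Figures taken from this page.
TOTAL_CALLS = 96
PASS_RATES = {"XML": 0.80, "Markdown": 0.80, "Plain text": 0.83}

# Assumption (not stated on the page): calls are split evenly across formats.
calls_per_format = TOTAL_CALLS // len(PASS_RATES)


def binomial_standard_error(rate: float, n: int) -> float:
    """Standard error of a pass rate, treating each call as an independent trial."""
    return math.sqrt(rate * (1.0 - rate) / n)


# Largest gap between any two formats.
spread = max(PASS_RATES.values()) - min(PASS_RATES.values())

# Largest per-format standard error under the assumptions above.
largest_se = max(
    binomial_standard_error(rate, calls_per_format) for rate in PASS_RATES.values()
)

print(f"calls per format (assumed): {calls_per_format}")
print(f"spread between formats: {spread:.3f}")
print(f"largest per-format standard error: {largest_se:.3f}")
```

Under these assumptions, the 0.03 spread between formats is less than half of the roughly 0.07 standard error on a single format's pass rate, which is consistent with reading the difference as noise.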

Uncertainty Notes

This null result applies only to delimiter-only coding prompts in a single archived run. It does not imply that XML, Markdown, and plain text remain equivalent once prompts include multi-block context, examples, or tool metadata.

Limitations

This finding tests delimiter-only formatting on coding tasks. It does not test multi-block prompts where XML or another container format separates examples, context, and instructions.
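To make the scope concrete, here is a purely illustrative sketch of what "delimiter-only" variation could look like: the same single task wrapped three ways, with nothing else in the prompt. The `render_prompt` helper, the tag and heading names, and the sample task are all hypothetical and are not the templates used by the harness.

```python
def render_prompt(task: str, fmt: str) -> str:
    """Wrap one task in a single delimiter style (hypothetical templates)."""
    if fmt == "xml":
        return f"<task>\n{task}\n</task>"
    if fmt == "markdown":
        return f"## Task\n\n{task}"
    if fmt == "plain":
        return f"Task: {task}"
    raise ValueError(f"unknown format: {fmt!r}")


# Hypothetical coding task; only the wrapper changes between variants.
TASK = "Write a function that reverses a string."

variants = {fmt: render_prompt(TASK, fmt) for fmt in ("xml", "markdown", "plain")}

for fmt, prompt in variants.items():
    print(f"--- {fmt} ---\n{prompt}\n")
```

A multi-block prompt, by contrast, would carry several distinct sections (examples, context, instructions) that the container format has to keep apart, which is the case this finding does not cover.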