# Expectation Files

A number of test suites in Chromium use expectation files to handle test
failures in order to have more granular control compared to the usual approach
of entirely disabling failing tests. This documentation covers the general
usage of expectation files, while suite-specific details are handled in other
files.

[TOC]

Currently, the test suites that use expectation files can be broadly categorized
as Blink tests and GPU tests. Blink-specific documentation can be found
[here][blink_expectation_doc], while GPU-specific documentation can be found
[here][gpu_expectation_doc].

[blink_expectation_doc]: https://source.chromium.org/chromium/chromium/src/+/main:docs/testing/web_test_expectations.md
[gpu_expectation_doc]: https://source.chromium.org/chromium/chromium/src/+/main:docs/gpu/gpu_expectation_files.md

## Design

The full design for the format can be found [here][chromium_test_list_format] if
the overview in this documentation is not sufficient.

[chromium_test_list_format]: http://bit.ly/chromium-test-list-format

## Code

The parser implementation used by Chromium can be found [here][typ_parser]. This
handles the parsing of the text files into Python objects usable by Chromium's
test harnesses.

[typ_parser]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/catapult/third_party/typ/typ/expectations_parser.py

## Syntax

An expectation file can be broadly broken up into two sections: the header and
the test expectations.

### Header

The header consists of specially formatted comments that define which tags and
expected results are usable in expectations later in the file. All header
content must come before any expectation content; otherwise, the parser will
raise errors. An example header is:

```
# tags: [ linux ubuntu jammy
#         mac mac10 mac11 mac12 mac13
#         win win7 win10 ]
# tags: [ release debug ]
# results: [ Failure Skip Slow ]
```

Specifically, the header consists of one or more tag sets and exactly one
expected result set.

#### Tag Sets

Each tag set begins with a `# tags:` comment followed by a space-separated list
of tags between `[ ]`. Order does not matter to the parser, and tags are
case-insensitive. Tag sets can span multiple lines as long as each line starts
with `#` and all tags are within the brackets.

Each tag set contains all the tags that can be used in expectations for a
particular aspect of a test configuration. In the example header, the first tag
set contains values for operating systems, while the second tag set contains
values for browser build type. Grouping tags into different sets instead of
having a single monolithic set with all possible tag values is necessary for
conflict detection between expectations to work properly (explained later in
[the conflict section](#Conflicts)).

One important note about tag sets is that unless a test harness implements
custom conflict detection logic, all tags within a set should be mutually
exclusive, i.e. only one tag from each tag set should be produced when running a
test. Failure to do so can result in conflict detection false negatives, the
specifics of which are explained in [the conflict section](#Conflicts).

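For illustration, here is a minimal sketch of tag sets split by configuration
dimension (the GPU vendor tags are hypothetical and are not part of the example
header above). A given test run should produce exactly one tag from each set:

```
# tags: [ win mac linux ]
# tags: [ release debug ]
# tags: [ intel amd nvidia ]
```
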
#### Expected Result Set

The expected result set begins with a `# results:` comment followed by a
space-separated list of expected results between `[ ]`. Order does not matter to
the parser, but expected results are case-sensitive. Additionally, only values
[known to the parser][typ_known_results] can be used. The expected results can
span multiple lines as long as each line starts with `#` and all values are
within the brackets.

The expected result set contains all the expected results that can be used in
expectations. The specifics of how each expected result affects test behavior
can differ slightly between test suites, but they generally behave as follows
(see the example after this list):

* Pass - The default expected result for all tests. Let the test run, and expect
  it to run without issue.
* Failure - Let the test run, but treat failures as a pass.
* Crash - Let the test run, but treat test failures due to crashes as a pass.
* Timeout - Let the test run, but treat test failures due to timeouts as a pass.
* Skip - Do not run the test.
* RetryOnFailure - Re-enable automatic retries of a test if a suite has them
  disabled by default.
* Slow - Indicate that the test is expected to take longer than normal, usually
  as a signal to increase timeouts.

[typ_known_results]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/catapult/third_party/typ/typ/expectations_parser.py;l=40
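
As an illustration, a hypothetical file fragment using several of these
expected results might look like the following (the test names and bug numbers
are made up, and the `# tags:` declarations are omitted for brevity):

```
# results: [ Failure Skip RetryOnFailure Slow ]

crbug.com/1111 [ win ] flaky_canvas_test.html [ RetryOnFailure ]
crbug.com/2222 [ mac debug ] heavy_test.html [ Slow ]
crbug.com/3333 [ linux ] broken_test.html [ Skip ]
```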

### Expectations

After the header, the rest of the file consists of test expectations which
specify what non-standard test behavior is expected on specific test machine
configurations. An expectation is a single line in the following format:

```
bug_identifier [ tags ] test_name [ expected_results ]
```

As an example, the following would be an expectation specifying that the
`foo.html` test is expected to fail on Windows machines with Debug browsers:

```
crbug.com/1234 [ win debug ] foo.html [ Failure ]
```

The bug identifier and tags are both optional and can be omitted. Not specifying
any tags means that the expectation applies to the test regardless of where it
is run. When omitting tags, the brackets are also omitted. Additionally,
multiple bug identifiers are allowed as long as they are space-separated. The
parser looks for certain prefixes, e.g. `crbug.com/`, to determine what is
considered a bug. This allows the parser to properly disambiguate one or more
bug identifiers from the test name in the event that an expectation does not
have any tags.
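
For example, an expectation with two bug identifiers and no tags (and therefore
applying on every configuration) could look like the following (the bug numbers
are made up):

```
crbug.com/1234 crbug.com/5678 foo.html [ Failure ]
```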

Multiple expected results are allowed and are space-separated like tags. As an
example, `[ Failure Crash ]` would specify that the test is expected to either
fail or crash.
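
For instance, a hypothetical expectation for a test that sometimes fails and
sometimes crashes on Windows might be:

```
crbug.com/1234 [ win ] foo.html [ Failure Crash ]
```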

Additionally, the test name is allowed to have up to one wildcard at the very
end to match any tests that begin with the specified name. As an example, the
following would be an expectation specifying that any test starting with `foo`
is expected to fail on Windows machines with Debug browsers:

```
crbug.com/1234 [ win debug ] foo* [ Failure ]
```

The restriction of only having one wildcard at the end can be lifted via the
`full_wildcard_support` annotation found under
[the annotations section](#Annotations).

#### Priority

When using wildcards, it is possible for multiple expectations to apply to a
test at runtime. For example, given the following:

```
[ win ] foo* [ Slow ]
[ win ] foo/bar* [ Failure ]
[ win ] foo/bar/specific_test.html [ Skip ]
```

`foo/bar/specific_test.html` running on a Windows machine would have three
applicable expectations. In these cases, the most specific (i.e. the
longest-named) expectation will be used.

The order in which expectations are defined is *not* considered when determining
priority.

## Conflicts

When more than one expectation exists for a test, it is possible for a conflict
to arise where a test run on a particular test machine could have more than one
expectation apply to it. Whether these conflicts are treated as errors and how
conflicts get resolved are both configurable via annotations found under
[the annotations section](#Annotations).

### Detection

Two expectations for the same test conflict with each other if they do not use
different tags from at least one shared tag set. As an example, look at the
following expectations:

```
# Group 1
[ win ] foo.html [ Failure ]
[ mac ] foo.html [ Skip ]

# Group 2
[ win ] bar.html [ Failure ]
[ debug ] bar.html [ Skip ]

# Group 3
[ linux ] foo.html [ Failure ]
[ linux debug ] foo.html [ Skip ]
```

Group 1 would not result in a conflict since both `win` and `mac` are from the
same tag set and are different values. Thus, the parser would be able to
determine that at most one expectation will apply when running a test.

Group 2 would result in a conflict since there are no tag sets that both
expectations use, and thus there could be a test configuration that causes both
expectations to apply. In this case, a configuration that produces both the
`win` and `debug` tags is possible. This conflict could be resolved by adding
a browser type tag to the first expectation or an operating system tag to the
second expectation.

Group 3 would result in a conflict because, while both expectations use a tag
from the same tag set (operating system), the tag is the same rather than
different. Thus, a test running on Linux with a Debug browser would have both
expectations apply. This conflict could be resolved by changing the first
expectation to use `[ linux release ]`.
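
For reference, Group 3 with that fix applied would read:

```
[ linux release ] foo.html [ Failure ]
[ linux debug ] foo.html [ Skip ]
```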

It is important to be aware of the following when it comes to conflicts:

1. The expectation file has no knowledge of which tag combinations are actually
   possible in the real world, only what is theoretically possible given the
   defined tag sets. A real world example of this would be the use of the Metal
   API, which is Mac-specific. While a human would be able to reason that
   `[ metal ]` implies `[ mac metal ]`, the latter is necessary for the
   conflict detection to work properly (see the sketch after this list).
2. If tag sets include non-mutually-exclusive values and the test suite has not
   implemented custom conflict checking logic, there can be false negatives when
   checking for conflicts. For example, if `win` and `win10` were both in the OS
   tag set, `[ win ] foo.html [ Failure ]` and `[ win10 ] foo.html [ Skip ]`
   would not be found to conflict even though they can in the real world due to
   `win10` being a more specific version of `win`.
3. Expectations that use wildcards can result in conflict detection false
   negatives. Conflict detection is only run on expectations with identical test
   names. Thus, while `[ win ] foo* [ Failure ]` and `[ debug ] foo* [ Skip ]`
   would be found to conflict since the test name is `foo*` in both cases,
   `[ win ] f* [ Failure ]` and `[ debug ] foo* [ Skip ]` would not be found to
   conflict.
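
To make the first point concrete, assuming a hypothetical graphics API tag set
that contains `metal`, the parser would flag the first pair below as a conflict
but not the second:

```
# Flagged as a conflict: the parser cannot know that Metal implies Mac, and the
# two expectations do not differ within any shared tag set.
[ metal ] foo.html [ Failure ]
[ win ] foo.html [ Skip ]

# Not a conflict: the expectations use different tags (mac vs. win) from the
# operating system tag set.
[ mac metal ] foo.html [ Failure ]
[ win ] foo.html [ Skip ]
```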

### Annotations

By default, conflicts result in a parsing error. However, expectation files
support several annotations that affect how conflicts are handled.

`# conflicts_allowed: true` stops conflicts from being treated as parsing
errors. Instead, conflicts will be handled gracefully depending on the conflict
resolution setting, the default of which is to take the union of expected
results.
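
As a sketch (assuming the annotation is placed alongside the other header
comments), a header that allows conflicts might look like:

```
# tags: [ win mac linux ]
# tags: [ release debug ]
# results: [ Failure Skip Slow ]
# conflicts_allowed: true
```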

`# conflict_resolution: ` specifies how conflicts will be handled when they are
allowed. Supported values are `union` (the default) and `override`. `union`
causes all conflicting expectations to be merged together. For example, the
following:

```
[ win ] foo.html [ Failure ]
[ debug ] foo.html [ Slow ]
```

would be equivalent to `[ win debug ] foo.html [ Failure Slow ]` when running on
a Windows machine with a Debug browser.

`override` uses whatever expectation was parsed last. Using the above example,
a Windows machine with a Debug browser would end up using the
`[ debug ] foo.html [ Slow ]` expectation.

Additionally, by default, only a single wildcard is allowed at the end of a test
name. This behavior is preferred for tests whose names are hierarchical in
nature, e.g. for filepaths. However, if this behavior is not suitable for a
test suite, full wildcard support can be enabled via the
`# full_wildcard_support: true` annotation. This allows an arbitrary number of
wildcards to be used anywhere in the test name. While this is more flexible and
can make sense for certain test suites, it does make it harder for humans to
determine which expectations apply to which tests.
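
For example (the test name pattern here is hypothetical), a file with the
annotation enabled could contain an expectation with wildcards in the middle of
the name:

```
# full_wildcard_support: true

crbug.com/1234 [ win ] foo/*/bar_*.html [ Failure ]
```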