# Expectation Files

A number of test suites in Chromium use expectation files to handle test
failures in order to have more granular control compared to the usual approach
of entirely disabling failing tests. This documentation covers the general
usage of expectation files, while suite-specific details are handled in other
files.

[TOC]

Currently, the test suites that use expectation files can be broadly categorized
as Blink tests and GPU tests. Blink-specific documentation can be found
[here][blink_expectation_doc], while GPU-specific documentation can be found
[here][gpu_expectation_doc].

[blink_expectation_doc]: https://source.chromium.org/chromium/chromium/src/+/main:docs/testing/web_test_expectations.md
[gpu_expectation_doc]: https://source.chromium.org/chromium/chromium/src/+/main:docs/gpu/gpu_expectation_files.md

## Design

The full design for the format can be found [here][chromium_test_list_format] if
the overview in this documentation is not sufficient.

[chromium_test_list_format]: http://bit.ly/chromium-test-list-format

## Code

The parser implementation used by Chromium can be found [here][typ_parser]. This
handles the parsing of the text files into Python objects usable by Chromium's
test harnesses.

[typ_parser]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/catapult/third_party/typ/typ/expectations_parser.py

## Syntax

An expectation file can be broadly broken up into two sections: the header and
test expectations.

### Header

The header consists of specially formatted comments that define which tags and
expected results may be used in expectations later in the file. All header
content must come before any expectation content; otherwise, the parser will
raise errors. An example header is:

```
# tags: [ linux ubuntu jammy
# mac mac10 mac11 mac12 mac13
# win win7 win10 ]
# tags: [ release debug ]
# results: [ Failure Skip Slow ]
```
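
To make the structure concrete, here is a minimal sketch (not the real typ
parser; names are illustrative) of how such header comments could be collected
into tag sets and a result set:

```python
import re

def parse_header(lines):
    """Collect '# tags:'/'# results:' comments into tag sets and a result set."""
    tag_sets = []       # one set of lowercased tags per '# tags:' declaration
    result_set = set()  # allowed expected results (case preserved)
    pending = None      # (kind, text accumulated so far) while inside [ ]
    for line in lines:
        if pending is None:
            match = re.match(r'#\s*(tags|results):\s*\[(.*)', line)
            if not match:
                continue
            pending = (match.group(1), match.group(2))
        else:
            # A multi-line set continues on lines that start with '#'.
            pending = (pending[0], pending[1] + ' ' + line.lstrip('#'))
        kind, text = pending
        if ']' in text:  # the closing bracket ends the set
            values = text.split(']')[0].split()
            if kind == 'tags':
                tag_sets.append({v.lower() for v in values})
            else:
                result_set.update(values)
            pending = None
    return tag_sets, result_set
```

For the example header above, this would yield two tag sets and a result set
of three values.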

Specifically, the header consists of one or more tag sets and exactly one
expected result set.

#### Tag Sets

Each tag set begins with a `# tags:` comment followed by a space-separated list
of tags between `[ ]`. Order does not matter to the parser, and tags are
case-insensitive. Tag sets can span multiple lines as long as each line starts
with `#` and all tags are within the brackets.

Each tag set contains all the tags that can be used in expectations for a
particular aspect of a test configuration. In the example header, the first tag
set contains values for operating systems, while the second tag set contains
values for browser build type. Grouping tags into different sets instead of
having a monolithic set with all possible tag values is necessary for
detecting conflicting expectations (explained later in
[the conflict section](#Conflicts)).

One important note about tag sets: unless a test harness implements custom
conflict detection logic, all tags within a set should be mutually exclusive,
i.e. only one tag from each tag set should be produced when running a test.
Failure to do so can result in conflict detection false negatives, the
specifics of which are explained in [the conflict section](#Conflicts).
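
Since harnesses generally rely on this invariant, a sketch of the check a
suite could apply to the tags produced by a test run (a hypothetical helper,
not part of typ):

```python
def check_mutual_exclusivity(tag_sets, runtime_tags):
    """Raise if a test run produced more than one tag from any tag set.

    For example, {'win', 'debug'} is fine, but {'win', 'mac'} is not.
    """
    runtime = {t.lower() for t in runtime_tags}
    for tag_set in tag_sets:
        overlap = tag_set & runtime
        if len(overlap) > 1:
            raise ValueError(
                'Tags %s come from the same tag set and are not mutually '
                'exclusive' % sorted(overlap))
```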

#### Expected Result Set

The expected result set begins with a `# results:` comment followed by a
space-separated list of expected results between `[ ]`. Order does not matter to
the parser, but expected results are case-sensitive. Additionally, only values
[known to the parser][typ_known_results] can be used. The expected results can
span multiple lines as long as each line starts with `#` and all values are
within the brackets.

The expected result set contains all the expected results that can be used in
expectations. The specifics of how each expected result affects test behavior
can differ slightly between test suites, but they generally behave as follows:

* Pass - The default expected result for all tests. Let the test run, and expect
  it to run without issue.
* Failure - Let the test run, but treat failures as a pass.
* Crash - Let the test run, but treat test failures due to crashes as a pass.
* Timeout - Let the test run, but treat test failures due to timeouts as a pass.
* Skip - Do not run the test.
* RetryOnFailure - Re-enable automatic retries of a test if a suite has them
  disabled by default.
* Slow - Indicate that the test is expected to take longer than normal, usually
  as a signal to increase timeouts.

[typ_known_results]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/catapult/third_party/typ/typ/expectations_parser.py;l=40
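
As a rough sketch of how a harness might apply these (the exact behavior is
suite-specific, and the names here are illustrative):

```python
def result_is_expected(actual_result, expected_results):
    """Judge an actual outcome against an expectation's expected results.

    actual_result is one of 'Pass', 'Failure', 'Crash', or 'Timeout';
    expected_results comes from the matching expectation, defaulting to
    {'Pass'} when no expectation applies. Modifiers such as Skip,
    RetryOnFailure, and Slow affect scheduling rather than outcome
    judgment, so they are not handled here.
    """
    expected = set(expected_results) or {'Pass'}
    if actual_result == 'Pass':
        return True  # suites generally accept a pass even when Failure is expected
    return actual_result in expected
```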

### Expectations

After the header, the rest of the file consists of test expectations which
specify what non-standard test behavior is expected on specific test machine
configurations. An expectation is a single line in the following format:

```
bug_identifier [ tags ] test_name [ expected_results ]
```

As an example, the following would be an expectation specifying that the
`foo.html` test is expected to fail on Windows machines with Debug browsers:

```
crbug.com/1234 [ win debug ] foo.html [ Failure ]
```

The bug identifier and tags are both optional. Not specifying any tags means
that the expectation applies to the test regardless of where it is run. When
omitting tags, the brackets are also omitted. Additionally, multiple bug
identifiers are allowed as long as they are space-separated. The parser looks
for certain prefixes, e.g. `crbug.com/`, to determine what is considered a bug.
This allows the parser to properly disambiguate one or more bug identifiers
from the test name in the event that an expectation does not have any tags.

Multiple expected results are allowed and are space-separated like tags. As an
example, `[ Failure Crash ]` would specify that the test is expected to either
fail or crash.
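
A minimal sketch of parsing a single expectation line under these rules
(illustrative only; the real list of recognized bug prefixes lives in typ):

```python
BUG_PREFIXES = ('crbug.com/',)  # prefixes that mark a token as a bug identifier

def parse_expectation_line(line):
    """Parse 'bug_identifier [ tags ] test_name [ expected_results ]'."""
    tokens = line.split()
    bugs = []
    while tokens and tokens[0].startswith(BUG_PREFIXES):
        bugs.append(tokens.pop(0))  # prefixes disambiguate bugs from the test name
    tags = []
    if tokens and tokens[0] == '[':  # the tag list is optional
        closing = tokens.index(']')
        tags = tokens[1:closing]
        tokens = tokens[closing + 1:]
    test_name = tokens.pop(0)
    if tokens[0] != '[' or tokens[-1] != ']':
        raise ValueError('Malformed expected results in: %s' % line)
    return bugs, tags, test_name, tokens[1:-1]

# parse_expectation_line('crbug.com/1234 [ win debug ] foo.html [ Failure ]')
# -> (['crbug.com/1234'], ['win', 'debug'], 'foo.html', ['Failure'])
```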

Additionally, the test name is allowed to have up to one wildcard at the very
end to match any tests that begin with the specified name. As an example, the
following would be an expectation specifying that any test starting with `foo`
is expected to fail on Windows machines with Debug browsers:

```
crbug.com/1234 [ win debug ] foo* [ Failure ]
```

The restriction of only having one wildcard at the end can be lifted via the
`full_wildcard_support` annotation found under
[the annotations section](#Annotations).
146
Brian Sheedy3e2d85a2024-01-08 20:51:50147#### Priority
148
149When using wildcards, it is possible for multiple expectations to apply to a
150test at runtime. For example, given the following:
151
152```
153[ win ] foo* [ Slow ]
154[ win ] foo/bar* [ Failure ]
155[ win ] foo/bar/specific_test.html [ Skip ]
156```

`foo/bar/specific_test.html` running on a Windows machine would have three
applicable expectations. In these cases, the most specific (i.e. the
longest-named) expectation will be used.

The order in which expectations are defined is *not* considered when determining
priority.
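
A sketch of this resolution logic (assuming parsed expectation objects with a
`test` pattern attribute; the attribute name is illustrative):

```python
def most_specific_expectation(expectations, test_name):
    """Return the longest-named expectation matching test_name, or None."""
    def matches(pattern):
        if pattern.endswith('*'):  # single trailing wildcard: prefix match
            return test_name.startswith(pattern[:-1])
        return test_name == pattern
    candidates = [e for e in expectations if matches(e.test)]
    # Definition order is ignored; only pattern length (specificity) matters.
    return max(candidates, key=lambda e: len(e.test), default=None)
```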

## Conflicts

When more than one expectation exists for a test, it is possible for them to
conflict: a test run on a particular test machine could have more than one
expectation apply to it. Whether these conflicts are treated as errors and how
conflicts get resolved are both configurable via annotations found under
[the annotations section](#Annotations).

### Detection

Two expectations for the same test conflict with each other if they do not use
different tags from at least one shared tag set. As an example, look at the
following expectations:

```
# Group 1
[ win ] foo.html [ Failure ]
[ mac ] foo.html [ Skip ]

# Group 2
[ win ] bar.html [ Failure ]
[ debug ] bar.html [ Skip ]

# Group 3
[ linux ] foo.html [ Failure ]
[ linux debug ] foo.html [ Skip ]
```

Group 1 would not result in a conflict since both `win` and `mac` are from the
same tag set and are different values. Thus, the parser would be able to
determine that at most one expectation will apply when running a test.

Group 2 would result in a conflict since there are no tag sets that both
expectations use, and thus there could be a test configuration that causes both
expectations to apply. In this case, a configuration that produces both the
`win` and `debug` tags is possible. This conflict could be resolved by adding
a browser type tag to the first expectation or an operating system tag to the
second expectation.

Group 3 would result in a conflict since there is a tag set that both
expectations use (operating system), but the exact tag is the same. Thus, a
test running on Linux with a Debug browser would have both expectations apply.
This conflict could be resolved by changing the first expectation to use
`[ linux release ]`.
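
The rule above can be sketched as follows (assuming expectation objects that
expose their tags as a set; the names are illustrative):

```python
def conflicts(exp_a, exp_b, tag_sets):
    """Two expectations for the same test conflict unless they use
    different tags from at least one shared tag set."""
    for tag_set in tag_sets:
        a_tags = set(exp_a.tags) & tag_set
        b_tags = set(exp_b.tags) & tag_set
        if a_tags and b_tags and a_tags != b_tags:
            return False  # they differ within this set, so both cannot apply
    return True

# Group 1: OS tags differ ('win' vs 'mac') -> no conflict.
# Group 2: no tag set is used by both expectations -> conflict.
# Group 3: both use 'linux' from the OS set, i.e. the same tag -> conflict.
```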

It is important to be aware of the following when it comes to conflicts:

1. The expectation file has no knowledge of which tag combinations are actually
   possible in the real world, only what is theoretically possible given the
   defined tag sets. A real-world example of this would be the use of the Metal
   API, which is Mac-specific. While a human would be able to reason that
   `[ metal ]` implies `[ mac metal ]`, the latter is necessary for the
   conflict detection to work properly.
2. If tag sets include non-mutually-exclusive values and the test suite has not
   implemented custom conflict checking logic, there can be false negatives when
   checking for conflicts. For example, if `win` and `win10` were both in the OS
   tag set, `[ win ] foo.html [ Failure ]` and `[ win10 ] foo.html [ Skip ]`
   would not be found to conflict even though they can in the real world due to
   `win10` being a more specific version of `win`.
3. Expectations that use wildcards can result in conflict detection false
   negatives. Conflict detection is only run on expectations with identical test
   names. Thus, while `[ win ] foo* [ Failure ]` and `[ debug ] foo* [ Skip ]`
   would be found to conflict since the test name is `foo*` in both cases,
   `[ win ] f* [ Failure ]` and `[ debug ] foo* [ Skip ]` would not be found to
   conflict.

### Annotations

By default, conflicts result in a parsing error. However, expectation files
support several annotations that affect how conflicts are handled.

`# conflicts_allowed: true` means that conflicts no longer cause parsing errors.
Instead, conflicts are handled gracefully depending on the conflict resolution
setting, the default of which is to take the union of expected results.

`# conflict_resolution: ` specifies how conflicts will be handled when they are
allowed. Supported values are `union` (the default) and `override`. `union`
causes all conflicting expectations to be merged together. For example, the
following:

```
[ win ] foo.html [ Failure ]
[ debug ] foo.html [ Slow ]
```

would be equivalent to `[ win debug ] foo.html [ Failure Slow ]` when running on
a Windows machine with a Debug browser.

`override` uses whatever expectation was parsed last. Using the above example,
a Windows machine with a Debug browser would end up using the
`[ debug ] foo.html [ Slow ]` expectation.
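
A sketch of the two resolution modes, applied to the expectations that match
the current test run (illustrative names):

```python
def resolve_conflict(matching_expectations, mode='union'):
    """Combine the expected results of conflicting expectations."""
    if mode == 'override':
        # The expectation parsed last wins outright.
        return set(matching_expectations[-1].results)
    merged = set()
    for expectation in matching_expectations:  # 'union': merge everything
        merged |= set(expectation.results)
    return merged
```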

Additionally, by default, only a single wildcard is allowed at the end of a test
name. This behavior is preferred for tests whose names are hierarchical in
nature, e.g. for filepaths. However, if this behavior is not suitable for a
test suite, full wildcard support can be enabled via the
`# full_wildcard_support: true` annotation. This allows an arbitrary number of
wildcards to be used anywhere in the test name. While this is more flexible and
can make sense for certain test suites, it does make it harder for humans to
determine which expectations apply to which tests.
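
A sketch of the difference between the two matching modes (using Python's
`fnmatch` as an approximation of full wildcard matching; note that `fnmatch`
also treats `?` and `[ ]` as special):

```python
import fnmatch

def pattern_matches(pattern, test_name, full_wildcard_support=False):
    """Match a test name against an expectation's test pattern."""
    if full_wildcard_support:
        # '*' may appear anywhere, any number of times.
        return fnmatch.fnmatchcase(test_name, pattern)
    if pattern.endswith('*'):  # default: one trailing wildcard, prefix match
        return test_name.startswith(pattern[:-1])
    return test_name == pattern

# pattern_matches('foo*bar*', 'foo/x/bar/y', full_wildcard_support=True) -> True
# pattern_matches('foo*', 'foo/x/bar/y') -> True (prefix match)
```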