# Expectation Files

A number of test suites in Chromium use expectation files to handle test
failures in order to have more granular control compared to the usual approach
of entirely disabling failing tests. This documentation covers the general
usage of expectation files, while suite-specific details are handled in other
files.

[TOC]

Currently, the test suites that use expectation files can be broadly categorized
as Blink tests and GPU tests. Blink-specific documentation can be found
[here][blink_expectation_doc], while GPU-specific documentation can be found
[here][gpu_expectation_doc].

[blink_expectation_doc]: https://source.chromium.org/chromium/chromium/src/+/main:docs/testing/web_test_expectations.md
[gpu_expectation_doc]: https://source.chromium.org/chromium/chromium/src/+/main:docs/gpu/gpu_expectation_files.md

## Design

The full design for the format can be found [here][chromium_test_list_format] if
the overview in this documentation is not sufficient.

[chromium_test_list_format]: http://bit.ly/chromium-test-list-format

## Code

The parser implementation used by Chromium can be found [here][typ_parser]. This
handles the parsing of the text files into Python objects usable by Chromium's
test harnesses.

[typ_parser]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/catapult/third_party/typ/typ/expectations_parser.py

## Syntax

An expectation file can be broadly broken up into two sections: the header and
the test expectations.

### Header

The header consists of specially formatted comments that define which tags and
expected results are usable in expectations later in the file. All header
content must come before any expectation content; otherwise, the parser will
raise errors. An example header is:

```
# tags: [ linux ubuntu jammy
#         mac mac10 mac11 mac12 mac13
#         win win7 win10 ]
# tags: [ release debug ]
# results: [ Failure Skip Slow ]
```

Specifically, the header consists of one or more tag sets and exactly one
expected result set.

#### Tag Sets

Each tag set begins with a `# tags:` comment followed by a space-separated list
of tags between `[ ]`. Order does not matter to the parser, and tags are
case-insensitive. Tag sets can span multiple lines as long as each line starts
with `#` and all tags are within the brackets.

Each tag set contains all the tags that can be used in expectations for a
particular aspect of a test configuration. In the example header, the first tag
set contains values for operating systems, while the second tag set contains
values for browser build type. Grouping tags into different sets instead of
having a single monolithic set with all possible tag values is necessary for
conflict detection between expectations to work properly (explained later in
[the conflict section](#Conflicts)).

One important note about tag sets is that unless a test harness implements
custom conflict detection logic, all tags within a set should be mutually
exclusive, i.e. only one tag from each tag set should be produced when running a
test. Failure to do so can result in conflict detection false negatives, the
specifics of which are explained in [the conflict section](#Conflicts).

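For illustration, here is a minimal sketch of tag sets split by configuration
dimension (the GPU vendor tags are hypothetical and are not part of the example
header above). A given test run should produce exactly one tag from each set:

```
# tags: [ win mac linux ]
# tags: [ release debug ]
# tags: [ intel amd nvidia ]
```
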
#### Expected Result Set

The expected result set begins with a `# results:` comment followed by a
space-separated list of expected results between `[ ]`. Order does not matter to
the parser, but expected results are case-sensitive. Additionally, only values
[known to the parser][typ_known_results] can be used. The expected results can
span multiple lines as long as each line starts with `#` and all values are
within the brackets.

The expected result set contains all the expected results that can be used in
expectations. The specifics of how each expected result affects test behavior
can differ slightly between test suites, but they generally behave as follows
(see the example after this list):

* Pass - The default expected result for all tests. Let the test run, and expect
  it to run without issue.
* Failure - Let the test run, but treat failures as a pass.
* Crash - Let the test run, but treat test failures due to crashes as a pass.
* Timeout - Let the test run, but treat test failures due to timeouts as a pass.
* Skip - Do not run the test.
* RetryOnFailure - Re-enable automatic retries of a test if a suite has them
  disabled by default.
* Slow - Indicate that the test is expected to take longer than normal, usually
  as a signal to increase timeouts.

[typ_known_results]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/catapult/third_party/typ/typ/expectations_parser.py;l=40
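
As an illustration, a hypothetical file fragment using several of these
expected results might look like the following (the test names and bug numbers
are made up, and the `# tags:` declarations are omitted for brevity):

```
# results: [ Failure Skip RetryOnFailure Slow ]

crbug.com/1111 [ win ] flaky_canvas_test.html [ RetryOnFailure ]
crbug.com/2222 [ mac debug ] heavy_test.html [ Slow ]
crbug.com/3333 [ linux ] broken_test.html [ Skip ]
```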

### Expectations

After the header, the rest of the file consists of test expectations which
specify what non-standard test behavior is expected on specific test machine
configurations. An expectation is a single line in the following format:

```
bug_identifier [ tags ] test_name [ expected_results ]
```

As an example, the following would be an expectation specifying that the
`foo.html` test is expected to fail on Windows machines with Debug browsers:

```
crbug.com/1234 [ win debug ] foo.html [ Failure ]
```

The bug identifier and tags are both optional and can be omitted. Not specifying
any tags means that the expectation applies to the test regardless of where it
is run. When omitting tags, the brackets are also omitted. Additionally,
multiple bug identifiers are allowed as long as they are space-separated. The
parser looks for certain prefixes, e.g. `crbug.com/`, to determine what is
considered a bug. This allows the parser to properly disambiguate one or more
bug identifiers from the test name in the event that an expectation does not
have any tags.
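
For example, an expectation with two bug identifiers and no tags (and therefore
applying on every configuration) could look like the following (the bug numbers
are made up):

```
crbug.com/1234 crbug.com/5678 foo.html [ Failure ]
```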

Multiple expected results are allowed and are space-separated like tags. As an
example, `[ Failure Crash ]` would specify that the test is expected to either
fail or crash.
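
For instance, a hypothetical expectation for a test that sometimes fails and
sometimes crashes on Windows might be:

```
crbug.com/1234 [ win ] foo.html [ Failure Crash ]
```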

Additionally, the test name is allowed to have up to one wildcard at the very
end to match any tests that begin with the specified name. As an example, the
following would be an expectation specifying that any test starting with `foo`
is expected to fail on Windows machines with Debug browsers:

```
crbug.com/1234 [ win debug ] foo* [ Failure ]
```

The restriction of only having one wildcard at the end can be lifted via the
`full_wildcard_support` annotation found under
[the annotations section](#Annotations).

#### Priority

When using wildcards, it is possible for multiple expectations to apply to a
test at runtime. For example, given the following:

```
[ win ] foo* [ Slow ]
[ win ] foo/bar* [ Failure ]
[ win ] foo/bar/specific_test.html [ Skip ]
```

`foo/bar/specific_test.html` running on a Windows machine would have three
applicable expectations. In these cases, the most specific (i.e. the
longest-named) expectation will be used.

The order in which expectations are defined is *not* considered when determining
priority.

## Conflicts

When more than one expectation exists for a test, it is possible for a conflict
to arise where a test run on a particular test machine could have more than one
expectation apply to it. Whether these conflicts are treated as errors and how
conflicts get resolved are both configurable via annotations found under
[the annotations section](#Annotations).

### Detection

Two expectations for the same test conflict with each other if they do not use
different tags from at least one shared tag set. As an example, look at the
following expectations:

```
# Group 1
[ win ] foo.html [ Failure ]
[ mac ] foo.html [ Skip ]

# Group 2
[ win ] bar.html [ Failure ]
[ debug ] bar.html [ Skip ]

# Group 3
[ linux ] foo.html [ Failure ]
[ linux debug ] foo.html [ Skip ]
```

Group 1 would not result in a conflict since both `win` and `mac` are from the
same tag set and are different values. Thus, the parser would be able to
determine that at most one expectation will apply when running a test.

Group 2 would result in a conflict since there are no tag sets that both
expectations use, and thus there could be a test configuration that causes both
expectations to apply. In this case, a configuration that produces both the
`win` and `debug` tags is possible. This conflict could be resolved by adding
a browser type tag to the first expectation or an operating system tag to the
second expectation.

Group 3 would result in a conflict because, while both expectations use a tag
from the same tag set (operating system), the tag is the same rather than
different. Thus, a test running on Linux with a Debug browser would have both
expectations apply. This conflict could be resolved by changing the first
expectation to use `[ linux release ]`.
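
For reference, Group 3 with that fix applied would read:

```
[ linux release ] foo.html [ Failure ]
[ linux debug ] foo.html [ Skip ]
```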

It is important to be aware of the following when it comes to conflicts:

1. The expectation file has no knowledge of which tag combinations are actually
   possible in the real world, only what is theoretically possible given the
   defined tag sets. A real world example of this would be the use of the Metal
   API, which is Mac-specific. While a human would be able to reason that
   `[ metal ]` implies `[ mac metal ]`, the latter is necessary for the
   conflict detection to work properly (see the sketch after this list).
2. If tag sets include non-mutually-exclusive values and the test suite has not
   implemented custom conflict checking logic, there can be false negatives when
   checking for conflicts. For example, if `win` and `win10` were both in the OS
   tag set, `[ win ] foo.html [ Failure ]` and `[ win10 ] foo.html [ Skip ]`
   would not be found to conflict even though they can in the real world due to
   `win10` being a more specific version of `win`.
3. Expectations that use wildcards can result in conflict detection false
   negatives. Conflict detection is only run on expectations with identical test
   names. Thus, while `[ win ] foo* [ Failure ]` and `[ debug ] foo* [ Skip ]`
   would be found to conflict since the test name is `foo*` in both cases,
   `[ win ] f* [ Failure ]` and `[ debug ] foo* [ Skip ]` would not be found to
   conflict.
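
To make the first point concrete, assuming a hypothetical graphics API tag set
that contains `metal`, the parser would flag the first pair below as a conflict
but not the second:

```
# Flagged as a conflict: the parser cannot know that Metal implies Mac, and the
# two expectations do not differ within any shared tag set.
[ metal ] foo.html [ Failure ]
[ win ] foo.html [ Skip ]

# Not a conflict: the expectations use different tags (mac vs. win) from the
# operating system tag set.
[ mac metal ] foo.html [ Failure ]
[ win ] foo.html [ Skip ]
```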

### Annotations

By default, conflicts result in a parsing error. However, expectation files
support several annotations that affect how conflicts are handled.

`# conflicts_allowed: true` stops conflicts from being treated as parsing
errors. Instead, conflicts will be handled gracefully depending on the conflict
resolution setting, the default of which is to take the union of expected
results.
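
As a sketch (assuming the annotation is placed alongside the other header
comments), a header that allows conflicts might look like:

```
# tags: [ win mac linux ]
# tags: [ release debug ]
# results: [ Failure Skip Slow ]
# conflicts_allowed: true
```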

`# conflict_resolution: ` specifies how conflicts will be handled when they are
allowed. Supported values are `union` (the default) and `override`. `union`
causes all conflicting expectations to be merged together. For example, the
following:

```
[ win ] foo.html [ Failure ]
[ debug ] foo.html [ Slow ]
```

would be equivalent to `[ win debug ] foo.html [ Failure Slow ]` when running on
a Windows machine with a Debug browser.

`override` uses whatever expectation was parsed last. Using the above example,
a Windows machine with a Debug browser would end up using the
`[ debug ] foo.html [ Slow ]` expectation.

Additionally, by default, only a single wildcard is allowed at the end of a test
name. This behavior is preferred for tests whose names are hierarchical in
nature, e.g. for filepaths. However, if this behavior is not suitable for a
test suite, full wildcard support can be enabled via the
`# full_wildcard_support: true` annotation. This allows an arbitrary number of
wildcards to be used anywhere in the test name. While this is more flexible and
can make sense for certain test suites, it does make it harder for humans to
determine which expectations apply to which tests.
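
For example (the test name pattern here is hypothetical), a file with the
annotation enabled could contain an expectation with wildcards in the middle of
the name:

```
# full_wildcard_support: true

crbug.com/1234 [ win ] foo/*/bar_*.html [ Failure ]
```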