# Layout Test Expectations and Baselines


The primary function of the LayoutTests is as a regression test suite; this
means that, while we care about whether a page is being rendered correctly, we
care more about whether the page is being rendered the way we expect it to. In
other words, we look more for changes in behavior than we do for correctness.

[TOC]

All layout tests have "expected results", or "baselines", which may be one of
several forms. The test may produce one or more of:

* A text file containing JavaScript log messages.
* A text rendering of the Render Tree.
* A screen capture of the rendered page as a PNG file.
* WAV files of the audio output, for WebAudio tests.

For any of these types of tests, baselines are checked into the LayoutTests
directory. The filename of a baseline is the same as that of the corresponding
test, but the extension is replaced with `-expected.{txt,png,wav}` (depending on
the type of test output). Baselines usually live alongside tests; the exception
is baselines that vary by platform. Read
[Layout Test Baseline Fallback](layout_test_baseline_fallback.md) for more
details.
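
As a sketch of the naming rule, assuming a test path given relative to the
LayoutTests directory (the example path is hypothetical), the generic baseline
names can be derived as follows. It only covers baselines that live next to the
test, not platform-specific ones.

```python
from pathlib import PurePosixPath
from typing import List

# Baseline kinds named in the text: text dumps, PNG screenshots, WAV audio.
BASELINE_KINDS = ("txt", "png", "wav")


def baseline_path(test_path: str, kind: str) -> str:
    """Map a test path to its generic baseline, e.g. a.html -> a-expected.txt."""
    if kind not in BASELINE_KINDS:
        raise ValueError(f"unknown baseline kind: {kind!r}")
    path = PurePosixPath(test_path)
    # Drop the test's own extension and append "-expected.<kind>",
    # keeping the baseline in the same directory as the test.
    return str(path.with_name(f"{path.stem}-expected.{kind}"))


def all_baseline_paths(test_path: str) -> List[str]:
    """Every baseline name the test could have; which exist depends on its output."""
    return [baseline_path(test_path, kind) for kind in BASELINE_KINDS]


if __name__ == "__main__":
    # Hypothetical test path, relative to the LayoutTests directory.
    for name in all_baseline_paths("fast/dom/example.html"):
        print(name)
```

Running it prints `fast/dom/example-expected.txt`, `fast/dom/example-expected.png`
and `fast/dom/example-expected.wav`; a given test only checks in the ones that
match the kinds of output it actually produces.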

Lastly, we also support the concept of "reference tests", which check that two
pages are rendered identically (pixel-by-pixel). As long as the two pages'
output matches, the test passes. For more on reference tests, see
[Writing ref tests](https://trac.webkit.org/wiki/Writing%20Reftests).
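
Conceptually, a reference test passes only when both renderings agree on every
pixel. The sketch below models each rendering as a list of rows of RGBA tuples;
this representation is an assumption for illustration, not how the test runner
actually stores or compares screenshots.

```python
from typing import Sequence, Tuple

# Illustrative model: a pixel is an (R, G, B, A) tuple, an image a list of rows.
Pixel = Tuple[int, int, int, int]
Image = Sequence[Sequence[Pixel]]


def reftest_passes(test_image: Image, reference_image: Image) -> bool:
    """Pass only if both renderings have the same size and identical pixels."""
    if len(test_image) != len(reference_image):
        return False  # different heights
    for test_row, ref_row in zip(test_image, reference_image):
        if len(test_row) != len(ref_row):
            return False  # different widths
        for test_pixel, ref_pixel in zip(test_row, ref_row):
            if test_pixel != ref_pixel:
                return False  # a single differing pixel fails the test
    return True


if __name__ == "__main__":
    white, black = (255, 255, 255, 255), (0, 0, 0, 255)
    print(reftest_passes([[white, black]], [[white, black]]))  # True
    print(reftest_passes([[white, black]], [[white, white]]))  # False
```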

## Failing tests

When the output doesn't match, there are two potential reasons for it:

* The port is performing "correctly", but the output simply won't match the
  generic version. This usually happens with things like form controls, which
  are rendered differently on each platform.
* The port is performing "incorrectly" (i.e., the test is failing).

In both cases, the convention is to check in a new baseline (aka rebaseline),
even though that file may be codifying errors. This helps us maintain test
coverage for all the other things the test is testing while we resolve the bug.

*** promo
If a test can be rebaselined, it should always be rebaselined instead of adding
lines to TestExpectations.
***

Bugs at [crbug.com](https://crbug.com) should track fixing incorrect behavior,
not lines in
[TestExpectations](../../third_party/WebKit/LayoutTests/TestExpectations). If a
test is never supposed to pass (e.g. it's testing Windows-specific behavior, so
it can never pass on Linux/Mac), move it to the
[NeverFixTests](../../third_party/WebKit/LayoutTests/NeverFixTests) file. That
gets it out of the way of the rest of the project.

There are some cases where you can't rebaseline and, unfortunately, we don't
have a better solution than either:

1. Reverting the patch that caused the failure, or
2. Adding a line to TestExpectations and fixing the bug later.

In these cases, reverting the patch is strongly preferred.

These are the cases where you can't rebaseline:

* The test is a reference test.
* The test gives different output in release and debug; in this case, generate a
  baseline with the release build, and mark the debug build as expected to fail.
* The test is flaky, crashes or times out.
* The test is for a feature that hasn't shipped on some platforms yet, but
  will shortly.
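
The rules above can be summarized as a small decision function. The
`FailureInfo` fields are hypothetical; they stand for facts you would gather
from the test itself, the bots and the flakiness dashboard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureInfo:
    """Hypothetical summary of a test that started failing after a patch."""
    is_reftest: bool = False
    release_and_debug_differ: bool = False
    flaky_crashes_or_times_out: bool = False
    feature_not_shipped_everywhere: bool = False


def reasons_cannot_rebaseline(info: FailureInfo) -> List[str]:
    """List which of the documented exceptions apply to this failure."""
    reasons = []
    if info.is_reftest:
        reasons.append("reference test")
    if info.release_and_debug_differ:
        reasons.append("release and debug output differ")
    if info.flaky_crashes_or_times_out:
        reasons.append("flaky, crashes or times out")
    if info.feature_not_shipped_everywhere:
        reasons.append("feature not yet shipped on all platforms")
    return reasons


def next_step(info: FailureInfo) -> str:
    reasons = reasons_cannot_rebaseline(info)
    if not reasons:
        # The default: check in a new baseline, even if it codifies a bug.
        return "rebaseline"
    if reasons == ["release and debug output differ"]:
        return "rebaseline from a release build; mark debug as failing"
    # Reverting is strongly preferred over a TestExpectations line.
    return "revert the patch (preferred) or add a TestExpectations line"


if __name__ == "__main__":
    print(next_step(FailureInfo()))                 # rebaseline
    print(next_step(FailureInfo(is_reftest=True)))  # revert the patch (preferred) ...
```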

## Handling flaky tests

The
[flakiness dashboard](https://test-results.appspot.com/dashboards/flakiness_dashboard.html)
is a tool for understanding a test’s behavior over time. Originally designed
for managing flaky tests, it shows a timeline view of each test’s results. The
tool may be overwhelming at first, but
[the documentation](https://dev.chromium.org/developers/testing/flakiness-dashboard)
should help. Once you decide that a test is truly flaky, you can suppress it
using the TestExpectations file, as described below.

We do not generally expect Chromium sheriffs to spend time trying to address
flakiness, though.
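
As a toy illustration of what "flaky" means, the sketch below classifies the
results of repeated runs of one test at a single revision. Mixed results for
identical code suggest flakiness, whereas a switch from passing to failing
across revisions is more likely a regression, which is what the dashboard's
timeline helps you tell apart. The outcome strings are illustrative, not the
dashboard's data format.

```python
from collections import Counter
from typing import Iterable


def classify_results(results: Iterable[str]) -> str:
    """Classify repeated runs of one test at a single revision.

    `results` holds illustrative outcome strings such as "PASS", "FAIL",
    "CRASH" or "TIMEOUT".
    """
    counts = Counter(results)
    if not counts:
        return "no data"
    if len(counts) == 1:
        (outcome,) = counts  # the only distinct outcome seen
        return f"always {outcome}"
    # More than one distinct outcome for identical code: the test is flaky.
    return "flaky"


if __name__ == "__main__":
    print(classify_results(["PASS", "PASS", "PASS"]))     # always PASS
    print(classify_results(["PASS", "TIMEOUT", "PASS"]))  # flaky
```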

## How to rebaseline

Since baselines themselves are often platform-specific, updating baselines in
general requires fetching new test results after running the test on multiple
platforms.
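
To see why results are needed from every platform, here is a sketch of a
fallback lookup: each platform checks a list of platform-specific directories
before the generic baseline next to the test. The directory names and chains
below are hypothetical; the real search order is described in
[Layout Test Baseline Fallback](layout_test_baseline_fallback.md).

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional, Set


def resolve_baseline(
    baseline: str, fallback_dirs: Iterable[str], existing: Set[str]
) -> Optional[str]:
    """Return the baseline file a platform would use, or None if there is none.

    `baseline` is the generic path next to the test (e.g. "fast/a-expected.png"),
    `fallback_dirs` lists platform directories from most to least specific, and
    `existing` is the set of baseline paths checked in, relative to LayoutTests.
    """
    for directory in fallback_dirs:
        candidate = str(PurePosixPath(directory) / baseline)
        if candidate in existing:
            return candidate  # a platform-specific baseline wins
    # Fall back to the generic baseline that lives alongside the test.
    return baseline if baseline in existing else None


if __name__ == "__main__":
    checked_in = {"platform/mac/fast/a-expected.png", "fast/a-expected.png"}
    # Hypothetical fallback chains for two platforms.
    print(resolve_baseline("fast/a-expected.png", ["platform/mac"], checked_in))
    print(resolve_baseline("fast/a-expected.png", ["platform/linux"], checked_in))
```

The two platforms end up comparing against different files
(`platform/mac/fast/a-expected.png` and `fast/a-expected.png`), so one behavior
change can require new baselines in several places, and only results from each
platform show which ones.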

### Rebaselining using try jobs

The recommended way to rebaseline for a currently-in-progress CL is to use
results from try jobs, via the command-line tool
`third_party/WebKit/Tools/Scripts/webkit-patch rebaseline-cl`:

1. First, upload a CL.
2. Trigger try jobs by running `webkit-patch rebaseline-cl`. This should
   trigger jobs on
   [tryserver.blink](https://build.chromium.org/p/tryserver.blink/builders).
3. Wait for all try jobs to finish.
4. Run `webkit-patch rebaseline-cl` again to fetch new baselines.
   By default, this will download new baselines for any failing tests
   in the try jobs.
   (Run `webkit-patch rebaseline-cl --help` for more specific options.)
5. Commit the new baselines and upload a new patch.

This way, the new baselines can be reviewed along with the changes, which helps
the reviewer verify that the new baselines are correct. It also means that there
is no period of time when the layout test results are ignored.

#### Options

Which tests `webkit-patch rebaseline-cl` tries to download new baselines for
depends on its arguments:

* By default, it tries to download all baselines for tests that failed in the
  try jobs.
* If you pass `--only-changed-tests`, then only tests modified in the CL will be
  considered.
* You can also explicitly pass a list of test names, and then just those tests
  will be rebaselined.
* If some of the try jobs failed to run, and you wish to continue rebaselining
  assuming that there are no platform-specific results for those platforms,
  you can add the flag `--fill-missing`.

### Rebaselining manually

1. If the test is already listed in TestExpectations as flaky, mark the test
   `NeedsManualRebaseline` and comment out the flaky line so that your patch can
   land without turning the tree red. If the test is not in TestExpectations,
   you can add a `[ Rebaseline ]` line to TestExpectations.
2. Run `third_party/WebKit/Tools/Scripts/webkit-patch rebaseline-expectations`.
3. Post the patch created in step 2 for review.

## Kinds of expectations files

* [TestExpectations](../../third_party/WebKit/LayoutTests/TestExpectations): The
  main test failure suppression file. In theory, this should be used for
  temporarily marking tests as flaky.
* [ASANExpectations](../../third_party/WebKit/LayoutTests/ASANExpectations):
  Tests that fail under ASAN.
* [LeakExpectations](../../third_party/WebKit/LayoutTests/LeakExpectations):
  Tests that have memory leaks under the leak checker.