# Example Investigation of a Heap Dump

This document describes the steps taken to investigate a real memory leak
discovered by heap profiling in the wild. For investigators less familiar with
the code base, `Navigating the Stack Trace` should be enough information to
determine the relevant component, and to forward the bug to a component OWNER.

## Understanding the heap dump summary

The opening comment of [Issue
834033](https://bugs.chromium.org/p/chromium/issues/detail?id=834033) contains a
heap dump summary. The highlights are:

* 315723 calls to malloc without a corresponding call to free.
* 806MB of memory.
* The common stacktrace for all 315723 allocations.

Usually, anything that uses over 10MB of memory is a red flag. With the
exception of large image resources, most code in Chrome should use much less
than 10MB. Anything that has over 100k allocations is also a red flag.

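The two rules of thumb above can be written down as a small check. This is only
a sketch: `HeapDumpEntry`, `IsRedFlag`, and `AverageAllocationBytes` are
hypothetical names, and treating 1MB as 1024 * 1024 bytes is an assumption,
since the summary does not say which unit it uses.

```cpp
#include <cstdint>

// Hypothetical summary of one leaked call stack from a heap dump.
struct HeapDumpEntry {
  std::uint64_t total_bytes;
  std::uint64_t allocation_count;
};

// Thresholds taken from the rules of thumb in the text.
// Assumption: 1MB is interpreted as 1024 * 1024 bytes.
constexpr std::uint64_t kRedFlagBytes = 10ull * 1024 * 1024;
constexpr std::uint64_t kRedFlagAllocations = 100000;

// Returns true if either threshold is exceeded.
constexpr bool IsRedFlag(const HeapDumpEntry& entry) {
  return entry.total_bytes > kRedFlagBytes ||
         entry.allocation_count > kRedFlagAllocations;
}

// Returns the average size of one allocation, rounded down.
constexpr std::uint64_t AverageAllocationBytes(const HeapDumpEntry& entry) {
  return entry.allocation_count == 0
             ? 0
             : entry.total_bytes / entry.allocation_count;
}
```

For the entry in this bug (806MB across 315723 allocations), both thresholds
are exceeded, and the average works out to roughly 2.6KB per allocation under
the assumed unit.
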
### Navigating the Stack Trace - Detailed Breakdown

Let's take a look at the stack trace:

```
profiling::(anonymous namespace)::HookAlloc(base::allocator::AllocatorDispatch const*, unsigned long, void*)
base::allocator::MallocZoneFunctionsToReplaceDefault()::$_1::__invoke(_malloc_zone_t*, unsigned long)
<???>
<???>
base::allocator::UncheckedMallocMac(unsigned long, void**)
sk_malloc_flags(unsigned long, unsigned int)
SkMallocPixelRef::MakeAllocate(SkImageInfo const&, unsigned long)
SkBitmap::tryAllocPixels(SkImageInfo const&, unsigned long)
IPC::ParamTraits<SkBitmap>::Read(base::Pickle const*, base::PickleIterator*, SkBitmap*)
ExtensionAction::ParseIconFromCanvasDictionary(base::DictionaryValue const&, gfx::ImageSkia*)
extensions::ExtensionActionSetIconFunction::RunExtensionAction()
extensions::ExtensionActionFunction::Run()
ExtensionFunction::RunWithValidation()
extensions::ExtensionFunctionDispatcher::DispatchWithCallbackInternal(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int, base::RepeatingCallback<void (ExtensionFunction::ResponseType, base::ListValue const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, extensions::functions::HistogramValue)> const&)
extensions::ExtensionFunctionDispatcher::Dispatch(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int)
bool IPC::MessageT<ExtensionHostMsg_Request_Meta, std::__1::tuple<ExtensionHostMsg_Request_Params>, void>::Dispatch<extensions::ExtensionWebContentsObserver, extensions::ExtensionWebContentsObserver, content::RenderFrameHost, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&)>(IPC::Message const*, extensions::ExtensionWebContentsObserver*, extensions::ExtensionWebContentsObserver*, content::RenderFrameHost*, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&))
extensions::ExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*)
extensions::ChromeExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*)
content::WebContentsImpl::OnMessageReceived(content::RenderFrameHostImpl*, IPC::Message const&)
content::RenderFrameHostImpl::OnMessageReceived(IPC::Message const&)
IPC::ChannelProxy::Context::OnDispatchMessage(IPC::Message const&)
base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*)
base::MessageLoop::RunTask(base::PendingTask*)
base::MessageLoop::DoWork()
base::MessagePumpCFRunLoopBase::RunWork()
base::apple::CallWithEHFrame(void () block_pointer)
base::MessagePumpCFRunLoopBase::RunWorkSource(void*)
<???>
<???>
<???>
<???>
<???>
<???>
<???>
<???>
<???>
__71-[BrowserCrApplication nextEventMatchingMask:untilDate:inMode:dequeue:]_block_invoke
base::apple::CallWithEHFrame(void () block_pointer)
-[BrowserCrApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
<???>
base::MessagePumpNSApplication::DoRun(base::MessagePump::Delegate*)
base::MessagePumpCFRunLoopBase::Run(base::MessagePump::Delegate*)
<name omitted>
ChromeBrowserMainParts::MainMessageLoopRun(int*)
content::BrowserMainLoop::RunMainMessageLoopParts()
content::BrowserMainRunnerImpl::Run()
content::BrowserMain(content::MainFunctionParams)
content::ContentMainRunnerImpl::Run()
service_manager::Main(service_manager::MainParams const&)
content::ContentMain(content::ContentMainParams const&)
ChromeMain
main
<???>
```

The first step is to divide the stack trace into smaller segments to get a
better understanding of what's happening at the time of the allocations. The
best way to do this is to segment by namespace and/or function prefixes.

```
profiling::(anonymous namespace)::HookAlloc(base::allocator::AllocatorDispatch const*, unsigned long, void*)
base::allocator::MallocZoneFunctionsToReplaceDefault()::$_1::__invoke(_malloc_zone_t*, unsigned long)
<???>
<???>
base::allocator::UncheckedMallocMac(unsigned long, void**)
```

The top of each stack will always contain some `base` and/or `profiling`
code. This is the code responsible for allocating and recording the memory.

```
sk_malloc_flags(unsigned long, unsigned int)
SkMallocPixelRef::MakeAllocate(SkImageInfo const&, unsigned long)
SkBitmap::tryAllocPixels(SkImageInfo const&, unsigned long)
```

Next, we see three frames with the prefix `sk`. Searching for
`sk_malloc_flags` on
[codesearch](https://cs.chromium.org/search/?q=sk_malloc_flags&sq=package:chromium&type=cs)
reveals that the component is `third_party/skia`. Looking at the
[README](https://cs.chromium.org/chromium/src/third_party/skia/README) reveals
that Skia is a 2D graphics library.

```
IPC::ParamTraits<SkBitmap>::Read(base::Pickle const*, base::PickleIterator*, SkBitmap*)
```

Next we see a templated function called `Read` in the namespace `IPC`.
`IPC` stands for inter-process communication. This suggests that the
function is responsible for reading an IPC Message, perhaps concerning an
`SkBitmap`.

```
ExtensionAction::ParseIconFromCanvasDictionary(base::DictionaryValue const&, gfx::ImageSkia*)
extensions::ExtensionActionSetIconFunction::RunExtensionAction()
extensions::ExtensionActionFunction::Run()
ExtensionFunction::RunWithValidation()
extensions::ExtensionFunctionDispatcher::DispatchWithCallbackInternal(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int, base::RepeatingCallback<void (ExtensionFunction::ResponseType, base::ListValue const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, extensions::functions::HistogramValue)> const&)
extensions::ExtensionFunctionDispatcher::Dispatch(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int)
bool IPC::MessageT<ExtensionHostMsg_Request_Meta, std::__1::tuple<ExtensionHostMsg_Request_Params>, void>::Dispatch<extensions::ExtensionWebContentsObserver, extensions::ExtensionWebContentsObserver, content::RenderFrameHost, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&)>(IPC::Message const*, extensions::ExtensionWebContentsObserver*, extensions::ExtensionWebContentsObserver*, content::RenderFrameHost*, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&))
extensions::ExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*)
extensions::ChromeExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*)
```

Next, we see many frames with the `extension` prefix. Extensions are exactly
what they sound like - Chrome extensions like AdBlock are used to modify the
behavior of the browser.

```
content::WebContentsImpl::OnMessageReceived(content::RenderFrameHostImpl*, IPC::Message const&)
content::RenderFrameHostImpl::OnMessageReceived(IPC::Message const&)
```

`content` is the name of the code that glues together web code (like
extensions) and the rest of Chrome.

```
IPC::ChannelProxy::Context::OnDispatchMessage(IPC::Message const&)
```

More `IPC` code.

```
base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*)
base::MessageLoop::RunTask(base::PendingTask*)
base::MessageLoop::DoWork()
base::MessagePumpCFRunLoopBase::RunWork()
base::apple::CallWithEHFrame(void () block_pointer)
base::MessagePumpCFRunLoopBase::RunWorkSource(void*)
```

More `base` code. The bottom of most stack traces should go back to
`MessageLoop`, a primitive Chrome construct used to run tasks.

### Navigating the Stack Trace - Summary

* The top and bottom of the stack are generally the same across stack traces
  and are not very interesting.
* The prefixes of frames can be used to get a rough idea of the components
  involved.
* Function names can be used to get a rough idea of what's going on.

In this case, extension code is calling `ParseIconFromCanvasDictionary` - so
it's probably trying to parse an icon. This calls into Skia code. Given that
Skia is a 2D drawing library, and the function is `tryAllocPixels`, Skia is
allocating some pixels for the icon. This process is being repeated 315 thousand
times, and the icon is being leaked every time.

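The prefix-based segmentation used in this section can be approximated in code.
The following is a minimal sketch rather than part of any Chrome tooling:
`StartsWithIgnoreCase`, `LabelFrame`, `SegmentStack`, and the prefix list are
hypothetical names chosen for illustration. Frames that match no prefix, such
as `<???>` or frames that begin with a return type, are folded into the
preceding segment.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Returns true if `frame` starts with `prefix`, ignoring ASCII case.
bool StartsWithIgnoreCase(const std::string& frame, const std::string& prefix) {
  if (frame.size() < prefix.size())
    return false;
  return std::equal(prefix.begin(), prefix.end(), frame.begin(),
                    [](unsigned char a, unsigned char b) {
                      return std::tolower(a) == std::tolower(b);
                    });
}

// Labels one frame with the first matching prefix, or "" if none matches.
std::string LabelFrame(const std::string& frame,
                       const std::vector<std::string>& prefixes) {
  for (const std::string& prefix : prefixes) {
    if (StartsWithIgnoreCase(frame, prefix))
      return prefix;
  }
  return "";
}

// Collapses consecutive frames with the same label into (label, count) pairs.
// Unlabeled frames are counted as part of the preceding segment.
std::vector<std::pair<std::string, int>> SegmentStack(
    const std::vector<std::string>& frames,
    const std::vector<std::string>& prefixes) {
  std::vector<std::pair<std::string, int>> segments;
  for (const std::string& frame : frames) {
    const std::string label = LabelFrame(frame, prefixes);
    if (!segments.empty() &&
        (label.empty() || label == segments.back().first)) {
      ++segments.back().second;
    } else {
      segments.emplace_back(label.empty() ? std::string("unknown") : label, 1);
    }
  }
  return segments;
}
```

Called with the first ten frames of the trace and the prefixes `profiling`,
`base`, `sk`, `ipc`, and `extension`, this produces five segments. Unlike the
manual breakdown, the sketch keeps `profiling` and `base` as separate segments.
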
## Diving into the code

Now that we have a rough idea of what's happening, let's look at the code for
`ParseIconFromCanvasDictionary`.

```cpp
bool ExtensionAction::ParseIconFromCanvasDictionary(
    const base::DictionaryValue& dict,
    gfx::ImageSkia* icon) {
  for (base::DictionaryValue::Iterator iter(dict); !iter.IsAtEnd();
       iter.Advance()) {
    std::string binary_string64;
    IPC::Message pickle;
    if (iter.value().is_blob()) {
      pickle = IPC::Message(iter.value().GetBlob().data(),
                            iter.value().GetBlob().size());
    } else if (iter.value().GetAsString(&binary_string64)) {
      std::string binary_string;
      if (!base::Base64Decode(binary_string64, &binary_string))
        return false;
      pickle = IPC::Message(binary_string.c_str(), binary_string.length());
    } else {
      continue;
    }
    base::PickleIterator pickle_iter(pickle);
    SkBitmap bitmap;
    if (!IPC::ReadParam(&pickle, &pickle_iter, &bitmap))
      return false;
    CHECK(!bitmap.isNull());

    // Chrome helpfully scales the provided icon(s), but let's not go overboard.
    const int kActionIconMaxSize = 10 * ActionIconSize();
    if (bitmap.drawsNothing() || bitmap.width() > kActionIconMaxSize)
      continue;

    float scale = static_cast<float>(bitmap.width()) / ActionIconSize();
    icon->AddRepresentation(gfx::ImageSkiaRep(bitmap, scale));
  }
  return true;
}
```

There's a lot going on here, but we can use the information we have to focus.
The leaked memory is allocated inside `IPC::ReadParam`, so the relevant lines
are:

```
SkBitmap bitmap;
if (!IPC::ReadParam(&pickle, &pickle_iter, &bitmap))
  return false;
```

The `IPC` message is being decoded into `bitmap`.

```
    icon->AddRepresentation(gfx::ImageSkiaRep(bitmap, scale));
```
Looking at subsequent consumers of `bitmap`, we see that it is being added as a
representation to `icon`. `icon` is an output parameter of this function, so we
have to look at the calling frame,
`ExtensionActionSetIconFunction::RunExtensionAction`.

```
ExtensionFunction::ResponseAction
ExtensionActionSetIconFunction::RunExtensionAction() {
...
  EXTENSION_FUNCTION_VALIDATE(
      ExtensionAction::ParseIconFromCanvasDictionary(*canvas_set, &icon));

  if (icon.isNull())
    return RespondNow(Error("Icon invalid."));

  extension_action_->SetIcon(tab_id_, gfx::Image(icon));
...
}
```

In this case, I've already trimmed the snippet to the code around the call to
`ParseIconFromCanvasDictionary`. Let's look at `SetIcon`.

```
void ExtensionAction::SetIcon(int tab_id, const gfx::Image& image) {
  SetValue(&icon_, tab_id, image);
}
```

```
template<class T>
void SetValue(std::map<int, T>* map, int tab_id, const T& val) {
  (*map)[tab_id] = val;
}
```

The icon is being added to a map `icon_`, with `tab_id` as the key. Ah ha!
Adding elements to a container (and never removing them) is one of the most
common sources of memory issues.

There are two ways for this memory to be released - the container `icon_` can be
destroyed, or the element can be removed from the container.

`icon_` is a member of `ExtensionAction`, whose documentation reads:
```
// ExtensionAction encapsulates the state of a browser action or page action.
// Instances can have both global and per-tab state. If a property does not have
// a per-tab value, the global value is used instead.
```

This suggests that the lifetime of `icon_` is tied to the lifetime of the
`ExtensionAction`, which we can guess is tied to the lifetime of the Extension.
As long as the extension stays installed and enabled, `icon_` will not be
destroyed.

Next, we use codesearch to look at all code that removes elements from `icon_`.
The only place that performs removal is:

```
void ExtensionAction::ClearAllValuesForTab(int tab_id) {
...
  icon_.erase(tab_id);
...
}
```

This is called by `ExtensionActionAPI::ClearAllValuesForTab`, which is called by
`TabHelper::DidFinishNavigation`. The name of this method suggests that each
time a tab is navigated, the previous tab-specific icon is cleared. However,
that means that if a tab is closed, nothing ever erases its entry, and the icon
is leaked for as long as `icon_` lives.
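
The pattern can be reproduced in isolation. The sketch below is a simplified
model, not Chrome code: `PerTabIcons`, `Pixels`, and
`EntriesLeftAfterClosingTabs` are hypothetical names, and the icon is modeled
as a plain byte vector of arbitrary size. It also shows one possible remedy,
clearing the per-tab entry when the tab closes, purely as an illustration and
not necessarily the change that was made for this bug.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-tab icon storage in ExtensionAction.
class PerTabIcons {
 public:
  using Pixels = std::vector<std::uint8_t>;

  // Mirrors SetValue(): inserts or overwrites the entry for `tab_id`.
  void SetIcon(int tab_id, Pixels pixels) {
    icons_[tab_id] = std::move(pixels);
  }

  // Mirrors ClearAllValuesForTab(): the only removal path found above.
  void ClearAllValuesForTab(int tab_id) { icons_.erase(tab_id); }

  std::size_t size() const { return icons_.size(); }

 private:
  std::map<int, Pixels> icons_;
};

// Simulates `tab_count` tabs that each get an icon and are then closed.
// When `clear_on_close` is true, a hypothetical tab-closed hook clears the
// entry; when false, closing a tab does nothing, as in the leak.
std::size_t EntriesLeftAfterClosingTabs(int tab_count, bool clear_on_close) {
  PerTabIcons icons;
  for (int tab_id = 0; tab_id < tab_count; ++tab_id) {
    icons.SetIcon(tab_id, PerTabIcons::Pixels(19 * 19 * 4));  // Arbitrary size.
    if (clear_on_close)
      icons.ClearAllValuesForTab(tab_id);
  }
  return icons.size();
}
```

In this model, one entry remains for every closed tab unless the close hook
runs. Repeated `SetIcon` calls for the same tab overwrite the existing entry
rather than accumulate, so the map grows with the number of distinct tab ids.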