| # Example Investigation of a Heap Dump |
| |
| This document describes the steps taken to investigate a real memory leak |
| discovered by heap profiling in the wild. For investigators less familiar with |
| the code base, `Navigating the Stack Trace` should be enough information to |
| determine the relevant component, and to forward the bug to a component OWNER. |
| |
| ## Understanding the heap dump summary |
| |
| The opening comment of [Issue |
| 834033](https://bugs.chromium.org/p/chromium/issues/detail?id=834033) contains a |
| heap dump summary. The highlights are: |
| |
| * 315723 calls to malloc without corresponding call to free. |
| * 806MB of memory. |
| * The common stacktrace for all 315723 allocations. |
| |
| Usually, anything that uses over 10MB of memory is a red flag. With the |
| exception of large image resources, most code in Chrome should use much less |
| than 10MB. Anything that has over 100k allocations is also a red flag. |
| |
| ### Navigating the Stack Trace - Detailed Breakdown |
| |
| Let's take a look at the stack trace: |
| |
| ``` |
| profiling::(anonymous namespace)::HookAlloc(base::allocator::AllocatorDispatch const*, unsigned long, void*) |
| base::allocator::MallocZoneFunctionsToReplaceDefault()::$_1::__invoke(_malloc_zone_t*, unsigned long) |
| <???> |
| <???> |
| base::allocator::UncheckedMallocMac(unsigned long, void**) |
| sk_malloc_flags(unsigned long, unsigned int) |
| SkMallocPixelRef::MakeAllocate(SkImageInfo const&, unsigned long) |
| SkBitmap::tryAllocPixels(SkImageInfo const&, unsigned long) |
| IPC::ParamTraits<SkBitmap>::Read(base::Pickle const*, base::PickleIterator*, SkBitmap*) |
| ExtensionAction::ParseIconFromCanvasDictionary(base::DictionaryValue const&, gfx::ImageSkia*) |
| extensions::ExtensionActionSetIconFunction::RunExtensionAction() |
| extensions::ExtensionActionFunction::Run() |
| ExtensionFunction::RunWithValidation() |
| extensions::ExtensionFunctionDispatcher::DispatchWithCallbackInternal(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int, base::RepeatingCallback<void (ExtensionFunction::ResponseType, base::ListValue const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, extensions::functions::HistogramValue)> const&) |
| extensions::ExtensionFunctionDispatcher::Dispatch(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int) |
| bool IPC::MessageT<ExtensionHostMsg_Request_Meta, std::__1::tuple<ExtensionHostMsg_Request_Params>, void>::Dispatch<extensions::ExtensionWebContentsObserver, extensions::ExtensionWebContentsObserver, content::RenderFrameHost, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&)>(IPC::Message const*, extensions::ExtensionWebContentsObserver*, extensions::ExtensionWebContentsObserver*, content::RenderFrameHost*, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&)) |
| extensions::ExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*) |
| extensions::ChromeExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*) |
| content::WebContentsImpl::OnMessageReceived(content::RenderFrameHostImpl*, IPC::Message const&) |
| content::RenderFrameHostImpl::OnMessageReceived(IPC::Message const&) |
| IPC::ChannelProxy::Context::OnDispatchMessage(IPC::Message const&) |
| base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) |
| base::MessageLoop::RunTask(base::PendingTask*) |
| base::MessageLoop::DoWork() |
| base::MessagePumpCFRunLoopBase::RunWork() |
| base::apple::CallWithEHFrame(void () block_pointer) |
| base::MessagePumpCFRunLoopBase::RunWorkSource(void*) |
| <???> |
| <???> |
| <???> |
| <???> |
| <???> |
| <???> |
| <???> |
| <???> |
| <???> |
| __71-[BrowserCrApplication nextEventMatchingMask:untilDate:inMode:dequeue:]_block_invoke |
| base::apple::CallWithEHFrame(void () block_pointer) |
| -[BrowserCrApplication nextEventMatchingMask:untilDate:inMode:dequeue:] |
| <???> |
| base::MessagePumpNSApplication::DoRun(base::MessagePump::Delegate*) |
| base::MessagePumpCFRunLoopBase::Run(base::MessagePump::Delegate*) |
| <name omitted> |
| ChromeBrowserMainParts::MainMessageLoopRun(int*) |
| content::BrowserMainLoop::RunMainMessageLoopParts() |
| content::BrowserMainRunnerImpl::Run() |
| content::BrowserMain(content::MainFunctionParams) |
| content::ContentMainRunnerImpl::Run() |
| service_manager::Main(service_manager::MainParams const&) |
| content::ContentMain(content::ContentMainParams const&) |
| ChromeMain |
| main |
| <???> |
| ``` |
| |
| The first step is to divide the stack trace into smaller segments to get a |
| better understanding of what's happening at the time of allocations. The best |
| way to do this is to segment by name space and/or function prefixes. |
| |
| ``` |
| profiling::(anonymous namespace)::HookAlloc(base::allocator::AllocatorDispatch const*, unsigned long, void*) |
| base::allocator::MallocZoneFunctionsToReplaceDefault()::$_1::__invoke(_malloc_zone_t*, unsigned long) |
| <???> |
| <???> |
| base::allocator::UncheckedMallocMac(unsigned long, void**) |
| ``` |
| |
| The top of each stack will always contain some `base` and/or `profiling` |
| code. This is the code responsible for allocating and recording the memory. |
| |
| ``` |
| sk_malloc_flags(unsigned long, unsigned int) |
| SkMallocPixelRef::MakeAllocate(SkImageInfo const&, unsigned long) |
| SkBitmap::tryAllocPixels(SkImageInfo const&, unsigned long) |
| ``` |
| |
| Next, we three 3 frames with the prefix `sk`. Searching for |
| `sk_malloc_flags` on |
| [codesearch](https://cs.chromium.org/search/?q=sk_malloc_flags&sq=package:chromium&type=cs) |
| reveals that the component is `third_party/skia`. Looking at the |
| [README](https://cs.chromium.org/chromium/src/third_party/skia/README) reveals |
| that Skia is a 2D graphics library. |
| |
| ``` |
| IPC::ParamTraits<SkBitmap>::Read(base::Pickle const*, base::PickleIterator*, SkBitmap*) |
| ``` |
| |
| Next we see a templated function called `Read` in the namespace `IPC`. |
| `IPC` stands for inter-process communication. This suggests that the |
| function is responsible for reading an IPC Message, perhaps concerning an |
| `SkBitmap`. |
| |
| ``` |
| ExtensionAction::ParseIconFromCanvasDictionary(base::DictionaryValue const&, gfx::ImageSkia*) |
| extensions::ExtensionActionSetIconFunction::RunExtensionAction() |
| extensions::ExtensionActionFunction::Run() |
| ExtensionFunction::RunWithValidation() |
| extensions::ExtensionFunctionDispatcher::DispatchWithCallbackInternal(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int, base::RepeatingCallback<void (ExtensionFunction::ResponseType, base::ListValue const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, extensions::functions::HistogramValue)> const&) |
| extensions::ExtensionFunctionDispatcher::Dispatch(ExtensionHostMsg_Request_Params const&, content::RenderFrameHost*, int) |
| bool IPC::MessageT<ExtensionHostMsg_Request_Meta, std::__1::tuple<ExtensionHostMsg_Request_Params>, void>::Dispatch<extensions::ExtensionWebContentsObserver, extensions::ExtensionWebContentsObserver, content::RenderFrameHost, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&)>(IPC::Message const*, extensions::ExtensionWebContentsObserver*, extensions::ExtensionWebContentsObserver*, content::RenderFrameHost*, void (extensions::ExtensionWebContentsObserver::*)(content::RenderFrameHost*, ExtensionHostMsg_Request_Params const&)) |
| extensions::ExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*) |
| extensions::ChromeExtensionWebContentsObserver::OnMessageReceived(IPC::Message const&, content::RenderFrameHost*) |
| ``` |
| |
| Next, we see many frames with the `extension` prefix. Extensions are exactly |
| what they sound like - Chrome extensions like AdBlock are used to modify the |
| behavior of the browser. |
| |
| ``` |
| content::WebContentsImpl::OnMessageReceived(content::RenderFrameHostImpl*, IPC::Message const&) |
| content::RenderFrameHostImpl::OnMessageReceived(IPC::Message const&) |
| ``` |
| |
| `content` is the name of code that glues together web code [like extensions] and |
| the rest of Chrome. |
| |
| ``` |
| IPC::ChannelProxy::Context::OnDispatchMessage(IPC::Message const&) |
| ``` |
| |
| More `IPC` code. |
| |
| ``` |
| base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) |
| base::MessageLoop::RunTask(base::PendingTask*) |
| base::MessageLoop::DoWork() |
| base::MessagePumpCFRunLoopBase::RunWork() |
| base::apple::CallWithEHFrame(void () block_pointer) |
| base::MessagePumpCFRunLoopBase::RunWorkSource(void*) |
| ``` |
| |
| More `base` code. The bottom of most stack traces should go back to |
| `MessageLoop`, a primitive Chrome construct used to run tasks. |
| |
| ### Navigating the Stack Trace - Summary |
| |
| * The top and bottom of the stack should generally be the same and are not very |
| interesting. |
| * The prefixes of frames can be used to get a rough idea of the components |
| involved. |
| * Function names can be used to get a rough idea of what's going on. |
| |
| In this case, extension code is calling `ParseIconFromCanvasDictionary` - so |
| it's probably trying to parse an icon. This calls into Skia code. Given that |
| Skia is a 2D drawing library, and the function is `tryAllocPixels`, Skia is |
| allocating some pixels for the icon. This process is being repeated 315 thousand |
| times, and the icon is being leaked every time. |
| |
| |
| ## Diving into the code |
| |
| Now that we have a rough idea of what's happening, let's look at the code for |
| ParseIconFromCanvasDictionary. |
| |
| ```cpp |
| bool ExtensionAction::ParseIconFromCanvasDictionary( |
| const base::DictionaryValue& dict, |
| gfx::ImageSkia* icon) { |
| for (base::DictionaryValue::Iterator iter(dict); !iter.IsAtEnd(); |
| iter.Advance()) { |
| std::string binary_string64; |
| IPC::Message pickle; |
| if (iter.value().is_blob()) { |
| pickle = IPC::Message(iter.value().GetBlob().data(), |
| iter.value().GetBlob().size()); |
| } else if (iter.value().GetAsString(&binary_string64)) { |
| std::string binary_string; |
| if (!base::Base64Decode(binary_string64, &binary_string)) |
| return false; |
| pickle = IPC::Message(binary_string.c_str(), binary_string.length()); |
| } else { |
| continue; |
| } |
| base::PickleIterator pickle_iter(pickle); |
| SkBitmap bitmap; |
| if (!IPC::ReadParam(&pickle, &pickle_iter, &bitmap)) |
| return false; |
| CHECK(!bitmap.isNull()); |
| |
| // Chrome helpfully scales the provided icon(s), but let's not go overboard. |
| const int kActionIconMaxSize = 10 * ActionIconSize(); |
| if (bitmap.drawsNothing() || bitmap.width() > kActionIconMaxSize) |
| continue; |
| |
| float scale = static_cast<float>(bitmap.width()) / ActionIconSize(); |
| icon->AddRepresentation(gfx::ImageSkiaRep(bitmap, scale)); |
| } |
| return true; |
| } |
| ``` |
| |
| There's a lot going on here, but we can use the information we have to focus. |
| The leak happens in IPC::ReadParam, so the relevant lines are: |
| |
| ``` |
| SkBitmap bitmap; |
| if (!IPC::ReadParam(&pickle, &pickle_iter, &bitmap)) |
| return false; |
| ``` |
| |
| The `IPC` message is being decoded into `bitmap`. |
| |
| ``` |
| icon->AddRepresentation(gfx::ImageSkiaRep(bitmap, scale)); |
| ``` |
| Looking at subsequent consumers of `bitmap`, we see that it is being added as a |
| representation to `icon`. `icon` is an output parameter of this function, so we |
| have to look at the calling frame, |
| `ExtensionActionSetIconFunction::RunExtensionAction`. |
| |
| ``` |
| ExtensionFunction::ResponseAction |
| ExtensionActionSetIconFunction::RunExtensionAction() { |
| ... |
| EXTENSION_FUNCTION_VALIDATE( |
| ExtensionAction::ParseIconFromCanvasDictionary(*canvas_set, &icon)); |
| |
| if (icon.isNull()) |
| return RespondNow(Error("Icon invalid.")); |
| |
| extension_action_->SetIcon(tab_id_, gfx::Image(icon)); |
| ... |
| } |
| ``` |
| |
| In this case, I've already focused on the code that calls |
| `ParseIconFromCanvasDictionary`. Let's look at `SetIcon`. |
| |
| ``` |
| void ExtensionAction::SetIcon(int tab_id, const gfx::Image& image) { |
| SetValue(&icon_, tab_id, image); |
| } |
| ``` |
| |
| ``` |
| template<class T> |
| void SetValue(std::map<int, T>* map, int tab_id, const T& val) { |
| (*map)[tab_id] = val; |
| } |
| ``` |
| |
| The icon is being added to a map `icon_`, with `tab_id` as the key. Ah ha! |
| Adding elements to a container [and never removing them] is one of the most |
| common sources of memory issues. |
| |
| There are two ways for this memory to be released - the container `icon_` can be |
| destroyed, or the element can be removed from the container. |
| |
| `icon_` is a member of `ExtensionAction`, whose documentation reads: |
| ``` |
| // ExtensionAction encapsulates the state of a browser action or page action. |
| // Instances can have both global and per-tab state. If a property does not have |
| // a per-tab value, the global value is used instead. |
| ``` |
| |
| This suggests that the lifetime of `icon_` is tied to the lifetime of the |
| ExtensionAction, which we can guess is tied to the lifetime of the Extension. As |
| long as the extension stays installed and enabled, `icon_` will not be |
| destroyed. |
| |
| Next, we use codesearch to look at all code that removes elements from `icon_`. |
| The only place that performs removal is |
| |
| ``` |
| void ExtensionAction::ClearAllValuesForTab(int tab_id) { |
| ... |
| icon_.erase(tab_id); |
| ... |
| } |
| ``` |
| |
| This is called by `ExtensionActionAPI::ClearAllValuesForTab`, which is called by |
| `TabHelper::DidFinishNavigation`. The name of this method suggests that each |
| time a tab is navigated, the previous tab-specific icon is cleared. However, |
| that means that if a tab is closed, then the icon is leaked forever. |