Crash in [@ __if_indextoname] via NrIceCtx::StartGathering
Categories
(Core :: Security: Process Sandboxing, defect, P3)
Tracking
()
People
(Reporter: mccr8, Assigned: jld)
References
(Regression)
Details
(Keywords: crash, regression)
Crash Data
Attachments
(3 files)
- 48 bytes, text/x-phabricator-request | Details | Review
- 48 bytes, text/x-phabricator-request | phab-bot: approval-mozilla-beta+ | Details | Review
- 48 bytes, text/x-phabricator-request | phab-bot: approval-mozilla-esr140+ | Details | Review
Crash report: https://crash-stats.mozilla.org/report/index/e16136bf-c417-46f0-bc84-ea6b30250703
Reason:
SIGSYS / SYS_SECCOMP
Top 10 frames:
0 libc.so.6 __GI___ioctl /usr/src/debug/glibc/glibc/sysdeps/unix/sysv/linux/ioctl.c:36
1 libc.so.6 __if_indextoname /usr/src/debug/glibc/glibc/sysdeps/unix/sysv/linux/if_index.c:231
2 libxul.so set_ifname dom/media/webrtc/transport/third_party/nICEr/src/stun/addrs-netlink.c:75
2 libxul.so stun_convert_netlink dom/media/webrtc/transport/third_party/nICEr/src/stun/addrs-netlink.c:133
2 libxul.so stun_getaddrs_filtered dom/media/webrtc/transport/third_party/nICEr/src/stun/addrs-netlink.c:251
3 libxul.so nr_stun_get_addrs dom/media/webrtc/transport/third_party/nICEr/src/stun/addrs.c:208
4 libxul.so nr_stun_find_local_addresses dom/media/webrtc/transport/third_party/nICEr/src/stun/stun_util.c:164
4 libxul.so nr_ice_gather dom/media/webrtc/transport/third_party/nICEr/src/ice/ice_ctx.c:871
5 libxul.so mozilla::NrIceCtx::StartGathering(bool, bool) dom/media/webrtc/transport/nricectx.cpp:934
6 libxul.so mozilla::MediaTransportHandlerSTS::StartIceGathering(bool, bool, nsTArray<moz... dom/media/webrtc/jsapi/MediaTransportHandler.cpp:920
This signature looks useless, but it isn't that uncommon, and the reports all share this same stack. It is a null deref.
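The top frames show glibc's __if_indextoname implementing the index-to-name lookup with ioctl(SIOCGIFNAME), which is the call the seccomp policy rejects. A minimal sketch of that syscall pattern, as a hypothetical standalone program rather than the actual addrs-netlink.c code:

/* Hypothetical reproduction of the syscall pattern in the stack above, not
 * the actual nICEr code: if_indextoname() is implemented by glibc with
 * ioctl(SIOCGIFNAME), which is what the sandbox blocks. */
#include <net/if.h>
#include <stdio.h>

int main(void) {
  char name[IF_NAMESIZE];
  /* Index 1 is usually loopback; under the socket-process sandbox this
   * lookup is what raises SIGSYS on Nightly. */
  if (if_indextoname(1, name) == NULL) {
    perror("if_indextoname");
    return 1;
  }
  printf("interface 1 is %s\n", name);
  return 0;
}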
Reporter
Comment 1 • 4 months ago
It looks like this crash is Nightly only. It first showed up in 140a, in the 20250523091654 build. The crashes are all in the socket process.
Comment 2 • 3 months ago
The only thing I can see that might be involved (and that landed near the 2025-05-23 target) is bug 1954423. Byron, any thoughts here?
Comment 3 • 3 months ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 desktop browser crashes on nightly
For more information, please visit BugBot documentation.
Comment 4 • 3 months ago
So this is Nightly-only across multiple versions of Nightly (140, 141, and 142); it seems to have never occurred on 140 beta/release or 141 beta. Bug 1954423 isn't restricted to Nightly in any way, so I don't see how it could be the cause, and the timing is off by a few days (bad builds start on May 23, with plenty of crashes, but bug 1954423 landed a few days before that).
A large majority of these are happening multiple times on the same install. Arch Linux is heavily overrepresented (there is a large number of Fedora 42 crashes up until mid-June, and then it becomes almost entirely Arch). Maybe that's because most other distros don't end up using Nightly? Or maybe Fedora 42 and Arch users are the main Firefox users on Linux? Not sure about that.
Kernel build dates look pretty recent for each crash report; all of these users had updated their kernel within a month or two. Maybe a particular kernel patch interacts poorly with Firefox, or maybe a kernel change altered the signature of this crash.
Comment 5 • 3 months ago
The severity field is not set for this bug.
:bwc, could you have a look please?
For more information, please visit BugBot documentation.
Comment 6 • 3 months ago
This looks like it is probably a Linux bug, but I'm not totally sure. I'll assign a priority for now, but leave the needinfo.
Comment 7 • 3 months ago
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit BugBot documentation.
Comment 9 • 2 months ago
Copying crash signatures from duplicate bugs.
Assignee
Comment 10 • 2 months ago
(In reply to Andrew McCreight (out of office until 8/21) [:mccr8] from comment #0)
Reason:
SIGSYS / SYS_SECCOMP
This is a sandbox violation. On Nightly we crash (and paste the syscall number into the address field so it can be aggregated on; it's not really a null pointer) because there's currently no other way to get enough information; on other branches the syscall fails with ENOSYS. That can be changed in either direction with the MOZ_SANDBOX_CRASH_ON_ERROR env var.
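For illustration, a rough sketch of the two behaviors described above using the raw seccomp-bpf interface; this is the general mechanism only, not Mozilla's actual SandboxFilter policy:

/* Rough sketch, not Mozilla's SandboxFilter code: SECCOMP_RET_TRAP delivers
 * SIGSYS (the Nightly crash path), while SECCOMP_RET_ERRNO makes the blocked
 * syscall fail with a chosen errno such as ENOSYS (the release path). */
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

static struct sock_filter example_filter[] = {
  /* Load the syscall number. */
  BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
  /* If it's ioctl, fall through to the deny action; otherwise allow. */
  BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_ioctl, 0, 1),
  /* Deny action: TRAP crashes with SIGSYS; swap in the ERRNO line below to
   * get the fail-with-ENOSYS behavior instead. */
  BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
  /* BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | ENOSYS), */
  BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

static struct sock_fprog example_prog = {
  .len = sizeof(example_filter) / sizeof(example_filter[0]),
  .filter = example_filter,
};

int install_example_filter(void) {
  /* Required so an unprivileged process may install a filter. */
  if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0) return -1;
  return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &example_prog);
}

Unlike this sketch, the real policy default-denies and allows specific syscalls; the point here is only the TRAP-versus-ERRNO distinction that MOZ_SANDBOX_CRASH_ON_ERROR toggles.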
These particular crashes, which are all in the socket process, seem to be SIOCGIFNAME; I can also see code to call SIOCGIFFLAGS (to identify point-to-point interfaces, assumed to be VPNs) and SIOCETHTOOL and SIOCGIWRATE (to estimate the speed of wired and wireless interfaces, respectively). It would be possible to block those silently (not crash or log an error), but I assume we'd want to allow them. Given that this process already has direct network access, I don't think that's a significant concern for security.
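For context, a hedged sketch of how the SIOCGIFFLAGS probe mentioned above is typically done; this is an illustration, not the actual WebRTC/nICEr code:

/* Hedged illustration, not the actual WebRTC/nICEr code: SIOCGIFFLAGS
 * fetches interface flags so the caller can test IFF_POINTOPOINT, the
 * point-to-point/VPN heuristic described in the comment. */
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the named interface is point-to-point, 0 if not, -1 on error. */
int is_point_to_point(const char *ifname) {
  int fd = socket(AF_INET, SOCK_DGRAM, 0);
  if (fd < 0) return -1;

  struct ifreq ifr;
  memset(&ifr, 0, sizeof(ifr));
  strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

  int rv = -1;
  if (ioctl(fd, SIOCGIFFLAGS, &ifr) == 0)
    rv = (ifr.ifr_flags & IFF_POINTOPOINT) ? 1 : 0;

  close(fd);
  return rv;
}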
What I don't understand is why this only recently started happening, and why it's relatively low-volume. The code in WebRTC that calls these ioctls isn't new, and hasn't WebRTC been using the socket process for a long time?
Assignee
Comment 11 • 2 months ago
I have a patch to allow the ioctls in question (for the socket process only).
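The interesting part of such a policy is that it has to match on the ioctl request argument, not just the syscall number. A rough, illustrative sketch of that kind of argument-filtered rule in raw seccomp-bpf follows; the actual patch expresses this with the bpf_dsl policy DSL in security/sandbox/linux/SandboxFilter.cpp, and the request values below are the ones named in comment 10:

/* Illustrative only, not the actual SandboxFilter.cpp change: allow ioctl()
 * for a small set of request values and trap everything else. Assumes a
 * little-endian 64-bit target, so the BPF_W load reads the low 32 bits of
 * the second syscall argument (the ioctl request). */
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <linux/sockios.h>   /* SIOCGIFNAME, SIOCGIFFLAGS, SIOCETHTOOL */
#include <linux/wireless.h>  /* SIOCGIWRATE */
#include <stddef.h>
#include <sys/syscall.h>

static struct sock_filter ioctl_allowlist[] = {
  /* Dispatch on the syscall number; only ioctl is examined further here. */
  BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
  BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_ioctl, 1, 0),
  BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),          /* not ioctl */
  /* Load the ioctl request number (second argument). */
  BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[1])),
  BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SIOCGIFNAME,  4, 0),
  BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SIOCGIFFLAGS, 3, 0),
  BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SIOCETHTOOL,  2, 0),
  BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SIOCGIWRATE,  1, 0),
  BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),           /* other requests */
  BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),          /* allowed request */
};

In the real filter the fallback action for an unknown request is the per-channel behavior from comment 10 (crash on Nightly, ENOSYS elsewhere), and the allow rules are scoped to the socket process only.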
Assignee
Comment 12 • 28 days ago
Comment 13 • 26 days ago
Comment 14 • 25 days ago
bugherder
Comment 15 • 22 days ago
While the crashes are Nightly-only, is this worth backporting anywhere to avoid the sandbox violations? Not sure what the impact is for those users.
Comment 16 • 19 days ago
firefox-beta Uplift Approval Request
- User impact if declined: Probable WebRTC flakiness for some Linux users under some circumstances. This isn't a crash on non-Nightly, but the sandbox denial could affect functionality.
- Code covered by automated testing: no
- Fix verified in Nightly: yes
- Needs manual QE test: no
- Steps to reproduce for manual QE testing: Testing note: we don't have STR (see discussion in bug 1990721 for details), but Nightly crash reports stopped after the patch landed
- Risk associated with taking this patch: low
- Explanation of risk level: The patch just adds a few things to an allow list in the sandbox policy, so I'm not really concerned about functional regression, and the operations allowed are things that it's reasonable for the socket process to do.
- String changes made/needed: none
- Is Android affected?: no
Assignee
Comment 17 • 19 days ago
Original Revision: https://phabricator.services.mozilla.com/D265735
Comment 18 • 19 days ago
firefox-esr140 Uplift Approval Request
- User impact if declined: Probable WebRTC flakiness for some Linux users under some circumstances. This isn't a crash on non-Nightly, but the sandbox denial could affect functionality.
- Code covered by automated testing: no
- Fix verified in Nightly: yes
- Needs manual QE test: no
- Steps to reproduce for manual QE testing: Testing note: we don't have STR (see discussion in bug 1990721 for details), but Nightly crash reports stopped after the patch landed
- Risk associated with taking this patch: low
- Explanation of risk level: The patch just adds a few things to an allow list in the sandbox policy, so I'm not really concerned about functional regression, and the operations allowed are things that it's reasonable for the socket process to do.
- String changes made/needed: none
- Is Android affected?: no
Assignee
Comment 19 • 19 days ago
Original Revision: https://phabricator.services.mozilla.com/D265735
Comment 20 • 19 days ago
uplift
Comment 21 • 18 days ago
uplift
Comment 22 • 18 days ago
uplift
Comment 23 • 18 days ago
Backed out for causing build bustages
- Backout link
- Push with failures
- Failure Log
- Failure line: /builds/worker/checkouts/gecko/security/sandbox/linux/SandboxFilter.cpp:X:14: error: 'class sandbox::bpf_dsl::Caser<long unsigned int>' has no member named 'Cases'; did you mean 'Case'?
Assignee
Comment 24 • 18 days ago
Sorry about that; the patch has a dependency on bug 1937025. The fix should just be replacing ….Cases({…},…) with ….CASES((…),…) to work with the old bpf_dsl API, so that shouldn't add any risk to the uplift. I'm testing that now.
Comment 26 • 18 days ago
uplift