[SPARC] - Regression - Niagara does not reboot properly

Bug #62485 reported by Fabio Massimo Di Nitto
Affects: linux-source-2.6.17 (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Fabio Massimo Di Nitto
Milestone: (none listed)

Bug Description

After fixing the CPU lockups, which I thought were the reason for this issue, it appears that we are still trapping into the hypervisor somewhere at reboot, and the machine powers off.

ERROR: HV Abort: JBI Error (22) - PowerDown

Fabio

Changed in linux-source-2.6.17:
importance: Undecided → High
status: Unconfirmed → Confirmed
Tom Marble (tmarble) wrote :

I have just booted to Edgy running 2.6.17-10-sparc64-smp
and I do not have this problem. The system boots (and reboots)
without hanging or powering off.

It is quite likely due to down-rev'd firmware.
Please note the large number of HV related fixes
in the latest version, which can be found here:
http://sunsolve.sun.com/search/document.do?assetkey=1-21-123482-02

After installing the new firmware, the versions are as follows:
sc> showhost
System Firmware 6.2.4 Sun Fire[TM] T2000 2006/08/18 12:35

Host flash versions:
   Hypervisor 1.2.3 2006/08/18 12:25
   OBP 4.23.4 2006/08/04 20:46
   Sun Fire[TM] T2000 POST 4.23.4 2006/08/04 21:15
sc> showsc version -v
Advanced Lights Out Manager CMT v1.2.4
SC Firmware version: CMT 1.2.4
SC Bootmon version: CMT 1.2.4

VBSC 1.2.4
VBSC firmware built Aug 18 2006, 12:27:38

SC Bootmon Build Release: 00
SC bootmon checksum: EF3B7ADD
SC Bootmon built Aug 18 2006, 12:34:48

SC Build Release: 00
SC firmware checksum: 2B377B21

SC firmware built Aug 18 2006, 12:35:01
SC firmware flashupdate THU SEP 21 15:19:26 2006

SC System Memory Size: 32 MB
SC NVRAM Version = 10
SC hardware type: 4

FPGA Version: 4.2.4.7
sc>

Regards,

--Tom
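
For anyone comparing their own machine against the versions quoted above, here is a minimal sketch that pulls the component versions out of `showhost` output. The helper names (`parse_showhost`, `older_than_baseline`) are hypothetical, and treating the versions listed in this comment as a known-good baseline is an assumption: the thread only says that a system at these versions reboots correctly.

```python
import re

# Text copied from the `showhost` output quoted in this comment.
SHOWHOST_OUTPUT = """\
System Firmware 6.2.4 Sun Fire[TM] T2000 2006/08/18 12:35

Host flash versions:
   Hypervisor 1.2.3 2006/08/18 12:25
   OBP 4.23.4 2006/08/04 20:46
   Sun Fire[TM] T2000 POST 4.23.4 2006/08/04 21:15
"""

# A component name followed by the first dotted version number on the line.
_VERSION_LINE = re.compile(r"^\s*(?P<name>.+?)\s+(?P<version>\d+(?:\.\d+)+)\b")


def parse_showhost(text):
    """Map each component name to its version as a tuple of ints."""
    versions = {}
    for line in text.splitlines():
        match = _VERSION_LINE.match(line)
        if match:
            parts = match.group("version").split(".")
            versions[match.group("name")] = tuple(int(p) for p in parts)
    return versions


def older_than_baseline(versions, baseline):
    """Return the names of components missing or below the baseline version."""
    return sorted(
        name for name, wanted in baseline.items()
        if versions.get(name, ()) < wanted
    )


if __name__ == "__main__":
    # Assumed baseline: the Hypervisor and OBP versions quoted in this comment.
    baseline = {"Hypervisor": (1, 2, 3), "OBP": (4, 23, 4)}
    print(older_than_baseline(parse_showhost(SHOWHOST_OUTPUT), baseline))
```

Versions are compared as integer tuples rather than strings so that, for example, 4.23.4 sorts above 4.9.0.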

Fabio Massimo Di Nitto (fabbione) wrote :

Confirmed. The new OBP fixes this issue, so it's an OBP bug triggered by the new kernel. This is worth a release note.

Thanks Tom!

Fabio

Changed in linux-source-2.6.17:
assignee: nobody → tfheen
Fabio Massimo Di Nitto (fabbione) wrote :

Something like this in the ReleaseNotes should do:

Sun T2000 users should upgrade their ALOM/OBP to at least 123482-02 before upgrading to edgy. Upgrade procedures are provided by Sun at the following URL:
http://sunsolve.sun.com/search/document.do?assetkey=1-21-123482-02
(NOTE: Ubuntu/Canonical takes no responsibility for the correctness of this documentation/upgrade procedure. Make sure to read *ALL* the information and details.)

Rationale: the new edgy kernel can trigger a hypervisor bug in IRQ handling that results in a machine halt/poweroff (ERROR: HV Abort: JBI Error (22) - PowerDown) during reboots.
The machine must then be powered on again to resume normal operation (including a complete POST), which might result in long service interruptions and unscheduled downtime.
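
The "at least 123482-02" requirement above can be checked mechanically. The following is a rough sketch, assuming the usual Sun patch ID layout of a base number, a dash, and a numeric revision, where a higher revision of the same base supersedes a lower one. The function names are hypothetical, and how the installed patch ID is obtained is left out.

```python
def parse_patch_id(patch_id):
    """Split a patch ID such as '123482-02' into (base, revision)."""
    base, _, rev = patch_id.strip().partition("-")
    if not (base.isdigit() and rev.isdigit()):
        raise ValueError(f"not a patch ID: {patch_id!r}")
    return base, int(rev)


def meets_minimum(installed, minimum="123482-02"):
    """True if `installed` is the same patch at the minimum revision or later."""
    base, rev = parse_patch_id(installed)
    min_base, min_rev = parse_patch_id(minimum)
    if base != min_base:
        # Revisions of different patches are not comparable.
        raise ValueError(f"{installed!r} is not a revision of patch {min_base}")
    return rev >= min_rev


if __name__ == "__main__":
    for candidate in ("123482-01", "123482-02", "123482-03"):
        print(candidate, meets_minimum(candidate))
```

The revision is compared as an integer, so a later revision such as -10 is not mistaken for an older one by string ordering.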

DaveM (davem) wrote :

Actually, this is a problem in the e1000 driver.

It puts the chip into a power-management state and then accesses
its registers in a way that it shouldn't, which results in a timeout.

I've known about this bug since the 2.6.17 kernel was released
but never had time to look into it and fix properly.

The hypervisor shouldn't abort like that, but the true cause is the
e1000 driver.

So we don't need an "upgrade your OBP" in the release for this
issue since we can fix the failure anyways in the e1000 code.

Tollef Fog Heen (tfheen)
Changed in linux-source-2.6.17:
assignee: tfheen → nobody
Matt Zimmerman (mdz) wrote :

We are now in kernel freeze and don't have a kernel fix anyway, so this needs to be documented.

Changed in linux-source-2.6.17:
assignee: nobody → fabbione
Fabio Massimo Di Nitto (fabbione) wrote :

Like the other bug :) I agreed with Jeff about this too.

The problem is that libc6-sparc64* don't have what they are supposed to. They are off by one level (/usr/lib* -> /lib*) and are missing the real optimized libs.

Fabio

Changed in linux-source-2.6.17:
assignee: fabbione → jbailey
Fabio Massimo Di Nitto (fabbione) wrote :

Oops, wrong comment on the wrong bug.

Changed in linux-source-2.6.17:
assignee: jbailey → fabbione
Fabio Massimo Di Nitto (fabbione) wrote :

Matt, this was assigned to Tollef to add the info to the release notes.

Fabio

Tollef Fog Heen (tfheen) wrote :

This is documented in the release notes now, so I am removing the milestone. (This is a workaround for Malone not supporting two bug tasks on the same bug in the same distro where one has a package assigned and one doesn't.)

Fabio Massimo Di Nitto (fabbione) wrote :

I have spoken with David Miller again. The driver is not buggy: it does indeed do the right thing, and the only solution is an ALOM upgrade.

Fabio

Changed in linux-source-2.6.17:
status: Confirmed → Fix Released