* microcode sw/hw error

Posted on February 18th, 2009 by Alex. Filed under Linux.


When I transfer a large amount of data over my notebook's WiFi connection, the interface very often goes down because the driver detects a hardware or software error in the adapter's microcode.

Feb 18 12:56:30 jitu kernel: [56587.903516] iwlagn: Microcode SW error detected. Restarting 0x2000000.
Feb 18 12:56:30 jitu kernel: [56589.921886] phy0: failed to restore operational channel after scan
Feb 18 12:56:30 jitu kernel: [56589.921909] iwlagn: No space for Tx
Feb 18 12:56:30 jitu kernel: [56589.921915] iwlagn: Error sending REPLY_TX_PWR_TABLE_CMD: enqueue_hcmd failed: -28
Feb 18 12:56:30 jitu kernel: [56590.128931] Registered led device: iwl-phy0:radio
Feb 18 12:56:30 jitu kernel: [56590.128981] Registered led device: iwl-phy0:assoc
Feb 18 12:56:30 jitu kernel: [56590.129087] Registered led device: iwl-phy0:RX
Feb 18 12:56:30 jitu kernel: [56590.129129] Registered led device: iwl-phy0:TX

This error never occurs during normal browsing, checking email or working over an SSH connection. But during a burst file transfer from another machine on the local network, the interface goes down after about 200MB and can only be recovered by restarting the network. The error has survived multiple versions of the kernel, mac80211, the firmware and iwlagn, and still has not gone away. Just today I found a possible cause.

There are several hardware revisions of the Intel Corporation PRO/Wireless 4965 AGN adapter, and the newer ones do not have this problem. The problem might be caused by overheating: the hardware fails once its temperature rises above roughly 60°C. For the same adapter built into their ThinkPads, IBM suggests switching on the Windows power management. In Linux, the adapter's power level is exposed through sysfs; the current setting can be read with

cat /sys/bus/pci/drivers/iwlagn/0000\:0c\:00.0/power_level

Replace 0000\:0c\:00.0 with the PCI address of your adapter. My output looks like this:

SYSTEM:auto MODE:fixed INDEX:0
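If you are unsure of your adapter's address, the driver's sysfs directory contains one entry per bound device, named by its PCI address. A minimal sketch, assuming device entries are the only names in that directory containing a colon; find_power_level is a hypothetical helper, not an existing tool:

```shell
#!/bin/sh
# find_power_level: hypothetical helper that prints the power_level file of
# every device bound to the iwlagn driver. Device entries look like
# 0000:0c:00.0 (they contain a colon); the driver's control files such as
# bind, unbind and new_id do not, so they are skipped.
find_power_level() {
    driver_dir="${1:-/sys/bus/pci/drivers/iwlagn}"
    for dev in "$driver_dir"/*:*; do
        if [ -e "$dev/power_level" ]; then
            echo "$dev/power_level"
        fi
    done
}

find_power_level
```

With the adapter from this post it would print /sys/bus/pci/drivers/iwlagn/0000:0c:00.0/power_level; if the driver is not loaded, it prints nothing.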

INDEX is the power state of the adapter, with 0 meaning the highest power consumption. I am currently trying power state 5, which I set (as root) by executing

echo 5 > /sys/bus/pci/drivers/iwlagn/0000\:0c\:00.0/power_level

The lower power consumption brings the desired lower temperature, at the cost of higher latency. For me, a stable connection matters more than a few milliseconds. Unfortunately, INDEX is reset to 0 when the network is restarted, when the kill switch is activated, and even when resuming from suspend. Therefore I wrote a small script (resetPowerWifi.sh) that runs in the background, checks INDEX every 5 seconds and resets it. To start it as a background job, copy it to a directory of your choice (e.g. /etc/init.d/) and put the lines

echo "Start background job to pull down the power level of the wlan interface to avoid overheating..."
/etc/init.d/resetPowerWifi.sh &

into /etc/rc.local to start it automatically at boot (above the final exit 0 line, if your rc.local has one).
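The post does not include resetPowerWifi.sh itself. The following is a hypothetical sketch of such a watcher, not the original script. It assumes power_level keeps reporting INDEX:<n> in the format shown above and that writing the number back restores the setting, as the echo command does; POWER_LEVEL_FILE, TARGET_INDEX and INTERVAL are illustrative variable names that can be overridden from the environment.

```shell
#!/bin/sh
# resetPowerWifi.sh: hypothetical sketch, not the script from the post.
# Every INTERVAL seconds, re-applies TARGET_INDEX to the iwlagn power_level
# file if the driver has reset it (network restart, kill switch, resume).

# Assumption: replace the PCI address with your adapter's.
POWER_LEVEL_FILE="${POWER_LEVEL_FILE:-/sys/bus/pci/drivers/iwlagn/0000:0c:00.0/power_level}"
TARGET_INDEX="${TARGET_INDEX:-5}"
INTERVAL="${INTERVAL:-5}"

# Print the number following "INDEX:" in output such as
# "SYSTEM:auto MODE:fixed INDEX:0".
current_index() {
    sed -n 's/.*INDEX:\([0-9][0-9]*\).*/\1/p' "$POWER_LEVEL_FILE"
}

# Write TARGET_INDEX back if the current index differs from it.
check_once() {
    [ -w "$POWER_LEVEL_FILE" ] || return 0  # file vanished meanwhile; skip this round
    if [ "$(current_index)" != "$TARGET_INDEX" ]; then
        echo "$TARGET_INDEX" > "$POWER_LEVEL_FILE"
    fi
}

# Poll for as long as the file exists; it disappears when the iwlagn
# module is unloaded, and the loop then ends.
while [ -e "$POWER_LEVEL_FILE" ]; do
    check_once
    sleep "$INTERVAL"
done
```

Copied to /etc/init.d/resetPowerWifi.sh and made executable with chmod +x, it can be started from /etc/rc.local with the two lines given above. Polling is crude, but it needs no hooks into the network scripts, the kill switch handling or suspend/resume.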



11 Responses to “microcode sw/hw error”

  1. Leonardo Says:

    Hi,
    I’m going to try this. I’ve been looking for a solution to this problem since I upgraded to kernel 2.6.28.x and started suffering random disconnections. I wouldn’t relate it to the temperature (everything worked just fine prior to some 2.6.28.x upgrade that messed up my connection).

    I’m also building 2.6.29-rc7 to test with it.

    Did you have positive results with these settings?

    jitu Reply:

    Hi,

    Unfortunately, I only thought this was a solution. For a day or so it worked flawlessly; I could copy data like anything. But the next day, the same problem. Let me know if you have any success with 2.6.29-rc7.

    Alex

    Leonardo Reply:

    I tried and it didn’t work here either.

    Neither did the 29rc7.

    What did help was to downgrade iwlwifi-4965-ucode to version 228.57.2.21. It doesn’t solve the issue, but at least the problem is less frequent and the temperature is much lower.

    I think it’s a regression in the kernel since 2.6.28.3, combined with the new firmware. But I don’t like the idea of using such an old kernel… although I might give it a try if this thing keeps popping up.

    jitu Reply:

    Hey,

    Very strange. For a few days now I haven’t had any issues. I copied several GB over the wireless link without any error, although I didn’t change anything regarding the iwl module. The problems went away after I removed the nvidia driver from the blacklisted modules and started booting the kernel (2.6.28.7) without any parameters, so that all messages are printed to the console (previously I booted with “root=/dev/sda2 ro resume=/dev/sda5 quiet”). I have no explanation for that.

  2. Jan Albin Says:

    Hi!

    Is the solution still working for you? If so, I’d like to know the following:

    * Do you run the default iwlagn driver for the 2.6.28.7 kernel?
    * What version of the NVIDIA driver are you using?
    * Are you still running the script that changes the power level setting?
    * What parameters do you load the iwlagn driver with?
    * Anything else that you may have found out to be the solution?

    I’ve had this problem on my T61 for a long time now; every solution I’ve tried has failed. So if you could help me, I’d _really_ appreciate it. It feels a bit wrong to have to carry an external USB WiFi adapter around all the time.

    Thanks in advance.

    jitu Reply:

    Hi,

    I haven’t tried copying large files for some time now, so I don’t know whether it currently works. In fact, I guess I never really will, since earlier the adapter worked for days and then all of a sudden I got these errors once every 2 minutes.

    I’ll try again tonight and let you know. In the meantime, here are the version numbers you requested:

    * kernel 2.6.28.8 (I will also try 2.6.29.1 tonight, since I compiled it just 2 days ago)
    * nvidia-kernel-common: 20080825+1
    * nvidia-(glx|kernel-source|kernel-2.6.2(8.8|9.1)): 180.29-1
    * Nope, the script to adjust the power level is not active
    * I don’t pass any parameters when loading the module, so I guess it uses the default values. Tell me how to find out the settings and I’ll let you know :D
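    The values a loaded module is actually using can be read from sysfs, where every readable parameter appears as a file under /sys/module/<module>/parameters/. A minimal sketch for iwlagn; show_module_params is a hypothetical helper name:

```shell
#!/bin/sh
# show_module_params: hypothetical helper that prints NAME=VALUE for every
# readable parameter file of a loaded module (default: iwlagn).
show_module_params() {
    params_dir="${1:-/sys/module/iwlagn/parameters}"
    for f in "$params_dir"/*; do
        if [ -f "$f" ] && [ -r "$f" ]; then
            printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
        fi
    done
}

show_module_params
```

    Parameters registered without read permission do not show up there, and nothing is printed if the module is not loaded.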

    jitu Reply:

    Hi,

    As promised, I tried copying some big files over WiFi. Yesterday I transferred 7.8GB in one stretch without changing the power level (it was set to maximum) and did not experience any microcode errors. The test was done with 2.6.28.8. Let me know if I can provide more data to help you solve the problem.

    Cheers,
    Alex

  3. Jan Albin Says:

    Hi!

    Thank you.

    Unfortunately the 2.6.28.8 kernel didn’t solve the problem for me.

    I must have tried 6 different kernels, about 10 combinations of module parameters, 5 different versions of the NVIDIA driver from their website, running with or without the power setting script. Sometimes it seems to work a bit better, and then again, after a while, the errors come back.

    When I have the time, I’ll try another operating system to see whether, in my case, it could actually be a hardware problem.

    Regards

  4. arl Says:

    I have had iwlagn-related problems with Linux kernel 2.6.27. The same problems seem to exist with newer kernels too (I’ve seen it).

    Would the only solution be to buy an Atheros (or similar) card?
    But my on-site …

    The iwlagn driver seems to be one of the worst Linux drivers I’ve seen: it blocks the whole machine for 15 seconds! (ha ha)

    Also, is the EU not supported at all?

    There must be some off-by-one bug in finding the WiFi channel.

    I’m running iwlagn with 11n disabled, on channel 13 because channels 1-9 are heavily used here, resulting in a transfer rate of 50kB/sec (max).

    //arl
