Wake-on-LAN is one to turn off. Also check any USB ports that may drain some battery while the computer is powered off. Check your BIOS closely. I had this issue, then turned off some questionable items, and the battery now drains much slower.
I checked whether it is on with this command:
sudo ethtool ens5f5
→ output:
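Since the ethtool output itself is not shown, here is a minimal sketch for reading the relevant line. The function names are hypothetical. It assumes ethtool's usual "Wake-on:" field, where "d" means disabled and a letter such as "g" (magic packet) means enabled.

```shell
#!/usr/bin/env bash
# Sketch (assumption): read `ethtool IFACE` output on stdin and print the
# current Wake-on mode letters. "Supports Wake-on:" lines are skipped because
# the pattern only matches lines that begin with "Wake-on:".
wol_mode() {
  awk -F': ' '/^[[:space:]]*Wake-on:/ { print $2; exit }'
}

# Succeeds (exit 0) when Wake-on-LAN is enabled, i.e. the mode is not "d".
wol_enabled() {
  local mode
  mode=$(wol_mode)
  [ -n "$mode" ] && [ "$mode" != "d" ]
}

# Example (interface name taken from the post above):
#   sudo ethtool ens5f5 | wol_mode        # "d" means disabled
#   sudo ethtool -s ens5f5 wol d          # turn it off for the current boot
```

A setting changed with `ethtool -s` usually does not persist across reboots by itself, so check it again after a restart.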
Based on the ArchWiki Power management page, I did this:
“To allow autosuspend only for devices that are known to work, use simple matching against vendor and product IDs (use lsusb to get these values):”
/etc/udev/rules.d/50-usb_power_save.rules
# whitelist for usb autosuspend
ACTION=="add", SUBSYSTEM=="usb", TEST=="power/control", ATTR{idVendor}=="05c6", ATTR{idProduct}=="9205", ATTR{power/control}="auto"
So, I created the 50-usb_power_save.rules file in /etc/udev/rules.d and added the rule with the appropriate vendor and product IDs.
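Copying the IDs from lsusb by hand is an easy place for typos. A small sketch (the function name is hypothetical) that turns one lsusb line into a rule of the same form as the ArchWiki example quoted above. As the wiki says, only whitelist devices you know work with autosuspend.

```shell
#!/usr/bin/env bash
# Sketch (assumption): build one whitelist rule, in the same form as the
# ArchWiki example, from a single `lsusb` line such as
#   "Bus 001 Device 003: ID 05c6:9205 Some Device"
usb_autosuspend_rule() {
  local line="$1" id vendor product
  # Extract the "vendor:product" pair that follows " ID ".
  id=$(printf '%s\n' "$line" |
    sed -n 's/.* ID \([0-9a-fA-F]\{4\}\):\([0-9a-fA-F]\{4\}\).*/\1:\2/p')
  [ -n "$id" ] || return 1
  vendor=${id%%:*}
  product=${id##*:}
  printf 'ACTION=="add", SUBSYSTEM=="usb", TEST=="power/control", ATTR{idVendor}=="%s", ATTR{idProduct}=="%s", ATTR{power/control}="auto"\n' \
    "$vendor" "$product"
}

# Example (append the rule, then reload udev and re-plug the device):
#   usb_autosuspend_rule "$(lsusb -d 05c6:9205)" |
#     sudo tee -a /etc/udev/rules.d/50-usb_power_save.rules
#   sudo udevadm control --reload
```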
Do you remember what they were? I do not have enough knowledge to assess whether an item is questionable.
Despite all measures taken so far, the battery drain is still high. Even installing a new battery would get me back to this point in no time.
Hello,
There seem to be a lot of issues with your laptop based on your logs.
This is the most concerning one:
Jan 09 17:54:53 _hostname_ kernel: RIP: 0010:cpuidle_enter_state+0xe2/0x420
Your kernel rips when the CPU tries to enter the idle state; that’s why the power drain is so high.
Did you change something in your BIOS?
Jan 09 17:54:35 _hostname_ kernel: intel_pstate: Disabling energy efficiency optimization
Did you also disable Turbo Boost?
Jan 09 17:54:41 _hostname_ kernel: intel_pstate: Turbo disabled by BIOS or unavailable on processor
EDID tables and ACPI seem broken too. Do you have the latest BIOS? Also, how did you configure thermald?
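To check whether these messages come back after a change (loading BIOS defaults, removing thermald), one option is to filter the kernel log of the current boot for them. This is a hedged sketch with made-up function names; the no_turbo path assumes the intel_pstate driver is active, since the file does not exist otherwise.

```shell
#!/usr/bin/env bash
# Sketch (assumption): keep only the kernel log lines discussed above.
power_kernel_lines() {
  grep -E 'intel_pstate:|RIP: |cpuidle_enter_state'
}

# Succeeds if turbo is currently disabled. Assumes the intel_pstate driver,
# which exposes no_turbo (1 = turbo off); the file is absent otherwise.
turbo_disabled() {
  local f="${1:-/sys/devices/system/cpu/intel_pstate/no_turbo}"
  [ -r "$f" ] && [ "$(cat "$f")" = "1" ]
}

# Example (current boot; use -b -1 for the previous one):
#   journalctl -k -b | power_kernel_lines
#   turbo_disabled && echo "turbo is off"
```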
Hi there,
thanks for your detailed reply.
From what I gather, this is the main issue that drains the battery, so it needs to be fixed. Please tell me how, or point me to a source. Also, what does the term “rip” mean in this context?
Not that I can remember; in any case, disabling energy efficiency is the last thing I would do. That said, it is what it is. I’ll try to enable it in the BIOS.
Definitely not. Turbo boost sounds like an energy guzzler to me, though. From the way you ask, I should re-enable that one too, right?
Probably not. My PC is from 2019 and I have never updated the BIOS. There is a simple reason for it: FEAR. I have read many stories about people updating their BIOS and running into a mess, so I decided to stay away from it.
UPDATE: I am typing this update from another computer while I have the Lenovo booted into BIOS.
In the BIOS there are only 3 tabs that contain changeable settings: Configuration, Security, Boot. None of these 3 has any power settings, nor any Turbo Boost setting.
On the Configuration tab there is an item Bios Back Flash, which is disabled.
The description for this item is “Allow Bios to be back leveled to a previous version”. I never touched this one because of my fear of BIOS updates, so it is disabled by default. I don’t know whether this setting is relevant.
The only other thing that may be of relevance is on the 4th tab called Exit.
There is Load Default Settings, which can be activated.
Below it there is OS Optimized defaults, which is enabled.
Yes, this is the main issue. I suspect that either a bad/buggy BIOS or thermald + Intel DPTF is the main culprit. RIP is the CPU register containing the address of the instruction being executed; I used “rips” there colloquially instead of “kernel oops”.
You could try to “Load Default Settings” anyway, and see if anything changes.
Again, how did you configure thermald? Can you post your thermald configs?
If that turns out OK, the last option could be to reverse-engineer or manually extract the DPTF profile.
More info about possible DPTF fixes: https://wiki.archlinux.org/title/Lenovo_ThinkPad_X1_Extreme#Fix_using_reverse-engineered_DPTF_implementation and https://wiki.archlinux.org/title/Lenovo_ThinkPad_X1_Extreme#Fix_using_dptfxtract_and_manual_DPTF_profile_setting
Laptop model should not matter. I had to use dptfxtract for my Asus Zenbook, too.
I did not configure it after installation. I now understand there should be a file thermal-conf.xml, but I have not been able to find it.
From /etc/thermald/thermal-cpu-cdev-order.xml:
<CoolingDeviceOrder>
<!-- Specify Cooling device order -->
<CoolingDevice>rapl_controller</CoolingDevice>
<CoolingDevice>intel_pstate</CoolingDevice>
<CoolingDevice>intel_powerclamp</CoolingDevice>
<CoolingDevice>cpufreq</CoolingDevice>
<CoolingDevice>Processor</CoolingDevice>
</CoolingDeviceOrder>
That is the only Thermald file I have been able to find.
Is Thermald even worth having? Should I uninstall it?
OK, that means thermald is in “auto” mode, and that is obviously not working. It’s worth having when properly configured, especially on “newer” Intel hardware. You can remove it and then rely on ACPI to handle everything.
But before you do that, try generating a proper config with dptfxtract.
I downloaded/installed dptfxtract. This is what it says on the Github page:
“DISCONTINUATION OF PROJECT
This project will no longer be maintained by Intel.
Thermald version 2.0 and later has in built parser for thermal tables. So this utility is not required.
Make sure that thermald “--adaptive” option is used.”
I have Thermald 2.5.1.
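The notice above hinges on thermald running with --adaptive. A hedged way to check what the installed unit actually passes is to inspect its ExecStart line; the helper name is hypothetical and nothing here assumes what the packaged unit contains.

```shell
#!/usr/bin/env bash
# Sketch (assumption): read a unit definition on stdin, for example the output
# of `systemctl cat thermald.service`, and succeed only if an ExecStart= line
# passes the --adaptive option mentioned in the dptfxtract notice.
has_adaptive_flag() {
  grep -E '^[[:space:]]*ExecStart=' | grep -q -- '--adaptive'
}

# Example:
#   systemctl cat thermald.service | has_adaptive_flag && echo "adaptive mode on"
```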
Anyway, this is the output when I ran it:
Hey, yes, it’s required. I am using it right now on a couple of my laptops that have much newer and older hardware than yours. How did you install it? You will need to use the dptfxtract-bin or dptfxtract-static-bin package.
Anyway, if it cannot find the tables, I would suspect something else is not working on your side, besides thermald, which is not working anyway. The next step would be to remove thermald and try cpufreq/cpupower tools and utils instead.
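Before switching to cpufreq/cpupower tools, it may help to record which scaling driver and governor are active. `cpupower frequency-info` reports this; the sketch below reads the same values directly from sysfs, assuming the usual cpufreq layout (the function name is hypothetical).

```shell
#!/usr/bin/env bash
# Sketch (assumption): standard cpufreq sysfs layout; cpu0 is used as a sample.
cpufreq_summary() {
  local root="${1:-/sys/devices/system/cpu/cpu0/cpufreq}"
  local drv gov
  drv=$(cat "$root/scaling_driver" 2>/dev/null) || drv=unknown
  gov=$(cat "$root/scaling_governor" 2>/dev/null) || gov=unknown
  printf 'driver=%s governor=%s\n' "$drv" "$gov"
}

# Example: compare with the cpupower tool's own report.
#   cpufreq_summary
#   cpupower frequency-info
```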
I installed dptfxtract-bin, which gave me that “no tables found” output. I will remove Thermald. When I installed Thermald I also installed auto-cpufreq. So, if I understand your comment correctly, either Thermald or auto-cpufreq is sufficient. I don’t know what “utils” means here.
Nevertheless, there is something odd going on. Yesterday I charged my battery to full and put it into hibernation during the night and most of today. When I opened the computer, the battery was still at 99%!
When it is at 40% and I put it into hibernation for the night, the next day the battery level has dropped to 0%; that’s right: zero percent.
Anyway, I have removed Thermald and dptfxtract and will see what the result is. If there is something else I should do, please tell me, because I don’t have a clue.
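The 99% versus 0% observations are easier to compare if every sleep is measured the same way. A minimal sketch, assuming the battery appears as /sys/class/power_supply/BAT0 (the helper names are hypothetical): record the capacity before and after sleeping, then compute the average drain per hour.

```shell
#!/usr/bin/env bash
# Sketch (assumption): the battery is exposed as BAT0; on some laptops it is BAT1.
battery_capacity() {
  local bat="${1:-/sys/class/power_supply/BAT0}"
  cat "$bat/capacity"          # charge level in percent
}

# Average drain in percent per hour: $1 = start %, $2 = end %, $3 = minutes asleep.
drain_per_hour() {
  awk -v s="$1" -v e="$2" -v m="$3" \
    'BEGIN { if (m <= 0) exit 1; printf "%.1f\n", (s - e) * 60 / m }'
}

# Example (run the last two lines by hand after resuming):
#   start=$(battery_capacity); t0=$(date +%s)
#   systemctl hibernate
#   end=$(battery_capacity); t1=$(date +%s)
#   drain_per_hour "$start" "$end" $(( (t1 - t0) / 60 ))
```

For instance, dropping from 40% to 0% over 600 minutes works out to 4.0% per hour.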
I just noticed I have auto-cpufreq installed 2x:
Should I remove the version from chaotic-aur?
P.S. I rebooted after removing Thermald and the computer does seem to start up quicker. I could be imagining it.
Yes, that’s true.
Is this pacseek? What is the output of
yay -Qs cpufreq
Can you, also, send the output of this command
journalctl -b -0 | eos-sendlog
Still the same, unfortunately:
Jan 16 18:13:14 peter-81fl kernel: intel_pstate: Disabling energy efficiency optimization
Jan 16 18:13:20 peter-81fl kernel: intel_pstate: Turbo disabled by BIOS or unavailable on processor
...
Jan 16 18:13:29 peter-81fl kernel: RIP: 0010:cpuidle_enter_state+0xe2/0x420
But how did you uninstall thermald? Systemd is trying to restart it repeatedly and spamming your logs with:
Jan 16 18:13:17 peter-81fl systemd[1]: Starting Thermal Daemon Service...
Jan 16 18:13:17 peter-81fl systemd[471]: thermald.service: Failed to locate executable /usr/bin/thermald: No such file or directory
Jan 16 18:13:17 peter-81fl systemd[471]: thermald.service: Failed at step EXEC spawning /usr/bin/thermald: No such file or directory
Jan 16 18:13:17 peter-81fl systemd[1]: thermald.service: Main process exited, code=exited, status=203/EXEC
Jan 16 18:13:17 peter-81fl systemd[1]: thermald.service: Failed with result 'exit-code'.
...
Jan 16 18:13:18 peter-81fl systemd[1]: thermald.service: Scheduled restart job, restart counter is at 5.
Auto-cpufreq gets loaded but does not seem to be doing anything; I only managed to find this:
Jan 16 18:13:18 peter-81fl systemd[1]: Started auto-cpufreq - Automatic CPU speed & power optimizer for Linux.
Can you boot the live install ISO and send the logs from there? Just to make sure it works on a “clean” system.
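The thermald restart loop in the log above shows that systemd still has a thermald.service unit whose ExecStart points at /usr/bin/thermald, which no longer exists. Why the unit survived is not clear from the logs. A hedged sketch for locating such a leftover (the helper name is hypothetical): it prints the ExecStart binary of a unit file only when that binary is missing, skipping systemd's optional ExecStart prefixes such as "-" or "@".

```shell
#!/usr/bin/env bash
# Sketch (assumption): $1 is the path of a unit file. Prints the first
# non-empty ExecStart binary if it is not executable, and succeeds only then.
missing_exec() {
  local unit="$1" bin
  bin=$(awk '/^[[:space:]]*ExecStart=/ {
      sub(/^[[:space:]]*ExecStart=/, "")   # drop the key
      sub(/^[-@:+!]+/, "")                 # drop optional systemd prefixes
      split($0, a, /[[:space:]]+/)
      if (a[1] != "") { print a[1]; exit }
    }' "$unit")
  [ -n "$bin" ] || return 1
  if [ ! -x "$bin" ]; then
    printf '%s\n' "$bin"
    return 0
  fi
  return 1
}

# Example:
#   unit=$(systemctl show -p FragmentPath --value thermald.service)
#   missing_exec "$unit" && pacman -Qo "$unit"   # does any package own it?
#   sudo systemctl disable thermald.service
#   sudo systemctl daemon-reload; sudo systemctl reset-failed thermald.service
```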
Not sure if this is relevant in this case, but I had some issues getting CPU scaling right with intel_pstate on an Intel machine.
The Liquorix kernel seems to be doing it fine on that machine. It uses acpi_cpufreq and the processor’s frequency can finally go down to 400 MHz from 900-2000 MHz.
See:
Since I don’t need automatic scaling up and down as done by auto-cpufreq, I just use tlp, which permits fine-tuning of some parameters to some extent.
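To compare kernels or drivers the way described above (intel_pstate vs acpi_cpufreq, a 400 MHz minimum), it helps to read the limits the kernel reports. A hedged sketch with hypothetical helper names, assuming the standard cpufreq sysfs layout, where frequencies are given in kHz:

```shell
#!/usr/bin/env bash
# Sketch (assumption): cpufreq sysfs files report frequencies in kHz.
khz_to_mhz() { echo $(( $1 / 1000 )); }

# Print the hardware frequency limits of one CPU (cpu0 by default).
cpu_freq_range() {
  local root="${1:-/sys/devices/system/cpu/cpu0/cpufreq}"
  printf 'min=%s MHz max=%s MHz\n' \
    "$(khz_to_mhz "$(cat "$root/cpuinfo_min_freq")")" \
    "$(khz_to_mhz "$(cat "$root/cpuinfo_max_freq")")"
}

# Example: compare the limits with what the CPU actually runs at while idle.
#   cpu_freq_range
#   grep MHz /proc/cpuinfo
```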
First I removed dptfxtract, because dependencies prevented me from removing Thermald while it was installed. After that, I removed Thermald too with yay -Rs thermald.
OMG, how on earth did that happen? Could it be because of the 2 installations I mentioned? Should I remove 1 of them?
- You mean booting from the USB I used to do the original EOS installation with?
- How do I collect the logs, and how do I send them?
Apologies for such basic questions.
This is the only assessment of yours that I totally agree with.
Since the info you have been providing is partial (and sometimes not well formatted), I would suggest that you follow the KISS principle: start at the cleanest state possible and work your troubleshooting from there.
Some questions I would ask for extra info:
- How old is your system installation?
- Was this excess battery drain noticed from the start? If not, what had you changed on your system, just before you noticed it?
- Had you ever used the same hardware with (any) other Linux system and had normal battery discharging?
- When posting what you are doing on configuration, post the exact and complete file contents in code format, which preserves important details, not only parts. For example, a udev rule is disabled in a specific way (by deleting the file), not with inline comments, and that is only visible if you post the full file.
- Have you been tracking your custom configuration of system files (/etc/*, /usr/*, etc.)? This is significant, since IMHO it seems quite likely that it created this problem.
- How many DEs have you been using, and since when? I mean, your first report says Xfce, but we see a lot of KDE-related stuff. Was this done by the installer, or how did you decide on it?
- Why would you install conky, when during troubleshooting we want to minimize background jobs (daemons)?
My sincere suggestion is to reinstall carefully, if not too disturbing, and do not change anything on the system until you have a better clue on the battery drain issue.
If re-installation sounds too much:
- Create a new user and log in to see if the same happens
- Undo your /etc/ (etc.) configurations, and uninstall power-related utilities like thermald, auto-cpufreq, stacer, etc. A standard Arch system (and kernel) handles these things quite well. Then, if you notice strange behavior, try one solution at a time.
- Get the systemd unit lists, to watch for daemons you may have overlooked:
systemctl list-units -t service --all
systemctl --user list-units -t service --all
systemctl list-units -t timer --all
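These lists can be long. A hypothetical filter like the one below keeps only the units tied to the power tools named in this thread; the name list is an assumption and can be extended.

```shell
#!/usr/bin/env bash
# Sketch (assumption): the names below are the power tools mentioned in this
# thread; extend the pattern as needed.
power_related_units() {
  grep -Ei 'thermald|auto-cpufreq|tlp|stacer'
}

# Example:
#   systemctl list-units -t service --all | power_related_units
#   systemctl --user list-units -t service --all | power_related_units
```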
Good luck!
When I was installing these things I think I did tlp and auto-cpufreq together, then uninstalled tlp when someone told me the 2 don’t work together well.
So, are you suggesting I remove auto-cpufreq (jake99 just established it isn’t working anyway) and install tlp? If so, please tell me what steps to take post-installation because it seems with both Thermald and auto-cpufreq I screwed up the (post?)installation process.
Some questions from my side:
- Should I answer your 7 questions or did you just ask them philosophically?
- If you don’t need those answered, what do you mean by “reinstall”? Reinstall some of these utilities (as your bullet list seems to suggest) or reinstall EOS completely? If it is the latter, then thermald, auto-cpufreq and stacer are not installed anyway.
I can’t guarantee that you will get better battery life with tlp.
For me it was a process of trial and error to finally get it somewhat right.
I was pleasantly surprised that installing the Liquorix kernel did scale down the processor’s frequency to 400 MHz, which was reported as the minimum hardware limit.
You did right. auto-cpufreq should not be used in conjunction with tlp. It says so on auto-cpufreq’s GitHub page as well.
Have a look at the ArchWiki page for tlp: https://wiki.archlinux.org/title/Tlp
You could use tlpui from the AUR if you feel you need a GUI for configuring tlp’s config file.
Just don’t change too many parameters at once. This helps you find out what the consequences of the parameter you just changed are, and it makes it easier to keep track of things.
If you choose to install tlp, have a look at
tlp --help
man tlp
You would need to enable and start tlp.service and also restart the service whenever you modify tlp’s config file.
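A minimal sketch of those steps, with a check for services that should not run alongside it. tlp_conflicts is a hypothetical helper; auto-cpufreq is listed because of the warning above, and power-profiles-daemon is included as an assumption, so confirm it against TLP's own documentation.

```shell
#!/usr/bin/env bash
# Sketch (assumption): read unit names on stdin and print those that should
# not run alongside TLP. auto-cpufreq comes from the note above;
# power-profiles-daemon is an assumed addition.
tlp_conflicts() {
  grep -E '^(auto-cpufreq|power-profiles-daemon)\.service$'
}

# Example:
#   sudo systemctl enable --now tlp.service
#   systemctl list-units -t service --state=active --no-legend --plain |
#     awk '{print $1}' | tlp_conflicts
#   sudo tlp-stat -s                  # is TLP enabled and running?
#   # after each single change to /etc/tlp.conf:
#   sudo systemctl restart tlp.service
```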