Yes, this thread is mentioned in the section "LRU inversion". I did not realize that EndeavourOS was one of the distros that gurus and experts look at. I always assumed that Arch/Debian/Fedora/Suse were the places where all the action took place. It is nice to know that Endeavour is among the top-tier Linux distros.
And oh boy, it is one big article. Will have to go through it. If I am not wrong, I saw a link to this article on the Fediverse.
Well, is it really an honor that the following zram configuration is used as a negative example?
The article essentially explains the issue with such a zram configuration:
But such users have unwittingly created a trap that grows more and more likely with more uptime.
The trap is this: since the swap on the zram device has the highest priority, the kernel prefers zram for all allocations. When zram fills up, it switches to the disk-based swap for all future allocations.
That means that without intervention, your precious zram gets filled with whatever pages happened to be swapped out first. That is usually completely inversely correlated with the pages that you actually need now.
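For anyone who wants to check whether their own setup has this shape, the priorities are visible directly; the column values below depend entirely on the machine, this is just how to look:

```shell
# List active swap areas with their priorities. A typical zram-generator
# setup gives zram a high priority (e.g. 100) and the disk swap a low
# one (e.g. -2), so the kernel fills zram first and then spills to disk.
swapon --show=NAME,TYPE,SIZE,USED,PRIO
```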
E.g. if you’re using a desktop environment with limited-uptime sessions, you might be fine using zram this way, but if you’re deploying this configuration for continuous use in a server context, it’s foreseeable that after a certain uptime the system will have exceeded the zram and will fall back directly to the slower disk-based swap.
That’s a problem with this setup: if you spill over to disk, it will run the hot part on the slow disk. But that’s up to the user to decide. Maybe they know, maybe they monitor their zram usage, they never exceed it, and they just have it for peace of mind/hibernation.
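If you’re in the "I monitor it and never spill over" camp, that monitoring is quick to do (assuming a zram swap device exists; output varies per machine):

```shell
# DATA is the uncompressed size of what's stored in zram, COMPR the
# compressed size, TOTAL the actual memory used including metadata.
zramctl
# And /proc/swaps shows whether the disk swap has started filling up.
cat /proc/swaps
```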
Some people have strong opinions about how their setup runs great with this but was bad with that. Not worth arguing about (that’s why I stopped participating in the thread).
So I read the article. A good one; we need more articles such as these. Here are the basic points that it makes:
Both zswap and zram are mechanisms to trade off disk IO for CPU cycles. The former (disk IO) is slow, the latter (compression on the CPU) is comparatively fast.
The first claim in the article is that zram is dumb where zswap is not: zswap is integrated with the Linux kernel's memory management, zram is not. Thus the pages that might be needed again soon (hot pages) can be kept in zswap while the others (cold pages) are swapped out to disk. zram cannot do this; it cannot distinguish between hot and cold memory pages.
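As an aside, whether zswap is active and how it is tuned can be checked at runtime. These sysfs paths are the standard zswap module parameters; the values you see will of course depend on your kernel and config:

```shell
# Is zswap enabled at all? (Y/N)
cat /sys/module/zswap/parameters/enabled
# Which compressor is used, and how much of RAM the compressed pool may take:
cat /sys/module/zswap/parameters/compressor
cat /sys/module/zswap/parameters/max_pool_percent
# Enable it for the current boot (root required); add zswap.enabled=1
# to the kernel command line to make it persistent.
echo 1 | sudo tee /sys/module/zswap/parameters/enabled
```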
The second claim is that zswap keeps only those pages which compress well, while zram keeps all pages, whether they compress well or not.
The third claim concerns disk thrashing and SSD wear: zram may increase disk IO under certain circumstances or workloads. So if the workload involves a lot of file modifications (think lots and lots of document editing, coding, image/video processing, DB operations, etc.), then zram might not be suitable. The author goes into some detail w.r.t. Instagram.
The issue is that zswap is not guaranteed to be intelligent enough to keep ALL the hot pages in RAM, especially if they cannot be compressed. zswap exposes a counter, reject_compress_poor, under /sys/kernel/debug/zswap/ which can help here. If this value is high, and so is swap IO, then maybe zswap should not be used.
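Reading those counters is straightforward, assuming debugfs is mounted at the usual place (root required):

```shell
# reject_compress_poor counts pages zswap refused because they
# compressed badly; stored_pages is what currently sits in the pool;
# written_back_pages is what got evicted to the backing swap.
sudo grep . /sys/kernel/debug/zswap/reject_compress_poor \
            /sys/kernel/debug/zswap/stored_pages \
            /sys/kernel/debug/zswap/written_back_pages
```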
The second claim is that pages, even hot ones, are sent to disk if they are not compressible. For example, a 4 KiB memory page that would compress to only 3.9 KiB can be considered incompressible, so it goes not to zswap but to the swap file/partition. There is a per-cgroup knob, memory.zswap.writeback, introduced in Linux kernel 6.8, which deals with this, but it is not foolproof. One more reason that we need the Linux LTS kernel 6.6.x in the main Arch repo and not in the AUR, where it requires over 12 hours of build time and more than 16 GB of disk I/O. But I digress.
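For the curious, that 6.8 knob lives in the cgroup v2 hierarchy. The slice name below is just an example; pick whichever cgroup you want to keep off the disk:

```shell
# 0 = pages this cgroup swaps out must stay in zswap/RAM and are never
# written back to the disk swap; 1 (the default) allows writeback.
echo 0 | sudo tee /sys/fs/cgroup/user.slice/memory.zswap.writeback
cat /sys/fs/cgroup/user.slice/memory.zswap.writeback
```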
About the third claim: not everyone has an Instagram- or Meta-type workload. And it is not clear, in desktop, mobile, or embedded environments, what percentage of cold memory pages is incompressible.
That does not mean that zram is not good or not suitable. Rather, when we want to keep data off the disk entirely, zram makes sense: if zram is used without a physical backing device, this will effectively lock all anonymous data in RAM. From the article:
For examples of such use cases, think seeding or downloading a torrent of the upcoming Star Wars: Doomsday movie, handling financial data, or being part of a hacking collective going after Triumph organization servers, and so on; then zram makes sense. I hope you get the drift. According to the article, zram is also good for embedded systems, Raspberry Pi systems and the like. Android apparently uses zram, or at least some phone manufacturers use it. This Android part needs to be verified.
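On Arch/EndeavourOS the usual way to get exactly that "zram only, nothing ever touches the disk" setup is zram-generator. A minimal sketch of the config follows; the size expression and algorithm are my own assumptions, adjust to taste:

```shell
# Without a writeback-device line, swapped pages can never land on disk.
sudo tee /etc/systemd/zram-generator.conf <<'EOF'
[zram0]
zram-size = min(ram / 2, 4096)
compression-algorithm = zstd
EOF
# Then reload and start the generated unit:
# systemctl daemon-reload && systemctl start systemd-zram-setup@zram0.service
```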
All in all a very good and thought-provoking read. It has some very good pointers on how cgroups can be used with zswap, how zram can be made a bit smart (not intelligent, but smart), and how the Linux kernel handles memory management. It talks about the Out-Of-Memory (OOM) killer and how it interacts with swap and memory management. It talks about Fedora's use of zram by default. And much more. Go read it.
I am going to end with a claim that is going to raise a lot of hackles, bordering on blasphemy: maybe task scheduling and memory management are two arenas where AI can and will play a significant part in making the Linux kernel a better solution. Please do not hang me in the square.
While encrypted swap on disk adds complexity, it's not really hard to set up. You don't have to think about it when using zram at all, but IMHO it's not something that should be considered a strong use-case indicator.
There are use cases for physical swap only, zswap, zram + physical swap, and zram only. In the author's use case, zram + physical swap is a bad alternative, but that doesn't mean it is a bad alternative for you or someone else.
The important thing is to get educated on how it all works and decide for yourself what is the best option.
Even on my own machines, I don’t always make the same choices since the use cases and physical hardware are often different between machines.
Just to annoy all of you: there is an (upcoming) new kid on the block, virtual swap space, or in short VSwap. It's a general overhaul of how swap spaces are handled. This video gives a brief overview. And here is the current topic on the kernel mailing list.
And here is a 20-minute presentation which analyses the disadvantages of the current zswap implementation and proposes some architectural changes. It's essentially the motivation behind this alternative approach, which is currently in development.
And within this context, the virtualized swap space approach would be a short-term improvement; the presentation includes two alternative approaches which would require more effort/changes to implement.
Last but not least: the topic on the kernel mailing list includes some brief benchmarks in which a performance increase on the scale of tenths of a second has been demonstrated.
In general, the improvement proposal came from Google, and they state that Meta also has an interest in improving upon the current zswap implementation. Thus I conclude that it may primarily cater to large-scale data center deployments and might only bring minor improvements for single-user desktop usage.
@dalto, would it be possible for you to share for what use cases you use zswap, when you use zram, and when you use other configurations?
Thanks @1093i3511, will have to go through it. Any idea when VSwap (virtual swap) will be incorporated into the Linux kernel? In 7.1 or 7.3, or much sooner than that?
I don’t have a well-informed answer to that question, unfortunately. All I can tell is that it is a patch series based upon 6.19, and that this patch series, in its current version v5, was started initially almost a year ago.
As it addresses several subsystems, this isn't a trivial task, and there are potentially conflicting implementations within the mainline kernel already (e.g. zram writeback vs. zswap writeback). In short, I assume it's unlikely that it will be integrated into mainline earlier than 7.3, but I might be wrong. Usually I don't dig around in the kernel mailing lists and don't have the bigger picture, or the interest, to dive deeper into the topic of kernel development, as it's way beyond my own comfort zone.
Will it be possible for you to please share your location? I would like to drop by around midnight and borrow your workstation, temporarily of course and without your permission. Be cool, don't call 911.
On a serious note: on the workstation you are not using zswap. Is it because you have sufficient RAM, or because you have a slow SSD, or something else?
I was curious about @dalto’s thoughts on that note too, because I’ve been exploring the prospect of Zram / Zswap and it caught my eye that his setup was not dissimilar to my own.
I am aware that tasks that can easily consume that much RAM, such as LLMs, tend not to swap particularly compressible data, so these zram and zswap options may actually compound a performance problem rather than improve it.
Yes, I bought that RAM at a time when one could do so without having to take a 3rd mortgage.
Yeah, it is because swap is highly infrequent and I don’t want to waste RAM on compressed swap.
Please keep in mind, I haven’t done any extensive analysis on this. I was asked how my machines were configured and I answered. It was not intended to be a recommendation for others.
Generally speaking, I would probably recommend using either zswap or zram.