Huge balooctl6 database file

So I casually checked how many files the file indexer has indexed, and while doing so I noticed how big the actual database file is! Below you can see that this installation of EndeavourOS is not old at all, not even a full year. One of the first things I did was set up Baloo to index only filenames and not their content. It’s not even indexing hidden files and is restricted to my home directory (where almost everything I have is mounted).

But 23,40 GiB just for the filenames? That doesn’t sound right to me. I also checked the actual file on the filesystem with my filemanager, because I could not believe it. But it’s only using around 400 MiB of the file?

$ stat -c %w /
2023-09-10 20:10:40.000000000 +0200

$ balooctl6 status
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 980.958
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 23,40 GiB

$ balooctl6 indexSize
File Size: 23,40 GiB
Used:      396,44 MiB

PostingDB:                70,08 MiB    17.678 %
PositionDB:               77,29 MiB    19.497 %
DocTerms:                 59,00 MiB    14.882 %
DocFilenameTerms:         60,74 MiB    15.322 %
DocXattrTerms:                  0 B     0.000 %
IdTree:                   14,08 MiB     3.552 %
IdFileName:               66,92 MiB    16.881 %
DocTime:                  38,71 MiB     9.764 %
DocData:                        0 B     0.000 %
ContentIndexingDB:              0 B     0.000 %
FailedIdsDB:                    0 B     0.000 %
MTimeDB:                   9,61 MiB     2.425 %

Is that normal? Yeah, I know I could just delete the DB and reindex everything. But I would like to understand why it happens, so I can avoid it. Or if it’s my fault, I would like to understand what went wrong. Does anyone have a clue? Do you guys use Baloo at all, and if so, what database file sizes do you get? The commands are balooctl6 status && balooctl6 indexSize.
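
To put the two numbers side by side, here is a minimal sketch that computes how much of the file is actually in use. The helper name used_pct is made up for illustration, and the figures are the ones from the output above, converted by hand to MiB with a decimal point.

```shell
# used_pct USED_MIB FILE_MIB: print used space as a percentage of the file size
# (hypothetical helper, not part of balooctl6)
used_pct() {
    LC_ALL=C awk -v used="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * used / total }'
}

# Figures from the post: 396.44 MiB used, file size 23.40 GiB = 23.40 * 1024 = 23961.6 MiB
used_pct 396.44 23961.6   # prints 1.65
```

So, going by these figures, less than 2 % of the file holds live data.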

Did you ever have file contents indexing enabled?

I wonder if the file is huge because it once held the contents?

I also index only file contents and here is my output:

File Size: 2.76 GiB
Used:      1.85 GiB

           PostingDB:     345.70 MiB    18.289 %
          PositionDB:     596.87 MiB    31.578 %
            DocTerms:     248.28 MiB    13.135 %
    DocFilenameTerms:     230.68 MiB    12.204 %
       DocXattrTerms:            0 B     0.000 %
              IdTree:      52.65 MiB     2.785 %
          IdFileName:     248.70 MiB    13.158 %
             DocTime:     128.91 MiB     6.820 %
             DocData:      12.55 MiB     0.664 %
   ContentIndexingDB:            0 B     0.000 %
         FailedIdsDB:            0 B     0.000 %
             MTimeDB:      25.83 MiB     1.366 %
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 3,594,527
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 2.76 GiB

I think content indexing was enabled in the beginning, but I remember deleting the database so it would only index the filenames. After a system update I always use the command balooctl6 check to search for new files to index, as part of the update routine.

Do old entries get deleted at all? I cannot find a command to do that. Maybe they “pile up” in the database, because I create a lot of files, move stuff around, and have even changed some directories.

Without looking into anything: Is this an older installation on btrfs? There was a bad issue that got resolved a few months ago. Just reindex and see where you land.

balooctl6 disable
balooctl6 purge
balooctl6 enable

No, it’s exclusively ext4 for the indexed drives with the indexed files. The installation is not even a year old. Reindexing is what I want to do, but I created this topic in the hope of understanding what is happening (and whether this is normal at all).

That sounds reasonable. Nonetheless I would reindex first. If the index becomes unreasonably large again you can look into it. Then you know it’s reproducible and can observe the effect of future changes. If “it fixes itself” there’s not much point pondering whatever may have caused it in the past.


Yeah, looks like I have to. This is not the first time, mind you. In a previous installation the file size was bigger too (I don’t remember how big exactly) and I deleted the DB, as I could not find any solution on the web. Man, I’ve had problems with Baloo a few times… it just doesn’t like me. ^^ (Yep, we take everything personally.)

Okay then, thanks for the replies, I will nuke it from orbit now. It’s the only way to be sure.

Edit: In case you are curious, the indexing process took only a few minutes and is already complete. I also removed the “~/Emulation” entry, as it is already covered by home. I feel like the search results got faster… maybe placebo, but it makes sense given the huge difference. So here are the stats now: 590 MiB, down from 23,40 GiB!

$ balooctl6 status && balooctl6 indexSize
Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 953.197
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 590,51 MiB
File Size: 590,51 MiB
Used:      382,29 MiB

PostingDB:                67,58 MiB    17.678 %
PositionDB:               74,48 MiB    19.483 %
DocTerms:                 56,72 MiB    14.837 %
DocFilenameTerms:         58,52 MiB    15.306 %
DocXattrTerms:                  0 B     0.000 %
IdTree:                   13,46 MiB     3.521 %
IdFileName:               64,84 MiB    16.961 %
DocTime:                  37,32 MiB     9.763 %
DocData:                        0 B     0.000 %
ContentIndexingDB:              0 B     0.000 %
FailedIdsDB:                    0 B     0.000 %
MTimeDB:                   9,37 MiB     2.450 %

Yeah, keep an eye on it if it grows again. Otherwise you’re probably good.
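
Keeping an eye on it could be as simple as logging the file size now and then. A minimal sketch, assuming the indexSize output keeps the “File Size:” line shown above; extract_file_size and log_index_size are hypothetical helper names, and the log path is whatever you pass in.

```shell
# extract_file_size: read `balooctl6 indexSize` output on stdin
# and print only the value of the "File Size:" line
extract_file_size() {
    sed -n 's/^File Size:[[:space:]]*//p'
}

# log_index_size LOGFILE: append a dated size reading to LOGFILE
# (needs balooctl6 on the PATH; only defined here, not run)
log_index_size() {
    printf '%s %s\n' "$(date +%F)" "$(balooctl6 indexSize | extract_file_size)" >> "$1"
}
```

Running something like log_index_size ~/baloo-size.log after each system update would show over time whether the file grows again.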

Somehow, baloo is a mess.

If I add something to the exclude filters in ~/.config/baloofilerc, it seems that Baloo doesn’t remove the newly excluded entries from the index database.


Makes sense why it’s named baloo, because the filesize baloons up. Badum tsh. Jokes aside, it’s still useful and valuable to get near-instant search results, as I do a lot of this kind of stuff. I have an alias that gives results from the current working directory “downwards” (the redirect to /dev/null hides the Elapsed time message):

alias search='baloosearch6 --directory "${PWD}" 2>/dev/null'

I only had indexing on for /home/manfred and only file indexing.

Nevertheless, today the indexer was working like hell and there were not many changes in my home directory tree.

So, I disabled baloo altogether as I don’t need it much.

Were you using btrfs or zfs? If so, that was a known issue.

I am using btrfs. I thought it was (past tense) a known issue and that it is no longer an issue. :)

I believe it was fixed with the recent release. But that is obviously very recent.

That being said, I haven’t tried testing it myself yet.

I get the idea behind Baloo(ned), but that damn thing has been running non-stop for over a week hogging most of my resources.

I use the search feature for my files, but damn, that service is like the uninvited guest that walks in with clogs in the middle of the night instead of discreetly wearing slippers…

I wish there was a way to give it a lower resource priority instead of hogging everything. The result is good, the way to get there is sloppy and heavy-handed!


On my older machine Baloo was running constantly on one CPU, like you describe. The only solution was to disable it. On my new machine I use Baloo again and don’t have this issue anymore. But instead of indexing file content I limit it to indexing filenames. This makes it fast and it does not run all the time, but the usefulness goes down.

Tip: You can instruct Baloo to update the database, for example when you step away from your computer. The command is balooctl6 check. And if you really want, you can suspend Baloo for a while and let it continue later, with the commands balooctl6 suspend and balooctl6 resume. (Note that I have not used these commands myself, so I have no idea how well they work.) This does not solve your problem, but might help in some situations.
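
One way the suspend and resume commands could be combined is a small wrapper that pauses the indexer around a heavy job. This is only a sketch and untested against a real indexer; with_baloo_suspended is a hypothetical name.

```shell
# with_baloo_suspended CMD [ARGS...]: suspend the indexer, run CMD,
# resume indexing afterwards, and pass on CMD's exit status
with_baloo_suspended() {
    balooctl6 suspend
    "$@"
    rc=$?
    balooctl6 resume
    return "$rc"
}
```

For example, with_baloo_suspended make -j8 would keep the indexer paused for the duration of the build and resume it even if the build fails.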

I wish KDE would work on a modern alternative to Baloo written from scratch. However, there is an older search tool on Linux that can index filenames (without content) too: locate (I think the package is named mlocate). I used this in the past before Baloo. It does not run in the background, but updates the database only when you ask it to with sudo updatedb. But it’s not integrated into other KDE applications like Dolphin or KRunner.
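
A rough locate counterpart to the “current directory downwards” alias from earlier in the thread could look like this. It is a sketch that assumes the installed locate supports -i for case-insensitive matching; under_dir and lsearch are hypothetical names.

```shell
# under_dir DIR: keep only those paths from stdin that lie below DIR
# (the prefix goes through the environment so awk does not interpret backslashes)
under_dir() {
    PREFIX="${1%/}/" awk 'index($0, ENVIRON["PREFIX"]) == 1'
}

# lsearch PATTERN: case-insensitive locate, restricted to the current directory tree
# (needs a locate database; refresh it first with: sudo updatedb)
lsearch() {
    locate -i "$1" | under_dir "$PWD"
}
```

The filtering happens after the lookup, so results are only as fresh as the last updatedb run.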


What bothers me is not being able to tweak Baloo. That thing has been running for over a week and hogging all resources. I do use the search feature. Perhaps I will use locate instead.

I understand KDE is all hands on deck, but something that hogs all resources is counter-productive. At the very least, I’d like to have it index only file names and not content. With power settings putting my computer to sleep after five minutes, this damn service never gets indexing done.

I tried:

balooctl6 config exclude contentindexing

We’ll see.
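
For comparison, a sketch of the form that, as far as I can tell, the config subcommand expects: a set or show action followed by a contentIndexing key. Treat the action and key names as assumptions and check balooctl6’s built-in help before relying on them; filenames_only is a hypothetical wrapper name.

```shell
# Sketch only: the "set"/"show" actions and the "contentIndexing" key are
# assumptions about balooctl6's config subcommand, not confirmed in this thread.
filenames_only() {
    balooctl6 config set contentIndexing no   # assumed syntax: turn content indexing off
    balooctl6 config show contentIndexing     # assumed syntax: print the resulting setting
}
```

If the setting takes effect, content that was already indexed presumably stays in the database until the index is rebuilt.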

I don’t know…

I haven’t used Baloo in ages, but doesn’t this work?


Maybe you are hitting a bug; running for weeks is obviously not the intended behavior.

Go to File Search in the Settings. Disable it and apply. It will ask whether to remove the existing index file: confirm. Select file names only, enable it again, and apply. Observe what it’s indexing and whether it hangs or indexes unintended locations that you may want to exclude.


This is exactly what I’m doing with Baloo too. You can configure it in KDE Settings > Search > File Search, as shown in the screenshot in the initial post. If you change it, I recommend deleting the database and restarting the indexing, because it’s probably full of old entries and indexed content. I recommend going to the Settings page, changing “Data to index” to “File names only”, then opening a terminal and typing the following for a fresh index file: balooctl6 disable && balooctl6 purge && balooctl6 enable
