Booting is stuck on "A start job is running for /dev/disk/by-uuid.." forever

Hello, my system is currently stuck in the boot phase with the following message:

[OK] Reached target Basic System
[***] A start job is running for /dev/disk/by-uuid/... (12mins 20seconds/no limit)

and it seems to run forever.

Here’s what I did recently before encountering this problem:
I ran out of RAM and the system appeared to be frozen, so I forced a shutdown by holding the power button for a few seconds and then started it again. During boot it showed something along the lines of “EndeavourOS file system check” for a few seconds, but then it booted normally. I used the PC for a few hours, then decided to restart it, and that is when I ran into this problem.

So I tried booting from a live USB ISO to back up the home directory and then do a clean reinstall, but the SSD that has Linux installed did not show up in Thunar; only the other HDD appeared there.

I opened GParted and it gave me the following error message:
The primary GPT table is corrupt, but the backup appears OK, so that will be used.
Then it showed my SSD with apparent problems on sdb1 and sdb2 (sdb2 holds my home directory).

checking for information of sdb1 gives the following errors:

Can't open /dev/sdb1: No such file or directory
Cannot initialize '::'
mlabel: Cannot initialize drive

Can't open /dev/sdb1: No such file or directory
Cannot initialize '::'

Can't open /dev/sdb1: No such file or directory
Cannot initialize '::'

Unable to read the contents of this file system!
Because of this some operations may be unavailable.
The cause might be a missing software package.
The following list of software packages is required for fat32 file system support:  dosfstools, mtools.

checking for information of sdb2 gives the following errors:

e2label: No such file or directory while trying to open /dev/sdb2
Couldn't find valid filesystem superblock.

tune2fs 1.47.0 (5-Feb-2023)

tune2fs: No such file or directory while trying to open /dev/sdb2
Couldn't find valid filesystem superblock.

Couldn't find valid filesystem superblock.

dumpe2fs 1.47.0 (5-Feb-2023)
dumpe2fs: No such file or directory while trying to open /dev/sdb2

Unable to read the contents of this file system!
Because of this some operations may be unavailable.
The cause might be a missing software package.
The following list of software packages is required for ext4 file system support:  e2fsprogs v1.41+.

I tried running fsck and got the following error:

[liveuser@eos-2023.08.05 ~]$ sudo fsck.ext4 -v /dev/sdb2
e2fsck 1.47.0 (5-Feb-2023)
fsck.ext4: No such file or directory while trying to open /dev/sdb2
Possibly non-existent device?

I tried running e2fsck and got the following error:

sudo e2fsck -f -C 0 /dev/sdb2
e2fsck 1.47.0 (5-Feb-2023)
e2fsck: No such file or directory while trying to open /dev/sdb2
Possibly non-existent device?

I ran lsblk and it sees the other HDD’s partitions normally (sda1, sda2, etc.), but for the SSD that has Linux on it, it only shows sdb without any partitions. The only tool that recognized that sdb had partitions was GParted.

You have issues at the disk level; you cannot fix those by running commands one level deeper, on the partitions. First, I would run a SMART check on the SSD with sudo smartctl -a /dev/sdb, just to make sure this really is the aftermath of running out of RAM and nothing else. A sudo smartctl -t short /dev/sdb && sudo smartctl -a /dev/sdb also won’t hurt.

With sudo fdisk /dev/sdb, and then x for the expert menu, you should be able to verify the partition table and fix it.
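
Before changing anything, it can help to compare what the kernel currently exposes with what is actually stored on the disk. A minimal read-only sketch, assuming the SSD is still /dev/sdb as in this thread and that gdisk is available on the live ISO; the function name inspect_disk is only illustrative:

```shell
# inspect_disk DISK
# Read-only checks; nothing here writes to the disk.
inspect_disk() {
    disk="$1"
    # What the kernel currently exposes (partitions are missing if it rejected the table).
    lsblk -o NAME,SIZE,TYPE,FSTYPE "$disk"
    # How gdisk reads the GPT; -l lists the table and exits without writing.
    sudo gdisk -l "$disk"
}

# Example (device name taken from this thread, adjust to your system):
# inspect_disk /dev/sdb
```

If lsblk shows no partitions while gdisk -l still lists them, the partition data is probably readable and only the table the kernel relies on is damaged.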

Here’s the output of sudo smartctl -a /dev/sdb:

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 860 EVO 250GB
Serial Number:    S3Y9NMFN800086W
LU WWN Device Id: 5 002538 ec08da284
Firmware Version: RVT04B6Q
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Nov 11 09:40:08 2023 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x53) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  85) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       5551
 12 Power_Cycle_Count       0x0032   097   097   000    Old_age   Always       -       2617
177 Wear_Leveling_Count     0x0013   098   098   000    Pre-fail  Always       -       31
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   073   049   000    Old_age   Always       -       27
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   093   093   000    Old_age   Always       -       6953
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       369
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       14077265195

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Aborted by host               90%      5347         -
# 2  Short offline       Aborted by host               90%      5137         -
# 3  Short offline       Aborted by host               90%      5106         -
# 4  Short offline       Completed without error       00%      4808         -
# 5  Short offline       Aborted by host               90%      4736         -
# 6  Short offline       Aborted by host               80%      4691         -
# 7  Short offline       Aborted by host               90%      4579         -
# 8  Short offline       Aborted by host               90%      4466         -
# 9  Short offline       Aborted by host               90%      4445         -
#10  Short offline       Aborted by host               90%      4445         -
#11  Short offline       Aborted by host               90%      4439         -
#12  Short offline       Aborted by host               80%      4439         -
#13  Short offline       Aborted by host               90%      4439         -
#14  Short offline       Aborted by host               70%      4426         -
#15  Short offline       Aborted by host               90%      4407         -
#16  Short offline       Aborted by host               90%      4402         -
#17  Short offline       Aborted by host               90%      4322         -
#18  Short offline       Aborted by host               90%      4322         -
#19  Short offline       Aborted by host               90%      4311         -
#20  Short offline       Aborted by host               90%      4280         -
#21  Short offline       Aborted by host               90%      4280         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

And here’s the output of sudo smartctl -t short /dev/sdb && sudo smartctl -a /dev/sdb:

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Sat Nov 11 09:43:48 2023 UTC
Use smartctl -X to abort test.
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.4.8-arch1-1] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 860 EVO 250GB
Serial Number:    S3Y9NMFN800086W
LU WWN Device Id: 5 002538 ec08da284
Firmware Version: RVT04B6Q
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Nov 11 09:41:48 2023 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249)	Self-test routine in progress...
					90% of test remaining.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x53) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  85) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       5551
 12 Power_Cycle_Count       0x0032   097   097   000    Old_age   Always       -       2617
177 Wear_Leveling_Count     0x0013   098   098   000    Pre-fail  Always       -       31
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   072   049   000    Old_age   Always       -       28
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   093   093   000    Old_age   Always       -       6953
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       369
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       14077265195

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Aborted by host               90%      5347         -
# 2  Short offline       Aborted by host               90%      5137         -
# 3  Short offline       Aborted by host               90%      5106         -
# 4  Short offline       Completed without error       00%      4808         -
# 5  Short offline       Aborted by host               90%      4736         -
# 6  Short offline       Aborted by host               80%      4691         -
# 7  Short offline       Aborted by host               90%      4579         -
# 8  Short offline       Aborted by host               90%      4466         -
# 9  Short offline       Aborted by host               90%      4445         -
#10  Short offline       Aborted by host               90%      4445         -
#11  Short offline       Aborted by host               90%      4439         -
#12  Short offline       Aborted by host               80%      4439         -
#13  Short offline       Aborted by host               90%      4439         -
#14  Short offline       Aborted by host               70%      4426         -
#15  Short offline       Aborted by host               90%      4407         -
#16  Short offline       Aborted by host               90%      4402         -
#17  Short offline       Aborted by host               90%      4322         -
#18  Short offline       Aborted by host               90%      4322         -
#19  Short offline       Aborted by host               90%      4311         -
#20  Short offline       Aborted by host               90%      4280         -
#21  Short offline       Aborted by host               90%      4280         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  256        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

Running sudo fdisk /dev/sdb still gives the following warning before showing the fdisk options: The primary GPT table is corrupt, but the backup appears OK, so that will be used.

I don’t really have any experience dealing with disk errors, so can you tell me exactly what to do? I greatly appreciate your help!

The first thing you should attempt is to get a backup of the disk. You may need a program like ddrescue or testdisk to make this possible.

You could try repairing the disk with GParted:
Device > Attempt Data Rescue

I’ve never used the GParted way, but it may work for you. After you get all the data off the disk, you should do a badblocks check to determine whether the drive needs replacing.
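
For the backup step, a whole-disk image with ddrescue could look roughly like this. It is a sketch, assuming GNU ddrescue is installed on the live system and that the image is written to a different disk with enough free space; the function name image_disk and the example paths are placeholders:

```shell
# image_disk SOURCE IMAGE
# Copies SOURCE into IMAGE and keeps a map file so an interrupted run can resume.
image_disk() {
    src="$1"
    img="$2"
    # First pass: -d reads the device directly, -n skips the slow scraping phase.
    sudo ddrescue -d -n "$src" "$img" "$img.map"
    # Second pass: retry the remaining bad areas up to three times.
    sudo ddrescue -d -r3 "$src" "$img" "$img.map"
}

# Example (the target must not be on the disk being rescued):
# image_disk /dev/sdb /mnt/backup/sdb.img
```

Imaging the whole device rather than single partitions also captures both GPT copies, so any repair can be tried on the image first.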

Follow @thefrog’s advice; you could also try Clonezilla to get an easy partition backup.

Your SMART data doesn’t tell much except that the disk reports itself as healthy. Almost all of your self-tests were aborted and so give no information; maybe overly aggressive power-saving settings? Strange, since the short test only runs for two minutes.

This should be fixed as well, but later.
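
To check whether a short self-test can finish at all, one option is to start it, leave the machine alone for a bit longer than the two minutes smartctl advertises, and only then read the self-test log. A rough sketch, assuming the SSD is /dev/sdb as above; the function names are made up for illustration:

```shell
# count_aborted: reads a smartctl self-test log on stdin and
# prints how many entries are marked "Aborted by host".
count_aborted() {
    grep -c 'Aborted by host'
}

# run_short_test DISK: start a short self-test, wait, then print only the self-test log.
run_short_test() {
    disk="$1"
    sudo smartctl -t short "$disk"
    # The drive reports a 2 minute polling time; wait a little longer than that.
    sleep 150
    sudo smartctl -l selftest "$disk"
}

# Examples:
# run_short_test /dev/sdb                              # entry "# 1" is the newest test
# sudo smartctl -l selftest /dev/sdb | count_aborted   # how many logged tests were aborted
```

If the newest entry says "Completed without error", the earlier aborts were more likely caused by something interrupting the test than by the drive itself.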

After the backup, or if you like to live dangerously, try to fix the partition table with gparted or parted.

I managed to fix it!

@emk2203 I have no idea why the self-tests got interrupted; the output was instantaneous, which is kind of concerning. I’d love to learn how to investigate that, and thanks for bringing it to my attention!

Here’s what I did to fix it:
I backed up my GPT partition table (in case things went wrong) with the following command:
sudo sfdisk -d /dev/sdb > PT_sdb.txt

Then I used gdisk:
sudo gdisk /dev/sdb

To print the partition table I pressed p, and it showed both sdb1 and sdb2 like GParted did.
Then I verified sdb by pressing v, which gave me this error:

Caution! After loading partitions, the CRC doesn't check out!
Warning! Main partition table CRC mismatch! Loaded backup partition table
instead of main partition table!

Warning! One or more CRCs don't match. You should repair the disk!
Main header: OK
Backup header: OK
Main partition table: ERROR
Backup partition table: OK

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: damaged

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************

So I opened the recovery menu by pressing r, loaded the backup partition table by pressing c, confirmed my choice, pressed w to write the table to disk, then verified again with v to make sure everything was OK.
I restarted the PC, and voilà, everything was running without problems again.
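
For anyone landing here later, the steps above can be summarized as the sketch below. The two helper functions are hypothetical wrappers around the sfdisk dump used above and its matching restore; the gdisk part is interactive, so its keys are listed as comments rather than scripted:

```shell
# backup_table DISK FILE: save the partition layout as text before any repair.
backup_table() {
    sudo sfdisk -d "$1" > "$2"
}

# restore_table DISK FILE: write a saved layout back.
# Only needed if the repair makes things worse.
restore_table() {
    sudo sfdisk "$1" < "$2"
}

# Example:
# backup_table /dev/sdb PT_sdb.txt
#
# Keys pressed inside `sudo gdisk /dev/sdb`:
#   p   print the partition table
#   v   verify; this is where the main partition table CRC error showed up
#   r   open the recovery and transformation menu
#   c   load the backup partition table, rebuilding the main one
#   w   write the table to disk and exit
# gdisk exits after w, so run it again and press v to confirm the table is clean.
```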

Here’s a very useful resource that helped me greatly: https://rodsbooks.com/gdisk/repairing.html

Thank you @emk2203 and @thefrog for your help!

Good to know that gdisk works for this, and thanks for the reference to rodsbooks.com.

Nice problem solving!

Thanks for posting back and sharing the solution!
