Have you ever run out of space on your Linux machine, checked with the df command, and seen something like this?
vignesh@localhost:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           768M  2.2M  766M   1% /run
/dev/nvme0n1p4  222G  222G     0 100% /          <------
tmpfs           3.8G  697M  3.1G  19% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           4.0M     0  4.0M   0% /sys/fs/cgroup
/dev/nvme0n1p1  970M  216M  754M  23% /boot
/dev/nvme0n1p3  241M  5.2M  236M   3% /boot/efi
/dev/sda2       256G  1.9G  255G   1% /home/vignesh/VMs
/dev/sda1       256G  3.1G  253G   2% /home/vignesh/Downloads
/dev/sda3       420G  115G  305G  28% /home/vignesh/WorkFiles
tmpfs           768M  176K  768M   1% /run/user/1000
Then you try the du command to find which folder, and from there which file, is using all that space, only to see that the total reported by du does not match the usage shown by df.
vignesh@localhost:~$ sudo du -sch /*
[sudo] password for vignesh:
0     /bin
182M  /boot
0     /cdrom
80K   /dev
17M   /etc
130G  /home
0     /lib
0     /lib32
0     /lib64
0     /libx32
0     /media
0     /mnt
667M  /opt
du: cannot access '/proc/51557/task/51557/fd/4': No such file or directory
du: cannot access '/proc/51557/task/51557/fdinfo/4': No such file or directory
du: cannot access '/proc/51557/fd/3': No such file or directory
du: cannot access '/proc/51557/fdinfo/3': No such file or directory
0     /proc
7.7M  /root
du: cannot access '/run/user/1000/gvfs': Permission denied
2.3M  /run
0     /sbin
1.9G  /snap
0     /srv
0     /sys
56K   /tmp
8.3G  /usr
8.4G  /var
150G  total
Let's fix it
The usual cause is a file that was deleted while a process still has it open. To fix it, you have to find which process is keeping the deleted file open and kill it.
There are two ways to find the process ID.
Finding deleted file from proc
To find all the files that are kept open by all the processes, you can run the following command.
sudo find /proc/*/fd -ls
We only want to see the deleted files, so we filter the output:
sudo find /proc/*/fd -ls | grep '(deleted)'
This shows every file that is deleted but still open, along with the ID of the process that has it open.
vignesh@localhost:~$ sudo find /proc/*/fd -ls | grep '(deleted)'
[sudo] password for vignesh:
1222058 0 lrwx------ 1 vignesh vignesh 64 Jul 13 15:03 /proc/13964/fd/32 -> /tmp/.org.chromium.Chromium.hX85lB\ (deleted)
1222060 0 lrwx------ 1 vignesh vignesh 64 Jul 13 15:03 /proc/13964/fd/34 -> /dev/shm/.org.chromium.Chromium.EatHOj\ (deleted)
1222062 0 lrwx------ 1 vignesh vignesh 64 Jul 13 15:03 /proc/13964/fd/36 -> /dev/shm/.org.chromium.Chromium.zBwPPh\ (deleted)
1222063 0 lrwx------ 1 vignesh vignesh 64 Jul 13 15:03 /proc/13964/fd/38 -> /dev/shm/.org.chromium.Chromium.N0KVxf\ (deleted)
1222064 0 lrwx------ 1 vignesh vignesh 64 Jul 13 15:03 /proc/13964/fd/39 -> /dev/shm/.org.chromium.Chromium.LwqpXZ\ (deleted)
1222065 0 lrwx------ 1 vignesh vignesh 64 Jul 13 15:03 /proc/13964/fd/40 -> /dev/shm/.org.chromium.Chromium.dWTkhn\ (deleted)
1222066 0 lrwx------ 1 vignesh vignesh 64 Jul 13 15:03 /proc/13964/fd/41 -> /dev/shm/.org.chromium.Chromium.PY7RYu\ (deleted)
The process ID can be read from the path under /proc.
For the first entry in the above example, the path is /proc/13964/fd/32
Here, the PID is 13964 and 32 is the file descriptor number. So generally, the format is /proc/<PID>/fd/<file descriptor number>
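If you want to pull the PID out of such a path in a script, here is a minimal sketch using plain shell parameter expansion. The function name pid_from_fd_path is made up for this example, and it assumes the path really starts with /proc/.

```shell
# Hypothetical helper (not a standard tool): print the PID component
# of a /proc/<PID>/fd/<FD> path.
pid_from_fd_path() {
    path=${1#/proc/}             # strip the leading "/proc/"
    printf '%s\n' "${path%%/*}"  # keep everything before the next "/"
}

pid_from_fd_path /proc/13964/fd/32   # prints 13964
```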
Using lsof command
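Most Linux machines come with the lsof command. If yours does not, you can use the previous method.
The following command shows all the open but deleted files. The +L1 option selects open files whose link count is less than 1, that is, files with no hard link left.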
sudo lsof +L1
That gives the following output
vignesh@localhost:~$ sudo lsof +L1
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
      Output information may be incomplete.
COMMAND    PID  USER     FD   TYPE  DEVICE  SIZE/OFF  NLINK  NODE       NAME
pipewire   2557 vignesh  26u  REG   0,1     2312      0      2          /memfd:pipewire-memfd (deleted)
pipewire   2557 vignesh  29u  REG   0,1     2312      0      1027       /memfd:pipewire-memfd (deleted)
pulseaudi  2559 vignesh  6u   REG   0,1     67108864  0      2051       /memfd:pulseaudio (deleted)
gnome-she  2725 vignesh  35r  REG   259,4   828       0      272568322  /home/vignesh/.local/share/gvfs-metadata/home (deleted)
gnome-she  2725 vignesh  37r  REG   259,4   64        0      272338791  /home/vignesh/.local/share/gvfs-metadata/root (deleted)
gnome-she  2725 vignesh  52u  REG   0,1     51027     0      7183       /memfd:mutter-shared (deleted)
Here the PID of the process is shown under the PID column
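When the list is long, it helps to see which process is holding on to the most space. The sketch below adds up the SIZE/OFF column per PID. The function name deleted_bytes_by_pid is made up for this example, and it assumes the column layout shown above (PID in column 2, a plain byte size in column 7). Rows where SIZE/OFF is not a plain number are skipped.

```shell
# Hypothetical helper: read `lsof +L1` output on stdin and print
# "<total bytes> <PID> <command>" per process, largest first.
deleted_bytes_by_pid() {
    awk '
        # Only rows with a numeric PID and a numeric size
        # (this also skips the header line).
        $2 ~ /^[0-9]+$/ && $7 ~ /^[0-9]+$/ {
            sum[$2] += $7
            name[$2] = $1
        }
        END {
            for (pid in sum)
                printf "%.0f %s %s\n", sum[pid], pid, name[pid]
        }
    ' | sort -rn
}

# Usage (not run here because it needs root):
#   sudo lsof +L1 | deleted_bytes_by_pid
```

With the output above, pulseaudio (PID 2559) would come out on top with 67108864 bytes.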
Now it's time to kill the process
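You can kill the process with the following command. Note that -9 sends SIGKILL, which the process cannot catch, so it gets no chance to clean up. If you can, try a plain sudo kill <PID> first and use -9 only when that does not work.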
sudo kill -9 <PID>
If the process is a service, you can restart the service instead. If that doesn't work, first kill the process and then restart the service using systemctl or init.
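If you would rather give the process a chance to exit cleanly before forcing it, here is a rough sketch. The function name stop_process and the five-second grace period are made up for this example. For a process owned by another user you would need to run it as root.

```shell
# Hypothetical helper: ask a process to exit, and force it only
# if it is still around after about five seconds.
stop_process() {
    pid=$1
    # SIGTERM first; give up if the signal cannot be sent at all.
    kill -TERM "$pid" 2>/dev/null || return 1
    for attempt in 1 2 3 4 5; do
        # kill -0 sends no signal, it only checks that the PID exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
    done
    # Still alive after the grace period: SIGKILL, same as kill -9.
    kill -KILL "$pid" 2>/dev/null
}

# Usage: stop_process <PID>
```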
If you would like to know why this happened, read on.
The df and du commands measure disk usage in entirely different ways.
The df command
The df command reports space for a whole file system, based on block usage.
When a file is written, its data is stored in whatever blocks of the secondary storage happen to be free, so the physical location can look quite random. The file system finds the data again through an index entry (the inode), which records which blocks belong to the file.
When the df command is issued, it does not walk through the files at all. It asks the file system for its own running totals of allocated and free blocks (the statfs system call on Linux). Every block that belongs to an index entry counts as used, whether or not any directory still points at that entry.
This is really fast, which is why a df command completes in less than a second.
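To see these counters yourself, here is a small sketch that assumes GNU coreutils stat. With -f it reports on the file system rather than the file, and the format codes %b, %f and %S are the total blocks, the free blocks and the block size they are counted in. The function name fs_used_bytes is made up for this example.

```shell
# Hypothetical helper: used bytes on the file system that holds
# the given path, computed from the file system's block counters.
# Assumes GNU stat (the -f and -c options).
fs_used_bytes() {
    counters=$(stat -f -c '%b %f %S' "$1") || return 1
    # Intentional word splitting: $1=total blocks, $2=free blocks,
    # $3=block size in bytes.
    set -- $counters
    echo $(( ($1 - $2) * $3 ))
}

fs_used_bytes /
```

The result should be close to the Used column of df for the same mount point, just in bytes. No file is looked at to produce it.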
The du command
The du command works at the level of directories and files.
In Linux, the name you see in a directory is a hard link that points to the file's index entry on disk. When a user or program opens a file, it gives the hard link path. The system uses that path to find the index entry, and the index entry is then used for all further access to the file.
This is why a single file can have multiple hard links or soft links. The only restriction on hard links is that they must all be in the same file system as the file. Soft links point to a hard link path rather than to the index entry, so they can live on any partition.
The du command starts from the hard links and gets each file's size through them. It has to work this way because du reports the space used inside a directory, not the space used in the file system as a whole.
Thus, du recursively checks every hard link under that directory to find the total space used by all the files.
This is a slow process, and on a large directory tree it can take more than 10 seconds.
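The idea behind du can be sketched in a few lines. This is only a rough imitation: it assumes GNU find (for -printf), it only counts regular files, and unlike du it counts a file once per hard link. The function name tree_bytes is made up for this example.

```shell
# Hypothetical helper: walk a directory tree and add up the disk
# blocks of every regular file that is reachable by name.
tree_bytes() {
    # %b prints the number of 512-byte blocks allocated to each file;
    # -xdev keeps the walk on one file system.
    find "$1" -xdev -type f -printf '%b\n' 2>/dev/null |
        awk '{ blocks += $1 } END { printf "%.0f\n", blocks * 512 }'
}

# Usage: tree_bytes /var/log
```

Everything it counts is found by following names in directories. A file with no name left is simply never visited.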
What's the problem?
A process that has a file open, and perhaps keeps writing to it, references the disk index entry directly rather than the path. This means you can delete the file in Linux while it is still in use. Unlike Windows, Linux will not throw a "File is currently in use" error.
When you delete the file, only the hard link to that index entry is removed. Because the process still has the file open, the file system keeps the index entry and its data blocks until the process closes the file or exits.
So when you run df, you see the disk filling up, and if the process isn't stopped the disk will eventually reach 100%. But when you use du to look for the file, you will not find it, and the total reported by du for / will not add up to the used space that df reports for /.
In other words, du cannot see a hard link that has already been deleted and so leaves the file out of its total, while df works from the file system's block counts and still includes it, because that space is not actually free.
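You can reproduce the situation safely with a throwaway file. The sketch below assumes a Linux system with /proc mounted. The function name demo_deleted_open and the choice of file descriptor 9 are arbitrary.

```shell
# Hypothetical demo: delete a file while it is still open and show
# that the open descriptor still points at it.
demo_deleted_open() {
    tmp=$(mktemp) || return 1
    exec 9>"$tmp"             # keep the file open on descriptor 9
    echo "still here" >&9     # writing still works after the rm below
    rm "$tmp"                 # removes the hard link only
    # readlink inherits descriptor 9, so /proc/self/fd/9 is its own
    # view of the same open file. The target ends in " (deleted)".
    readlink /proc/self/fd/9
    exec 9>&-                 # closing the descriptor frees the space
}

demo_deleted_open
```

Between the rm and the final exec, the file has no name that du could find, yet its blocks are still allocated.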
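So df is the one to trust for how much space is really in use, while du is the one that tells you where the space has gone, but only for files that are still reachable through a directory.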