This topic describes why a large number of small files in a Windows New Technology File System (NTFS) file system can cause the reported disk usage to differ greatly from the total size of the files, and how to resolve the issue.
Disk space is occupied by a large number of files smaller than 1 KB in size
Problem description
For example, a disk formatted with the Windows NTFS file system holds 409,600 small files whose total size is 1.56 MB. Each file contains only a few characters and is a few bytes in size. However, 594 MB of space on the disk is in use, which differs significantly from the total size of the files.
Cause
This issue is caused by the way the NTFS file system uses the Master File Table (MFT) to store files. The following figure shows the simple structure of the formatted NTFS file system.
BOOT stores the basic information of the NTFS file system, such as the cluster size. The file information of the NTFS file system is stored in the MFT. Each MFT record has a fixed size of 1 KB and corresponds to a file or another file system object. A record has the following format: Header + Attribute 1 + Attribute 2 + ... + Attribute N. The attributes include the name, length, and modification time of the file.
If the file information does not use up the 1 KB of space in the record, the remaining space can hold file content. When the file is small enough, DATA stores the content of the file. When the file is too large to fit in the record, DATA stores a pointer to another area where the data is saved.
In this example, 409,600 files consume 409,600 MFT records, which take up 400 MB of space in total. With NTFS file system logs and bitmaps included, the total used space is 594 MB. You can run the chkdsk d: command to obtain information about system usage.
You can also use the WinHex program to obtain more detailed information about system usage. Download and run WinHex, and then choose Tools > Open Disk to open the disk that you want to analyze. For example, the MFT occupies 400 MB of space, LogFile occupies 64 MB of space, and BitMap occupies 3.1 MB of space.
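The MFT figure in this example can be reproduced with simple arithmetic. The following Python sketch assumes that each file consumes exactly one 1 KB MFT record, as described in this topic. The function name mft_bytes is hypothetical and is used only for illustration.

```python
# Minimal sketch: estimate the space taken by MFT records.
# Assumption: every file uses exactly one fixed-size 1 KB record.
MFT_RECORD_BYTES = 1024


def mft_bytes(file_count: int, record_bytes: int = MFT_RECORD_BYTES) -> int:
    """Return the number of bytes consumed by the MFT records of file_count files."""
    return file_count * record_bytes


if __name__ == "__main__":
    total = mft_bytes(409_600)
    print(f"MFT records: {total / 2**20:.0f} MB")  # prints "MFT records: 400 MB"
```

The estimate covers only the MFT records. Logs, bitmaps, and other file system usage are not modeled.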
Solution
When a file is deleted, its MFT record is marked as empty, but the space allocated to the MFT is not reclaimed. This allows new files to be created efficiently. If you want to release the space, you must use a third-party disk cleanup tool. If the partition capacity is not large, you can also create a large file to fill the free space. Then, the system releases the deleted MFT space. To reduce the space occupied by the MFT and the number of small files that your business logic keeps on the disk, we recommend that you compress and back up small files on a regular basis.
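One possible way to compress small files is to pack them into a single archive. The following Python sketch uses the standard zipfile module. The function name archive_small_files and the 1 KB threshold are assumptions chosen for illustration, not values required by NTFS. The sketch does not delete the original files. Delete them only after you verify the archive.

```python
import zipfile
from pathlib import Path


def archive_small_files(src_dir, archive_path, max_bytes=1024):
    """Pack every file of at most max_bytes under src_dir into one ZIP archive.

    Returns the number of files added. Original files are left in place.
    """
    src = Path(src_dir)
    archive = Path(archive_path).resolve()
    added = 0
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            # Skip directories and the archive itself if it lives under src_dir.
            if not path.is_file() or path.resolve() == archive:
                continue
            if path.stat().st_size <= max_bytes:
                # Store paths relative to src_dir so the archive is portable.
                zf.write(path, arcname=path.relative_to(src).as_posix())
                added += 1
    return added
```

For example, archive_small_files("D:/logs", "D:/backup/logs.zip") would collect all files of 1 KB or smaller under the hypothetical D:/logs directory into one archive file, which consumes a single MFT record.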
Disk space is occupied by a large number of small files of 1 KB or larger in size
Problem description
For example, 409,600 small files are stored on a disk. Each file is 1 KB in size and the total size is 400 MB. However, the files occupy 1.56 GB of disk space.
Cause
NTFS file systems use the cluster as the unit to allocate and manage space. By default, the cluster size is 4 KB. When a file is smaller than 4 KB or the amount of space to allocate is less than 4 KB, one full 4 KB cluster is still allocated. As a result, the amount of disk space occupied by files exceeds the total size of the files.
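The rounding described above can be sketched in Python. The sketch assumes that the data of each file is stored in clusters outside the MFT and that the cluster size is the default 4 KB. The function name allocated_bytes is hypothetical.

```python
def allocated_bytes(file_size: int, cluster_size: int = 4096) -> int:
    """Round file_size up to a whole number of clusters."""
    if file_size == 0:
        return 0
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size


if __name__ == "__main__":
    files = 409_600
    logical = files * 1024                   # total size of the files
    on_disk = files * allocated_bytes(1024)  # space actually allocated
    print(f"Total file size: {logical / 2**20:.0f} MB")  # 400 MB
    print(f"Allocated space: {on_disk / 2**30:.2f} GB")  # 1.56 GB
```

Each 1 KB file is rounded up to one 4 KB cluster, so the allocated space is four times the total file size, which matches the 1.56 GB in the example.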
Solution
If you have a large number of files smaller than 4 KB in size, you can compress and back up the files to reduce disk usage. If the small files have a fixed size, you can specify a smaller cluster size (allocation unit size) when you format the disk to prevent wasted space. By default, the allocation unit size is 4,096 bytes.
Disk formatting poses risks. Before you format the disk, create snapshots for the disk to back up data to prevent data loss. For more information about how to create a snapshot, see Create a snapshot of a disk.
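Before you choose a smaller cluster size, you can estimate how much space each candidate size would leave unused. The following Python sketch compares several cluster sizes for the 409,600 files of 1 KB each from the example. The function name wasted_bytes and the list of candidate sizes are assumptions chosen for illustration. The sketch considers only the rounding of file data to whole clusters.

```python
def wasted_bytes(file_size: int, file_count: int, cluster_size: int) -> int:
    """Return the allocated-but-unused bytes for file_count files of file_size bytes."""
    clusters_per_file = -(-file_size // cluster_size)  # ceiling division
    return file_count * (clusters_per_file * cluster_size - file_size)


if __name__ == "__main__":
    # Candidate allocation unit sizes in bytes (illustrative values).
    for cluster in (512, 1024, 2048, 4096):
        waste = wasted_bytes(1024, 409_600, cluster)
        print(f"{cluster:>5}-byte clusters: {waste / 2**20:.0f} MB unused")
```

In this estimate, a cluster size that evenly divides the fixed file size leaves no unused space, whereas the default 4,096-byte cluster leaves 1,200 MB unused.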
References
For more information about how to identify and resolve disk issues in NTFS file systems, see Locate and correct disk space problems on NTFS volumes.