Best practices on Lustre parallel file systems
- 1 Introduction
- 2 Best practices
- 2.1 ls vs. ls -l
- 2.2 Avoid Having a Large Number of Files in a Single Directory
- 2.3 Avoid Accessing Small Files on Lustre Filesystems
- 2.4 Use a Stripe Count of 1 for Directories with Many Small Files
- 2.5 Avoid Accessing Executables on Lustre Filesystems
- 2.6 Increase the Stripe Count for Parallel Access to the Same File
- 2.7 Restripe Large Files
- 2.8 Limit the Number of Processes Performing Parallel I/O
- 2.9 Avoid Repetitive "stat" Operations
- 2.10 Avoid Having Multiple Processes Open the Same File(s) at the Same Time
- 2.11 Avoid Repetitive Open/Close Operations
- 3 Troubleshooting
- 4 Working with stripes (advanced users)
Lustre is a type of parallel distributed file system, generally used for large-scale cluster computing. Files are distributed across multiple servers, and then striped across multiple disks.
A Lustre file system has three major functional units:
- Metadata servers (MDS) that store namespace metadata, such as filenames, directories, access permissions, and file layout.
- Object storage server (OSS) nodes that store file data on one or more object storage target (OST) devices.
- Client(s) that access and use the data.
When a client accesses a file, it performs a filename lookup on the MDS. Once the lookup is complete and the user and client have permission to access and/or create the file, either the layout of an existing file is returned to the client or a new file is created.
For read or write operations, the client then interprets the file layout, which maps the file logical offset and size to one or more objects, each residing on a separate OST. The client then locks the file range being operated on and executes one or more parallel read or write operations directly to the OSS nodes.
After the initial lookup of the file layout, the MDS is not normally involved in file IO operations since all block allocation and data IO is managed internally by the OST. Clients do not directly modify the objects or data on the OST filesystems, but instead delegate this task to OSS nodes.
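The mapping from a logical file offset to an object can be illustrated with a little shell arithmetic. This is a minimal sketch that assumes plain round-robin striping, where consecutive chunks of stripe_size bytes go to consecutive objects. The function name stripe_index is made up for this example and is not part of Lustre.

```shell
# stripe_index OFFSET STRIPE_SIZE STRIPE_COUNT
# Print the 0-based index of the object that holds byte OFFSET,
# assuming simple round-robin striping (an assumption of this sketch).
stripe_index() {
    offset=$1; size=$2; count=$3
    echo $(( (offset / size) % count ))
}

# 1 MiB stripes spread over 4 objects:
stripe_index 0 1048576 4          # first chunk, object 0
stripe_index 1048576 1048576 4    # second chunk, object 1
stripe_index 5242880 1048576 4    # sixth chunk wraps around to object 1
```

With a stripe count of 1 the result is always 0, meaning that the whole file lives in a single object on a single OST.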
ls vs. ls -l
A plain ls on a file or directory only queries the MDS for the names. With the -l option, ls must also contact the OSS nodes to look up the size of every file, which creates additional load on the storage system.
- Use ls if you would like to list files and directories
- Only use ls -l if you also need to know about the file size
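As a sketch of the first recommendation, the helper below lists names only. The function name list_names is hypothetical. It calls ls through the shell builtin command, so that an alias or shell function named ls that adds options such as -l is not picked up.

```shell
# list_names [DIR]: print the entry names of DIR (default: current directory),
# one per line, without asking for sizes, owners or timestamps.
list_names() {
    command ls -1 -- "${1:-.}"
}

# Usage (directory is only an example):
#   list_names "$SCRATCH"
#   list_names "$SCRATCH" | wc -l    # count entries
```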
Avoid Having a Large Number of Files in a Single Directory
Avoid Accessing Small Files on Lustre Filesystems
Use a Stripe Count of 1 for Directories with Many Small Files
Avoid Accessing Executables on Lustre Filesystems
Increase the Stripe Count for Parallel Access to the Same File
Restripe Large Files
Limit the Number of Processes Performing Parallel I/O
Avoid Repetitive "stat" Operations
Avoid Having Multiple Processes Open the Same File(s) at the Same Time
Avoid Repetitive Open/Close Operations
Working with stripes (advanced users)
Lustre will always try to distribute your data across all OSTs. The striping parameters can be tuned per file or directory.
How to display the current striping settings
The current stripe settings of a file or directory can be shown with the command lfs getstripe:
[sfux@eu-login-24-ng ~]$ lfs getstripe $SCRATCH/__USAGE_RULES__
/cluster/scratch/sfux/__USAGE_RULES__
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 3
obdidx objid objid group
3 619261 0x972fd 0
[sfux@eu-login-24-ng ~]$
For directories, use the -d option:
[sfux@eu-login-24-ng ~]$ lfs getstripe -d $SCRATCH
stripe_count: 1 stripe_size: 1048576 stripe_offset: -1
[sfux@eu-login-24-ng ~]$
- stripe_count = 1 : Store each file on a single OST (a value of -1 would spread the data across all OSTs)
- stripe_size = 1048576 : Use 1 MiB stripe/chunk size
- stripe_offset = -1: Let Lustre choose the next OST (you shouldn't change this)
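In scripts it can be handy to pull a single value out of this output. The sketch below assumes the "key: value" format shown above. The name stripe_field is a hypothetical helper, and the sample line is copied from the example output rather than produced by a live lfs call.

```shell
# stripe_field KEY: read lfs getstripe text on stdin and print the value
# that follows "KEY:". Works whether the pairs share one line or not.
stripe_field() {
    awk -v key="$1:" '{
        for (i = 1; i < NF; i++)
            if ($i == key) { print $(i + 1); exit }
    }'
}

# Sample text taken from the directory example above.
sample='stripe_count: 1 stripe_size: 1048576 stripe_offset: -1'
printf '%s\n' "$sample" | stripe_field stripe_size
```

On a cluster, the sample line would be replaced by the real command, for example lfs getstripe -d $SCRATCH | stripe_field stripe_count.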
How to change stripe settings
The stripe setting of a directory can be changed with the command lfs setstripe.
- You cannot change the striping of existing files with lfs setstripe; new settings only apply to files created afterwards
- You can always change the striping parameters of an existing directory
- It is possible to create files with non-default striping parameters with the lfs command
- A subdirectory inherits all stripe parameters from its parent directory (if not changed via lfs setstripe)
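A minimal sketch of changing the stripe count of a directory follows. It assumes the -c option of lfs setstripe (stripe count). The helper name set_dir_stripe, the DRY_RUN variable and the directory names are hypothetical. When DRY_RUN=1 is set, or when no lfs binary is found, the command is only printed and nothing is changed.

```shell
# set_dir_stripe DIR COUNT: set the stripe count that new files in DIR get.
# Prints the command instead of running it in dry-run mode or off-cluster.
set_dir_stripe() {
    dir=$1; count=$2
    if [ "${DRY_RUN:-0}" = 1 ] || ! command -v lfs >/dev/null 2>&1; then
        echo "lfs setstripe -c $count $dir"
    else
        lfs setstripe -c "$count" "$dir"
    fi
}

DRY_RUN=1   # remove this line to actually apply the settings

# One stripe for a directory holding many small files (example path).
set_dir_stripe "$SCRATCH/many_small_files" 1
# More stripes for a directory with large files accessed in parallel (example path).
set_dir_stripe "$SCRATCH/large_shared_output" 4
```

Files that already exist in these directories keep their old layout. Only files created after the change use the new stripe count, and the current values can be checked with lfs getstripe -d.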