Best practices on Lustre parallel file systems

Introduction

Lustre is a type of parallel distributed file system, generally used for large-scale cluster computing. Files are distributed across multiple servers, and then striped across multiple disks.

A Lustre file system has three major functional units:

  • Metadata servers (MDS) that store namespace metadata, such as filenames, directories, access permissions, and file layout.
  • Object storage server (OSS) nodes that store file data on one or more object storage target (OST) devices.
  • Client(s) that access and use the data.

When a client accesses a file, it performs a filename lookup on the MDS. Once the lookup is complete and the user and client have permission to access and/or create the file, either the layout of an existing file is returned to the client or a new file is created on behalf of the client.

For read or write operations, the client interprets the file layout, which maps the file's logical offset and size to one or more objects, each residing on a separate OST. The client then locks the file range being operated on and executes one or more parallel read or write operations directly to the OSS nodes.

After the initial lookup of the file layout, the MDS is not normally involved in file I/O operations, since all block allocation and data I/O are managed internally by the OST. Clients do not directly modify the objects or data on the OST filesystems, but instead delegate this task to OSS nodes.
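
As an illustration, the layout that the MDS hands back for a file can be inspected from any client with lfs getstripe (the path below is an example):

  # Show the stripe layout of a file: the stripe count, the stripe size,
  # and one obdidx/objid pair per OST object backing the file
  lfs getstripe /lustre/scratch/myfile.dat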

Best practices

Avoid Using ls -l
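
The -l option forces ls to stat every entry: the MDS supplies the attributes, and because the file size is part of the listing, every OST holding a piece of each file must be contacted as well. A bare ls only reads the directory contents from the MDS:

  # Fast: the names come from the MDS alone
  ls /lustre/scratch/mydir
  # Slow: one stat per entry, touching the MDS and the OSTs behind each file
  ls -l /lustre/scratch/mydir
  # Note: if ls is aliased to ls --color, the colouring may stat entries too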

Avoid Having a Large Number of Files in a Single Directory
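
Every create, lookup, or unlink contends for the directory's lock on the MDS, so a directory with tens of thousands of entries becomes a serial bottleneck. A common workaround is to split the files over subdirectories; in the sketch below, my_app and TASK_ID are placeholders for your own program and task index:

  # Spread output over 100 subdirectories instead of one flat directory
  for i in $(seq 0 99); do mkdir -p run/task_$i; done
  ./my_app --output run/task_${TASK_ID}/result.dat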

Avoid Accessing Small Files on Lustre Filesystems
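
Reading a small file still pays the full protocol described above, an MDS lookup followed by at least one OST round trip, so the per-file overhead dominates the actual data transfer. Where possible, keep small files on local storage or pack them into an archive. A sketch, assuming $TMPDIR points to node-local scratch on your cluster:

  # Pack many small inputs into one archive on Lustre,
  # then unpack to node-local disk before the run
  tar cf inputs.tar small_inputs/
  tar xf inputs.tar -C "$TMPDIR"
  ./my_app --input-dir "$TMPDIR/small_inputs"   # my_app is a placeholder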

Use a Stripe Count of 1 for Directories with Many Small Files
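
A file smaller than the stripe size gains nothing from being spread over several OSTs; striping it only multiplies the number of servers involved in every access. Setting a stripe count of 1 on the directory makes all files subsequently created in it use a single OST:

  # New files created in this directory will each live on a single OST
  lfs setstripe -c 1 /lustre/scratch/many_small_files
  lfs getstripe /lustre/scratch/many_small_files   # verify the setting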

Avoid Accessing Executables on Lustre Filesystems
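
When a binary runs directly from Lustre, its program text is paged in over the network for the lifetime of the process, and a short filesystem or network interruption can stall or kill the job. Copying the executable to node-local storage first avoids this (paths are examples):

  # Stage the binary to local scratch and run the local copy
  cp /lustre/home/user/my_app "$TMPDIR/"
  "$TMPDIR/my_app"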

Increase the Stripe Count for Parallel Access to the Same File
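
When many processes read or write disjoint ranges of one shared file, striping it over more OSTs lets the transfers proceed in parallel and aggregates the bandwidth of the underlying servers. Note that lfs setstripe creates the file, so the layout must be chosen before the file exists:

  # Create a shared output file striped over 8 OSTs
  # (the file must not exist yet; -c -1 would use all available OSTs)
  lfs setstripe -c 8 /lustre/scratch/shared_output.dat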

Restripe Large Files
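
A file's striping is fixed at creation time, so changing it means writing the data into a new file that has the desired layout (recent Lustre versions also provide lfs migrate for this purpose):

  # Create an empty file with the new layout, copy the data, swap the names
  lfs setstripe -c 16 big_file.restriped
  cp big_file big_file.restriped
  mv big_file.restriped big_file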

Limit the Number of Processes Performing Parallel I/O
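
Once the number of concurrent writers exceeds the number of OSTs behind a file, additional processes add lock contention rather than bandwidth. A common rule of thumb is to let only a number of processes comparable to the OST count perform the actual I/O; MPI-IO collective buffering, for instance, aggregates writes onto a subset of ranks. The OST count can be checked with:

  # lfs df prints one line per MDT and OST; count the OST lines
  lfs df /lustre/scratch | grep -c OST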

Avoid Repetitive "stat" Operations
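
Each stat is at least one MDS round trip, and retrieving the file size involves the OSTs as well, so polling a file in a tight loop generates considerable metadata traffic. If a job must wait for a file to appear, sleep between checks:

  # Poll for a sentinel file at a modest rate instead of spinning
  while [ ! -e /lustre/scratch/job.done ]; do
    sleep 30
  done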

Avoid Having Multiple Processes Open the Same File(s) at the Same Time
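
Concurrent access to one file forces the clients to keep negotiating extent locks for the byte ranges they touch. For read-only data, one pattern is to stage the file to node-local storage once and let all processes on the node read the local copy:

  # Copy a shared input once per node, then read it locally
  cp /lustre/scratch/config.dat "$TMPDIR/"
  ./my_app --config "$TMPDIR/config.dat"   # my_app is a placeholder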

Avoid Repetitive Open/Close Operations
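
Every open goes through the MDS pathname and permission machinery described above, so opening and closing a file inside a loop is far more expensive than keeping it open. Even in the shell the difference is easy to see:

  # Bad: ">>" reopens the log file on every iteration (10000 opens)
  for i in $(seq 1 10000); do echo "step $i" >> /lustre/scratch/log.txt; done
  # Better: redirect the whole loop, so the file is opened exactly once
  for i in $(seq 1 10000); do echo "step $i"; done > /lustre/scratch/log.txt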

Working with stripes (advanced users)

Lustre will always try to distribute your data across all OSTs. The striping parameters can be tuned per file or directory.
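
The two basic tools are lfs getstripe, which shows a layout, and lfs setstripe, which sets one on a directory (inherited by files later created inside it) or on a file at creation time. For example, on a reasonably recent Lustre client:

  # Show the current stripe count, stripe size and OST objects of a file
  lfs getstripe /lustre/scratch/myfile.dat
  # New files in this directory will be striped over 4 OSTs in 4 MB stripes
  lfs setstripe -c 4 -S 4M /lustre/scratch/mydir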