Best practices for MFT node runtime deployment
This topic provides best-practice recommendations for deploying and managing MFT node runtimes, focusing on load balancing and performance optimization, including running multiple nodes on the same server.
Node deployment and workload splitting
Distributing the workload across multiple nodes helps prevent any single node from becoming a bottleneck, especially with high traffic or a large number of flows.
- Limit flow endpoints per node:
  - For transfers to the cloud, limit the load to around 20 flow endpoints per node.
  - For local-to-local transfers, limit the load to around 50 flow endpoints per node.
  Note: These are not strict limits, but recommended thresholds for managing load.
- Split the workload - We recommend splitting flow endpoints between multiple nodes. You can do this by:
  - Deploying nodes on different servers
  - Installing multiple nodes on the same server
  Tip: Implement this strategy by installing a new node on a recommended drive with fast disk I/O, such as D:, on the existing server. Then, transfer a portion of the flows to the new node, which immediately reduces the load on the initial node.
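The per-node thresholds above can be turned into a quick capacity estimate. This is a minimal sketch, assuming cloud and local-to-local flows are placed on separate nodes; the function name and example numbers are hypothetical, and the thresholds are the recommended values from this topic, not hard limits.

```python
import math

# Recommended thresholds from this topic (guidance, not hard limits).
CLOUD_FLOWS_PER_NODE = 20
LOCAL_FLOWS_PER_NODE = 50

def nodes_needed(cloud_flow_endpoints: int, local_flow_endpoints: int) -> int:
    """Estimate how many nodes keep every node within the recommended
    per-node flow endpoint thresholds, assuming cloud-bound and
    local-to-local flows run on separate nodes."""
    cloud_nodes = math.ceil(cloud_flow_endpoints / CLOUD_FLOWS_PER_NODE)
    local_nodes = math.ceil(local_flow_endpoints / LOCAL_FLOWS_PER_NODE)
    return cloud_nodes + local_nodes

# Example: 45 cloud flow endpoints and 120 local-to-local flow endpoints.
print(nodes_needed(45, 120))  # 3 cloud nodes + 3 local nodes = 6
```

If the estimate exceeds what one server can host, combine both strategies above: add nodes on other servers and, where disk I/O allows, additional nodes on the same server.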
Resource allocation and disk optimization
The MFT node uses an internal disk cache for all transfers, making disk resources critical for performance.
- Fast disk I/O - The server running the node must have fast disk I/O for the disk cache. Standard disks (often network storage) might be too slow and cause bottlenecks.
- Sufficient disk space - Ensure the server and the drive where the node is installed (including its cache) have enough disk space to handle the largest potential influx of files. Running out of local cache storage can cause processing to stop.
- Node installation location - Avoid installing on the C: drive. Use a dedicated, fast drive such as D: for the installation and cache.
- Sufficient resources - The server's overall resources, including network bandwidth, CPU, and disk I/O, must be sufficient to handle the entire traffic load, as every file is copied into the internal cache before being sent to its destination.
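Because running out of cache space can stop processing, it is worth checking headroom on the cache drive against the largest expected burst of files, not the average. This is a minimal sketch using Python's standard library; the function name, path, and safety factor are illustrative assumptions.

```python
import shutil

def cache_has_headroom(cache_path: str, largest_influx_bytes: int,
                       safety_factor: float = 1.5) -> bool:
    """Return True if the drive hosting the node's disk cache has enough
    free space for the largest expected influx of files, with headroom.
    Every transferred file is copied into the cache, so size for the
    worst-case burst."""
    free = shutil.disk_usage(cache_path).free
    return free >= largest_influx_bytes * safety_factor

# Example: check a hypothetical cache directory against a 200 GB burst.
# cache_has_headroom(r"D:\mft\cache", 200 * 1024**3)
```

A check like this can run on a schedule so you learn about shrinking headroom before transfers stall.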
Strategic traffic splitting
While splitting by flow count is effective, strategically separating transfer types can help mitigate specific bottlenecks.
- Isolate cloud traffic - Consider separating flows by transfer direction to prevent bottlenecks caused by cloud-side performance.
  - Place flows that read from a local source and send to the cloud on one set of nodes.
  - Place flows that read from the cloud and send to a local source on a different set of nodes.
- Flow endpoint execution - Node execution is determined per flow endpoint, not per entire flow. This means you can distribute different target endpoints of a single flow across multiple nodes without limitation.
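The direction-based split above can be sketched as a simple partition of flow records into two node pools. The record shape and direction labels here are hypothetical, assumed for illustration only.

```python
# Hypothetical flow records; "direction" distinguishes local->cloud
# transfers from cloud->local transfers.
flows = [
    {"name": "orders-out", "direction": "local_to_cloud"},
    {"name": "reports-in", "direction": "cloud_to_local"},
    {"name": "invoices-out", "direction": "local_to_cloud"},
]

def split_by_direction(flows):
    """Partition flow endpoints into two pools so cloud-side slowness in
    one direction cannot back up transfers in the other."""
    upload_pool = [f for f in flows if f["direction"] == "local_to_cloud"]
    download_pool = [f for f in flows if f["direction"] == "cloud_to_local"]
    return upload_pool, download_pool

uploads, downloads = split_by_direction(flows)
print(len(uploads), len(downloads))  # 2 1
```

Each pool then maps to its own set of nodes, which works because execution is assigned per flow endpoint rather than per flow.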
Addressing queued and slow transfers
To proactively address issues like queued files and slow transfers, we recommend the following steps:
- Monitor activity - Use the Activity page to monitor file progress and identify if files are being transferred as expected.
- Investigate queuing - Focus troubleshooting on understanding why files remain in a Queued state for extended periods. Prolonged queuing suggests an issue with queue processing, which may be separate from file removal or permissions errors.
- Load testing - After implementing node splitting and upgrading resources, conduct load testing to confirm the system can handle the expected traffic volume and determine if resource limitations have been resolved.
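Monitoring for prolonged queuing can be automated against exported activity data. This is a minimal sketch; the record shape and the 30-minute alert threshold are assumptions, not values defined by the product.

```python
from datetime import datetime, timedelta, timezone

QUEUE_ALERT_THRESHOLD = timedelta(minutes=30)  # assumed threshold

def long_queued(activity_records, now=None):
    """Return names of files that have sat in the Queued state longer
    than the alert threshold. Each record mimics a row of activity data:
    (file_name, state, state_entered_at)."""
    now = now or datetime.now(timezone.utc)
    return [
        name for name, state, entered in activity_records
        if state == "Queued" and now - entered > QUEUE_ALERT_THRESHOLD
    ]

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    ("a.csv", "Queued", now - timedelta(hours=2)),
    ("b.csv", "Transferring", now - timedelta(hours=2)),
    ("c.csv", "Queued", now - timedelta(minutes=5)),
]
print(long_queued(records, now))  # ['a.csv']
```

Files flagged this way are the candidates to investigate for queue-processing issues, and to move to another node if a specific node is overloaded.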
Troubleshooting and log gathering
- Review logs - Review the node logs, focusing on periods when slowness or queued transfers were observed. Permissions errors could indicate unauthorized access exceptions. File-not-found errors could indicate that a file was removed mid-transfer.
- File deletion - Files are automatically deleted from the source upon pickup as part of the transfer transaction. Manual removal mid-transfer can lead to errors as the node attempts to process the missing file.
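When reviewing logs, it helps to separate the two error classes described above. This is a rough sketch; the regular expressions are illustrative assumptions and should be adjusted to the exact wording in your node logs.

```python
import re

# Patterns are illustrative; match them to the wording in your node logs.
ERROR_PATTERNS = {
    "permissions": re.compile(r"unauthorized\s*access|access denied", re.I),
    "missing_file": re.compile(r"file not found|no such file", re.I),
}

def classify_log_lines(lines):
    """Count likely permissions errors and missing-file errors in node
    log lines, to separate access problems from files that were removed
    mid-transfer."""
    counts = {key: 0 for key in ERROR_PATTERNS}
    for line in lines:
        for key, pattern in ERROR_PATTERNS.items():
            if pattern.search(line):
                counts[key] += 1
    return counts

sample = [
    "2024-05-01 10:02:11 ERROR UnauthorizedAccessException on /data/in",
    "2024-05-01 10:03:40 ERROR File not found: /data/in/batch-17.csv",
]
print(classify_log_lines(sample))  # {'permissions': 1, 'missing_file': 1}
```

Run the scan over log lines from the windows when slowness or queued transfers were observed, so the counts point at the dominant failure mode for that period.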