Xrootd Performance

Bill Weeks & Andrew Hanushevsky

Stanford Linear Accelerator Center, February 4, 2005


Xrootd is representative of the next generation of high-performance, scalable random-access data servers. Like other servers in its class, xrootd achieves high levels of performance by extensively using parallel, low-latency algorithms. To test the real-world performance of the server, a series of BaBar analysis jobs was run against a single file. Using a single file allowed data to be served from the file system memory cache and avoided the disk-speed anomalies that make performance results hard to interpret. The CPU-intensive work in the analysis job was removed to force the maximum possible request rate from each client while preserving the original data access pattern. Thus an “event” in this context represents a bounded series of server transactions.


The test was run using a single server on a Sun Microsystems Sun Fire V20Z, whose characteristics are:

A series of eight workloads was run. The runs were identical except for the number of clients, which ranged from 50 to 400 in steps of 50.


Scaling: The most significant feature of the performance graph is that xrootd scales linearly with the number of clients (as indicated by the “CPU remaining” line). Linear scalability means that the number of clients a single xrootd server can support is limited not by the server software but by factors such as memory, CPU, disk speed, and network interface. Linear scaling also explains why network bandwidth utilization and the number of events per second increase uniformly as more clients use the server.
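One way to check a claim of linear scaling against measured data is a least-squares line fit across the eight client counts. The sketch below is illustrative only: the function name linear_fit is hypothetical, and the rates in the demonstration are a synthetic, perfectly linear placeholder series, not measurements from these tests.

```python
def linear_fit(clients, rates):
    """Ordinary least-squares fit of rate = slope * clients + intercept.

    Returns (slope, intercept, r_squared). An r_squared close to 1
    indicates that the rate grows linearly with the client count.
    """
    n = len(clients)
    mean_x = sum(clients) / n
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in clients)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(clients, rates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(clients, rates))
    ss_tot = sum((y - mean_y) ** 2 for y in rates)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Client counts of the eight runs: 50 to 400 in steps of 50 (from the text).
CLIENT_COUNTS = list(range(50, 401, 50))

if __name__ == "__main__":
    # Synthetic placeholder series (NOT measured values): 2 units per client.
    placeholder_rates = [2.0 * c for c in CLIENT_COUNTS]
    print(linear_fit(CLIENT_COUNTS, placeholder_rates))
```

Substituting the measured event rates or bandwidth figures for the placeholder series would quantify how closely the runs follow a straight line.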


Overhead: The per-client user-space load that the server imposes on the system is low. In fact, xrootd user-level overhead (i.e., protocol framing, queue manipulations, thread scheduling, etc.) accounts for only 12% of the total CPU utilization by xrootd. The vast majority of the CPU time is consumed by NIC processing overhead.


Latency: We measured the average server-side latency per request. Our tests showed that the server added an average of 59 µs for a 4 KB data transfer operation. The client likely sees a larger latency once network and client-side overhead are included.
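To put the latency figure in perspective, the arithmetic below converts 59 µs per request into the rate a single, strictly serial request stream could sustain, which is roughly 17,000 requests per second. This is a back-of-the-envelope sketch, not a measured result: it assumes "4K" means 4096 bytes, it ignores network and client-side time, and the function names are hypothetical. Because xrootd serves requests in parallel, the aggregate server rate is not bounded by this single-stream figure.

```python
SERVER_LATENCY_S = 59e-6   # average server-side latency per request (from the text)
TRANSFER_BYTES = 4 * 1024  # assumption: "4K" is taken to mean 4096 bytes


def serial_requests_per_second(latency_s=SERVER_LATENCY_S):
    """Requests per second for one stream that issues requests back to back,
    counting only the server-side latency."""
    return 1.0 / latency_s


def serial_throughput_mb_per_s(latency_s=SERVER_LATENCY_S,
                               size_bytes=TRANSFER_BYTES):
    """Data rate in MB/s (10**6 bytes) for that same single serial stream."""
    return size_bytes / latency_s / 1e6


if __name__ == "__main__":
    print(f"{serial_requests_per_second():.0f} requests/s per serial stream")
    print(f"{serial_throughput_mb_per_s():.1f} MB/s per serial stream")
```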


Efficiency: Given the above observations, it is not surprising that the number of events per second increases with the number of clients. However, the graph also shows that the rate of increase unexpectedly slows after about 200 clients. This effect is, unfortunately, a benchmark-induced aberration. The first two hundred clients were each run on a dedicated machine; beyond that point, up to two clients were run on each machine. Our measurements show that running more than one client on a machine reduced each client machine’s performance by 9.7%. This loss of efficiency appears as a deviation from the expected event rate.
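The effect of client doubling on the expected event rate can be sketched with a simple model. The code below is a rough illustration under stated assumptions, not the method used in the tests: it assumes 200 client machines, that every client beyond the first 200 shares a machine with exactly one other client, and that each client on a shared machine runs 9.7% slower. The name effective_clients is hypothetical.

```python
MACHINES = 200           # client machines available (from the text)
SHARING_PENALTY = 0.097  # 9.7% slowdown when a machine runs two clients


def effective_clients(n_clients, machines=MACHINES, penalty=SHARING_PENALTY):
    """Number of full-speed client equivalents for n_clients.

    Assumption: clients are spread so that as few machines as possible
    run two clients, and both clients on a shared machine are slowed by
    the same penalty.
    """
    if not 0 <= n_clients <= 2 * machines:
        raise ValueError("this model supports at most two clients per machine")
    shared_machines = max(0, n_clients - machines)
    shared_clients = 2 * shared_machines
    dedicated_clients = n_clients - shared_clients
    return dedicated_clients + shared_clients * (1.0 - penalty)


if __name__ == "__main__":
    for n in range(50, 401, 50):
        print(n, round(effective_clients(n), 1))
```

Under these assumptions the model is exactly linear up to 200 clients and then grows more slowly, which is the shape of the deviation described above; at 400 clients it yields about 361 full-speed equivalents.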