Optimized Processor - Network Adapter Coupling for High Performance Applications
2011 | 1st edition
Shaker (publisher)
978-3-8440-0348-2 (ISBN)
With the trend towards heavily virtualized multi- and many-core systems, the number of consumers and processing units sharing an I/O device, especially a Network Interface Controller (NIC), is increasing significantly. Asynchronous user-level interfaces (ULIs) based on the Virtual Interface Architecture (VIA) are therefore seeing increasing adoption, not only in InfiniBand (IB) but also in converged Ethernet adapters, because they allow a protected, scalable NIC interface to be implemented.
VIA offers a scalable, mature interface for virtualized I/O devices with minimal software overhead in the send and receive paths. The latest generation of network switches has, furthermore, considerably decreased switch latency. Network communication latency is thus increasingly dominated by the time spent in the processor and NIC hardware. This thesis therefore aims to reduce the contribution of system hardware to communication latency, with a focus on the send side.
Traditionally, the processor and the NIC are considered completely separate units that are coupled using standardized, generic bus interfaces. Knowledge about the network communication process is confined to the processing units on the one hand and the NIC on the other. This separation, however, neglects synergies that can be used to improve coupling efficiency and low-latency characteristics, especially for user-level interfaces.
Based on a performance and bottleneck analysis of a commercial InfiniBand Host Channel Adapter (HCA) and using a consolidated view of the processor and the I/O device, I propose a new doorbell concept for user-level interfaces. It is divided into two steps. In the first step, virtual addresses contained in work requests from the consumer are translated in the processing unit, using its TLB. In the second step, send-related data is forwarded from the I/O controller to the device, which helps avoid costly round trips over the external bus. The concept thus allows efficient bridging of the “virtualization gap” between the consumer process and the virtualized I/O device, and with it the onloading of the send data movement to the processor. It no longer treats the I/O controller as a simple means of coupling but emphasizes its crucial role as a bridge between the cache-coherent processor fabric and I/O devices.
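The two steps of the doorbell concept can be illustrated with a small software model. This is only a minimal sketch under stated assumptions, not the hardware design from the thesis: the names `WorkRequest`, `Tlb`, `IoController`, `Device` and `ring_doorbell`, the 4 KiB page size and the flat byte-array memory are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumption: 4 KiB pages


@dataclass
class WorkRequest:
    """Hypothetical send work request posted by a consumer process."""
    queue_pair: int
    virt_addr: int
    length: int


class Tlb:
    """Toy TLB: maps virtual page numbers to physical page numbers."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)

    def translate(self, virt_addr):
        vpn, offset = divmod(virt_addr, PAGE_SIZE)
        return self.mapping[vpn] * PAGE_SIZE + offset


class Device:
    """Toy NIC/HCA that simply records what it receives."""

    def __init__(self):
        self.received = []

    def accept(self, queue_pair, payload):
        self.received.append((queue_pair, payload))


class IoController:
    """Toy I/O controller that pushes send data to the device."""

    def __init__(self, memory, device):
        self.memory = memory
        self.device = device

    def forward(self, queue_pair, phys_addr, length):
        payload = bytes(self.memory[phys_addr:phys_addr + length])
        self.device.accept(queue_pair, payload)


def ring_doorbell(wr, tlb, io_controller):
    """Model the two-step doorbell for one work request."""
    addr, remaining = wr.virt_addr, wr.length
    while remaining > 0:
        # Never let a chunk cross a page boundary, because adjacent
        # virtual pages need not be adjacent in physical memory.
        chunk = min(remaining, PAGE_SIZE - addr % PAGE_SIZE)
        # Step 1: translate the virtual address on the processor side.
        phys_addr = tlb.translate(addr)
        # Step 2: forward the send data from the I/O controller to the
        # device, so the device does not have to fetch it itself.
        io_controller.forward(wr.queue_pair, phys_addr, chunk)
        addr += chunk
        remaining -= chunk


# Usage: a 5-byte buffer that straddles two virtual pages.
memory = bytearray(3 * PAGE_SIZE)
memory[2 * PAGE_SIZE + 4094:2 * PAGE_SIZE + 4096] = b"he"
memory[0:3] = b"llo"
device = Device()
controller = IoController(memory, device)
tlb = Tlb({5: 2, 6: 0})
ring_doorbell(WorkRequest(7, 5 * PAGE_SIZE + 4094, 5), tlb, controller)
```

In this model the device never sees a virtual address and never reads host memory on its own, which is the point of pushing the send data towards the device instead of having it fetched over the external bus.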
I further present a new HCA send architecture that takes into account the findings of the initial performance analysis and the requirements imposed by the send data movement onloading concept. It focuses on scalability, especially in queue pair scheduling, and on efficient integration of different doorbell mechanisms that are used for different communication patterns such that they degrade gracefully in overload situations.
The send data movement onloading concept and the device architecture are evaluated using a custom, I/O-centric simulator with a focus on the send process. It allows the send performance of HCAs with different degrees of coupling to be analyzed, simulating either a closely-coupled device (HyperTransport-like) or a fabric-coupled device (PCI Express-like). The simulation results show that send data movement onloading support for user-level interfaces achieves a considerable latency reduction for the send process in both cases, ranging on average from 13 % to 47 % for a closely-coupled NIC and from 21 % to 60 % for a fabric-coupled NIC.
Series | Berichte aus der Elektrotechnik |
---|---|
Language | English |
Dimensions | 148 x 210 mm |
Weight | 408 g |
Binding | Paperback |
Subject area | Technology ► Electrical engineering / Energy technology |
Keywords | Address Translation • HCA • InfiniBand • Interface • User-Level |
ISBN-10 | 3-8440-0348-7 / 3844003487 |
ISBN-13 | 978-3-8440-0348-2 / 9783844003482 |
Condition | New |