At its annual GPU Technology Conference (GTC) in San Jose this week, NVIDIA (NVDA) laid the foundation for its Pascal GPU architecture with the NVLink high-speed interconnect, launched a GPU rendering appliance, and introduced a new Tegra K1-powered development kit for the embedded market. The conference conversation can be followed on Twitter under the hashtag #GTC14.
NVIDIA announced plans to integrate a high-speed interconnect into its future GPUs. The interconnect, called NVLink, will enable GPUs and CPUs to share data five to 12 times faster than they can today. This will eliminate a longstanding bottleneck and help pave the way for a new generation of exascale supercomputers that are 50 to 100 times faster than today's most powerful systems.
NVLink will be part of the Pascal GPU architecture, expected in 2016. IBM is co-developing the interconnect and plans to incorporate it into future versions of its POWER CPUs. NVLink joins IBM POWER CPUs with NVIDIA Tesla GPUs to fully leverage GPU acceleration for a diverse set of applications, such as high-performance computing, data analytics and machine learning.
Overcoming the PCIe Bottleneck
"NVLink enables fast data exchange between CPU and GPU, thereby improving data throughput through the computing system and overcoming a key bottleneck for accelerated computing today," said Bradley McCredie, vice president and IBM Fellow at IBM. "NVLink makes it easier for developers to modify high-performance and data analytics applications to take advantage of accelerated CPU-GPU systems. We think this technology represents another significant contribution to our OpenPOWER ecosystem."
The NVLink interface addresses the bottleneck with PCI Express, which limits the GPU's ability to access the CPU memory system. PCIe is an even greater bottleneck between the GPU and IBM POWER CPUs, which have more bandwidth than x86 CPUs. NVLink will match the bandwidth of typical CPU memory systems, and it will enable GPUs to access CPU memory at its full bandwidth. GPUs have fast but small memories, and CPUs have large but slow memories. Accelerated computing applications typically move data from the network or disk storage to CPU memory, and then copy the data to GPU memory before it can be crunched by the GPU. With NVLink, the data moves between the CPU memory and GPU memory at much faster speeds.
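The staged copy described above is what standard CUDA code does today. A minimal sketch using the CUDA runtime API (the kernel and data sizes are illustrative, not from NVIDIA) makes the PCIe-bound transfers explicit as the two cudaMemcpy calls:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical kernel standing in for real number-crunching.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Data arrives in CPU (host) memory, e.g. from disk or the network.
    float *host = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    // Step 1: allocate GPU (device) memory and copy the data across PCIe.
    float *device;
    cudaMalloc(&device, bytes);
    cudaMemcpy(device, host, bytes, cudaMemcpyHostToDevice);

    // Step 2: the GPU crunches the data.
    scale<<<(n + 255) / 256, 256>>>(device, 2.0f, n);

    // Step 3: copy the results back over the same link.
    cudaMemcpy(host, device, bytes, cudaMemcpyDeviceToHost);

    printf("host[0] = %f\n", host[0]);
    cudaFree(device);
    free(host);
    return 0;
}
```

Both cudaMemcpy calls, and any access the GPU makes to data left in CPU memory, are limited by the interconnect's bandwidth, which is the bottleneck NVLink targets.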
The Unified Memory feature will simplify GPU accelerator programming by allowing the programmer to treat the CPU and GPU memories as a single block of memory. NVIDIA GPUs will continue to support PCIe, though NVLink is substantially more energy-efficient per bit transferred. NVIDIA has also designed a module to house GPUs based on the Pascal architecture with NVLink. This new GPU module is one-third the size of the standard PCIe boards used for GPUs today, and connectors at the bottom of the Pascal module let it be plugged into the motherboard, improving system design and signal integrity.
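Unified Memory is already exposed in the CUDA toolkit through cudaMallocManaged. A minimal sketch (the kernel is hypothetical and purely illustrative) shows the programming model: one allocation is visible to both processors and the explicit host-to-device copies disappear:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical kernel standing in for real number-crunching.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main(void) {
    const int n = 1 << 20;
    float *data;

    // One managed allocation, addressable from both CPU and GPU.
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;   // CPU writes...

    scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n); // ...GPU reads and writes...
    cudaDeviceSynchronize();                        // wait for the GPU to finish

    printf("data[0] = %f\n", data[0]);              // ...CPU reads the result.
    cudaFree(data);
    return 0;
}
```

The runtime still migrates pages between the two physical memories behind the scenes; a faster interconnect like NVLink makes that hidden traffic, and direct GPU access to CPU memory, correspondingly cheaper.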
GPU Rendering Appliance
NVIDIA also launched a GPU rendering appliance that dramatically accelerates ray tracing, enabling professional designers to largely replace the lengthy, costly process of building physical prototypes. The new Iray Visual Computing Appliance (VCA) combines hardware and software to greatly accelerate the work of NVIDIA Iray, a photorealistic renderer integrated into leading design tools like Dassault Systèmes' CATIA and Autodesk's 3ds Max. Multiple Iray appliances can be linked together, speeding up the simulation of light bouncing off real-world surfaces by hundreds of times or more. As a result, automobiles and other complex designs can be viewed seamlessly at high visual fidelity from all angles. This lets the viewer move around a model while it is still in the digital domain, as if it were a physical 3D prototype.
"Iray VCA lets designers do what they've always wanted to - interact with their ideas as if they were already real," said Jeff Brown, vice president and general manager of Professional Visualization and Design at NVIDIA. "It removes the time-consuming step of building prototypes or rendering out movies, enabling designs to be explored, tweaked and confirmed in real time. Months, even years - and enormous cost - can be saved in bringing products to market."
Iray VCA delivers unprecedented rendering performance using eight of NVIDIA's most powerful GPUs, each with 12GB of graphics memory, which together deliver 23,040 CUDA architecture cores. It has both 10GigE and InfiniBand connections, so rendering clusters of multiple Iray VCAs can be easily built up over time. NVIDIA worked with Honda Research as an early adopter, and created a prototype cluster made up of 25 nodes to refine styling designs on future cars.
"For our styling design requirements, we developed specialized tools that run alongside our RTT global standard platform," said Daisuke Ide, system engineer at Honda Research and Development. "Our TOPS tool, which uses NVIDIA Iray on our NVIDIA GPU cluster, enables us to evaluate our original design data as if it were real. This allows us to explore more designs so we can create better designs faster and more affordably."
Mobile Supercomputer for Embedded Systems
NVIDIA launched a new developer platform based on the Tegra K1 for new applications in the robotics, medical, avionics and automotive industries. The NVIDIA Jetson TK1 Developer Kit gives developers the tools to create systems and applications that can enable robots to navigate seamlessly, physicians to perform mobile ultrasound scans, drones to avoid moving objects and cars to detect pedestrians. The embedded platform delivers 326 gigaflops and includes a full C/C++ toolkit based on the NVIDIA CUDA architecture, making it easier to program than the FPGAs, custom ASICs and DSPs commonly used in embedded systems today.
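As a rough illustration of the kind of C/C++ CUDA code the toolkit targets, here is a hypothetical vision-style preprocessing kernel of the sort an embedded application might run on each camera frame (all names and sizes are illustrative, not taken from NVIDIA's kit):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Binarize a grayscale image: a simple preprocessing step for
// tasks like obstacle or pedestrian detection.
__global__ void threshold(const unsigned char *in, unsigned char *out,
                          int n, unsigned char cutoff) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = (in[i] > cutoff) ? 255 : 0;
}

int main(void) {
    const int n = 640 * 480;          // one VGA frame of synthetic data
    unsigned char *in, *out;
    cudaMallocManaged(&in, n);
    cudaMallocManaged(&out, n);
    for (int i = 0; i < n; ++i) in[i] = (unsigned char)(i % 256);

    // One thread per pixel, 256 threads per block.
    threshold<<<(n + 255) / 256, 256>>>(in, out, n, 128);
    cudaDeviceSynchronize();

    printf("out[0]=%d out[200]=%d\n", out[0], out[200]);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Because this is ordinary CUDA C/C++, the same source can be prototyped on a desktop GPU and then deployed on the Jetson board, which is the portability argument NVIDIA makes against FPGA and ASIC flows.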
"Jetson TK1 fast tracks embedded computing into a future where machines interact and adapt to their environments in real time," said Ian Buck, vice president of Accelerated Computing at NVIDIA. "This platform enables developers to fully harness computer vision in handheld devices, bringing supercomputing capabilities to low-power devices."
Jetson TK1 is the first developer platform for the Tegra K1. It includes 2GB of memory and input/output connectors for USB 3.0, HDMI 1.4, Gigabit Ethernet, audio, SATA, miniPCIe and an SD card slot. At its core is a Kepler architecture-based Tegra K1, a 192-core mobile processor that is energy efficient and designed from the ground up for CUDA. The Kepler architecture helps power the Titan supercomputer as well as many other energy-efficient supercomputers. A number of developers and system builders in the industrial, robotics and medical fields have expressed support for the development platform.
Simon Collins, product manager at GE Intelligent Platforms, said: "Tegra K1 can change what's possible in the rugged and industrial embedded market. We expect to be able to offer solutions in the sub-10 watt space that previously consumed 100 watts or more."
Chris Jones, director of strategic technology development at iRobot Corporation, said: "Having the level of performance and energy efficiency Jetson TK1 offers can potentially support the development of robots with real-time object recognition and compelling autonomous navigation capabilities. Our experience with the previous generation CUDA development kit has already enabled us to make great progress training robots to interact more intelligently with their environment."