
Freeing the GPU


Joe Casad
AMD released the Radeon Open Compute Ecosystem (ROCm) for GPU-based parallel computing about a year ago. The ambitious ROCm project builds a complete open source ecosystem around the once-very-proprietary world of GPU-accelerated high-performance computing. We sat down with ROCm Senior Director Greg Stoner to find out why ROCm could bring big changes to the HPC space.

ADMIN Magazine: How about if you start with a brief introduction to ROCm. What is it and why should the world be excited?

Greg Stoner: ROCm is an open source, HPC-class platform for GPU-based computing that is language independent. There are other GPU-accelerated platforms, of course; what’s different about ROCm is, we didn’t stop at the base driver but actually opened up all the tooling and libraries on top of it. The application work that we do we actually deliver in source form, and we use GitHub to deliver it.

ROCm offers several different languages and paths to code for the GPU. We even have this thing called HIP [Heterogeneous-compute Interface for Portability], which provides an easy way to port code from CUDA. As you might know, CUDA is GPU aware, but it only supports the GPUs of one vendor. ROCm is open source, so any vendor can work with it and port it to their platform. Code written in CUDA can port easily to the vendor-neutral HIP format, and from there, you can compile the code for either the CUDA or the ROCm platform. So, CUDA programmers have a comfortable environment to be in, and they can bring their code across to HIP using our porting tools.
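To make the CUDA-to-HIP mapping concrete, here is a hypothetical vector-add sketch (not an AMD sample). The HIP runtime calls shown (hipMalloc, hipMemcpy, hipLaunchKernelGGL) are HIP's CUDA-style equivalents; building it requires the ROCm hipcc toolchain and a supported GPU, so treat this as an illustration only:

```cpp
#include <hip/hip_runtime.h>  // HIP runtime; build with hipcc

// A hypothetical vector-add kernel. A CUDA version would be identical
// except for the runtime calls (cudaMalloc -> hipMalloc, the <<<...>>>
// launch -> hipLaunchKernelGGL, and so on), which is the mechanical
// rewrite the HIP porting tools perform.
__global__ void vadd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

void vadd_on_gpu(const float *a, const float *b, float *c, int n) {
    float *da, *db, *dc;
    size_t bytes = n * sizeof(float);
    hipMalloc(&da, bytes); hipMalloc(&db, bytes); hipMalloc(&dc, bytes);
    hipMemcpy(da, a, bytes, hipMemcpyHostToDevice);
    hipMemcpy(db, b, bytes, hipMemcpyHostToDevice);
    // Launch one thread per element, 256 threads per block.
    hipLaunchKernelGGL(vadd, dim3((n + 255) / 256), dim3(256), 0, 0,
                       da, db, dc, n);
    hipMemcpy(c, dc, bytes, hipMemcpyDeviceToHost);
    hipFree(da); hipFree(db); hipFree(dc);
}
```

The porting tools Stoner mentions (such as hipify-perl, which ships with HIP) apply this kind of source-to-source rewrite to existing CUDA files automatically.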

We also built a solution for C++ programmers we call HCC. HCC is a single-source C++ compiler that essentially lets you integrate the CPU code and the GPU code into one file. We wanted to make the system as simple as possible to install. Historically, many drivers are installed with shell scripts, but we wanted closer integration with Linux, so we are actually working with conventional package installers. To install on Ubuntu, you just add the repo, and then it's apt-get install rocm, and you're ready to go.
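The Ubuntu install he describes boils down to a few commands along these lines (the repository URL and key path here are illustrative; consult the current ROCm installation guide for the exact ones):

```shell
# Add AMD's ROCm apt repository, then install via the package manager.
# Repo URL and key are illustrative; check the ROCm install docs.
wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' \
    | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt-get update
sudo apt-get install rocm
```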

I’ve been working with Linux for a long time. I ran a sysop team before I came to AMD. I used to run big clusters and storage arrays. We were all CentOS-based back then. I brought along that Linux sysop sensibility when I joined the ROCm project.

AM: Isn’t OpenCL an open source solution that supports GPU acceleration?

GS: OpenCL is a solid solution; it solves a set of critical needs, but it’s basically only C99, and you know that doesn’t work in the enterprise space.

So, what we wanted to do is make sure we have a foundation at the core to allow us to adapt different languages, so we put in things like standardized loader interfaces, standardized APIs, a system run-time interface, which will actually make the driver act more like an OS service. So now, you can dynamically load a set of languages that you want to use on the platform and then build up the userland from there.

We supply the standard set of languages, and yes we do support OpenCL.

AM: So the programmer gets to work in a familiar language and still benefit from GPU acceleration?

GS: OpenMP has something very similar – you basically tell it when you want to execute against the GPU. We're doing the same thing with a set of lab operators, which are like directives for GPU computing, but you write your code more like how you historically did – very similar to how you do it in Fortran or C++, so it's more natural.

ROCm also supports Python. We’ve been working with Anaconda [Python distribution for large-scale and scientific use cases]. The other thing is, ROCm supports inline assembly, so you can performance-tune the hot loops just like you do on a CPU. We have an assembler/disassembler as well, which you can use to work on performance-tuning your application. All our compiler technologies sit on LLVM, and LLVM actually generates native ISA, and we upstream all that, so there’s a standardized LLVM compiler for the platform.

There are two pieces to ROCm: the core driver, and that's becoming more like an OS service, and then you have the programming stack, and that's more like it is in an operating system, where you can add in modules and composable software like you have in Unix. That's the direction we're headed: making everything composable. And it's all open.

AM: The HPC space has a big focus on machine learning and AI. What kind of support do you provide in ROCm?

GS: We have open sourced the MIOpen machine-learning library, which supports GPU acceleration for deep convolution networks. We’re in the process of porting several machine-learning libraries and tools. Many of these tools were written with CUDA, and we can use HIP to bring them into the ROCm stack. People don’t realize that something like the TensorFlow machine-learning library is over a million lines of code, and we’re bringing that across, and then we’ll hand-tune it for our platform. Then we will release it back to the market, and we’ll maintain our ROCm-ready version. We’re in the process of releasing ROCm versions of these AI tools now. We started with Caffe, and now we’re working on MXNet, TensorFlow, and PyTorch.

AM: Do you have teams of HPC programmers out there who have been working with ROCm and are giving you feedback?

GS: Yes, we do. We have also been working with the AMD Research group, which works with the US Department of Energy, and we've been working with the deep learning community as well and getting feedback from them.

AM: Sounds like a massive effort.

GS: It is a very massive effort; I remind my team how much we've achieved. But yes, it's been a Herculean effort.

AM: Describe a typical ROCm user. Who is going to be using this technology?

GS: We found a real niche in the oil and gas business, and they were some of our earlier customers. We've also been working with the CHIME project, a radio telescope project out of Canada. We've had a couple of bigger customers that were doing some interesting things in the manufacturing space and needed to do hand-tuning in assembly. We've also got people using ROCm in the automotive industry.

AM: Where does it go from here? How do you see ROCm evolving over the next few years?

GS: One area that we see we're going to be investing in more and more is system management. We're going to continue to focus on deep learning and network-based programming models. We're just at the beginning of what is possible.

If you really look at the last year, we received lots of input. In the first year we did over nine releases. We were moving. That's how you drive feature function, then you do more feature function and work on performance stability. Once you get the core functionality in place, it's about feature tuning, stability tuning, performance tuning. And there are some new things that are coming online – migration technologies that we're extending into. We've got the foundation now; the biggest part is behind us.

AM: Since it’s all open, is it possible that another chip vendor could then employ this technology for their own hardware?

GS: Sure. Actually, it’s built on the HSA [Heterogeneous System Architecture] model, and there are people in the HSA Foundation that are using some of the technologies from ROCm already.

AM: Is that part of your dream – that this becomes a standard part of the open source infrastructure?

GS: Yes, it is. In addition to AMD processors and Intel Xeon, we've been bringing up Power8 support in the core – also ARM AArch64 support in the core. Over time, I would love to see what we've done – the core of ROCm – become a natural extension of the Linux driver and Linux distros – and all of the programming models on top of it. We're trying to get to a point where it's just standard basic infrastructure that people can use for heterogeneous computing. That's really what this is about; it's how we build heterogeneous computing solutions that are standards-based and broadly distributed, so it's easier for developers to do the work they care about and use all the compute power in the system. That's what's missing today. Everyone's built these monolithic stacks that just work for them, and now we need to figure out how to get the entire industry working together on this class of programming.

Basically, I look at this as: How do we get this class of software to live for a millennium? And we’re harnessing the power of free software. GCC was created in the mid-90s, and it lacked some of the performance of commercial compilers at the time. But now, the commercial compilers can barely keep up with it. It keeps getting better and better. That’s how I see what we’re doing with ROCm. We planted a flag, and now it’s an evolution.


via:admin-magazine.com


Translated by MOEPC.NET; please retain this attribution when reposting.
