Unlike the somewhat hasty Day One reviews from most outlets, AnandTech tends toward in-depth exploration, which makes its work more valuable to study.
Back in May we took a first look at the first of these cards, NVIDIA’s GeForce GTX 1080 Founders Edition. Launched at $700, it was immediately the flagship for the FinFET generation. Now today, at long (long) last, we will be taking a complete, in-depth look at the GTX 1080 Founders Edition and its sibling the GTX 1070 Founders Edition. Architecture, overclocking, more architecture, new memory technologies, new features, and of course copious benchmarks. So let’s get started on this belated look at the latest generation of GPUs and video cards from NVIDIA.
GeForce GTX 1080
GeForce GTX 1070
Cards, Pricing, & Availability
Pascal’s Architecture: What Follows Maxwell
HPC vs. Consumer: Divergence
GP104: The Heart of GTX 1080
FP16 Throughput on GP104: Good for Compatibility (and Not Much Else)
Feeding Pascal: GDDR5X
Feeding Pascal, Cont: 4th Gen Delta Color Compression
Asynchronous Concurrent Compute: Pascal Gets More Flexible
Moving to Maxwell, Maxwell 1 was a repeat of Big Kepler, offering HyperQ without any way to mix it with graphics. It was only with Maxwell 2 that NVIDIA finally gained the ability to mix compute queues with graphics mode, allowing for the single graphics queue to be joined with up to 31 compute queues, for a total of 32 queues.
This from a technical perspective is all that you need to offer a basic level of asynchronous compute support: expose multiple queues so that asynchronous jobs can be submitted. Past that, it’s up to the driver/hardware to handle the situation as it sees fit; true async execution is not guaranteed. Frustratingly then, NVIDIA never enabled true concurrency via asynchronous compute on Maxwell 2 GPUs. This despite stating that it was technically possible. For a while NVIDIA never did go into great detail as to why they were holding off, but it was always implied that this was for performance reasons, and that using async compute on Maxwell 2 would more likely than not reduce performance rather than improve it.
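To make the distinction between exposing queues and actually executing them concurrently more concrete, here is a toy Python model. It is only a sketch: the `Job` class, the two timing functions, and the job durations are all hypothetical, and the "concurrent" case is idealized, ignoring the contention for GPU resources that could make async compute a net loss in practice. The layout of 1 graphics queue plus up to 31 compute queues is the only part taken from the description of Maxwell 2.

```python
from dataclasses import dataclass

# 1 graphics queue + up to 31 compute queues, per the Maxwell 2 description.
MAX_QUEUES = 32


@dataclass
class Job:
    queue: int       # 0 = graphics queue, 1..31 = compute queues
    duration: float  # arbitrary time units (made-up for illustration)


def _check(jobs):
    """Reject jobs submitted to a queue the hardware does not expose."""
    for job in jobs:
        if not 0 <= job.queue < MAX_QUEUES:
            raise ValueError(f"queue {job.queue} is outside 0..{MAX_QUEUES - 1}")


def serialized_time(jobs):
    """Queues are exposed, but the driver runs every job one after another."""
    _check(jobs)
    return sum(job.duration for job in jobs)


def concurrent_time(jobs):
    """Idealized concurrency: queues overlap fully, jobs within a queue stay in order."""
    _check(jobs)
    per_queue = {}
    for job in jobs:
        per_queue[job.queue] = per_queue.get(job.queue, 0.0) + job.duration
    return max(per_queue.values(), default=0.0)


# Hypothetical workload: one graphics job and three compute jobs on two queues.
jobs = [Job(0, 4.0), Job(1, 1.0), Job(2, 2.0), Job(1, 1.5)]
print("serialized:", serialized_time(jobs))
print("concurrent (idealized):", concurrent_time(jobs))
```

Both functions accept exactly the same submissions, which is the point: an application can fill all of the queues and still get serialized execution if that is what the driver and hardware decide to do.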
Preemption Improved: Fine-Grained Preemption for Time-Critical Tasks
Simultaneous Multi-Projection: Reusing Geometry on the Cheap
Display Matters: New Display Controller, HDR, & HEVC
All of that said, there is one new feature to the video decode block and associated display controller on Pascal that’s not present on GM206: Microsoft PlayReady 3.0 DRM support. The latest version of Microsoft’s DRM standard goes hand in hand with the other DRM requirements (e.g. HDCP 2.2) that content owners/distributors have required for 4K video, which is why Netflix 4K support has until now been restricted to more limited devices such as TVs and set top boxes. Pascal is, in turn, the first GPU to support all of Netflix’s DRM requirements and will be able to receive 4K video once Netflix starts serving it up to PCs.
Meanwhile in terms of total throughput, running a quick 1080p benchmark check with DXVAChecker finds that the new video decoder is much faster than GM204’s. H.264 throughput is 40% higher, and HEVC throughput, while a bit less apples-to-apples due to hybrid decode, is 13% higher (with roughly half the GPU power consumption at the same time). At 4K things are a bit more lopsided; GM204 can’t sustain H.264 4Kp60, and 4Kp60 HEVC is right out.
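As a rough illustration of why 4Kp60 is so much harder to sustain than 1080p, the pixel rates can be compared directly. This is a back-of-the-envelope sketch, not a model of the decoder: the function names are hypothetical, and it assumes decode cost scales linearly with pixel count, which real codecs only approximate.

```python
def pixel_rate(width, height, fps):
    """Pixels that must be decoded per second at a given resolution and frame rate."""
    return width * height * fps


rate_1080p60 = pixel_rate(1920, 1080, 60)
rate_4kp60 = pixel_rate(3840, 2160, 60)

# 4K (3840x2160) has exactly four times the pixels of 1080p.
ratio = rate_4kp60 / rate_1080p60


def equivalent_1080p_fps(target_4k_fps):
    """1080p frame rate a decoder would need to match a 4K target (linear-scaling assumption)."""
    return target_4k_fps * ratio


print(f"4Kp60 needs {ratio:.0f}x the pixel rate of 1080p60")
print(f"roughly {equivalent_1080p_fps(60):.0f} fps of 1080p-equivalent throughput")
```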
Fast Sync & SLI Updates: Less Latency, Fewer GPUs
What are Double Buffering, V-sync, and Triple Buffering?
SLI: The Abridged Version
Deprecated: 3-Way & 4-Way SLI
GPU Boost 3.0: Finer-Grained Clockspeed Controls
Observations on Clocking with Pascal
NVIDIA Works: ANSEL & VRWorks Audio
Meet the GeForce GTX 1080 & GTX 1070 Founders Edition Cards
GPU 2016 Benchmark Suite & The Test
Rise of the Tomb Raider
Ashes of the Singularity
The Witcher 3
Grand Theft Auto V
Power, Temperature, & Noise
Though we often treat FinFET as the solution to planar’s scaling problems, FinFET is more than just a means to enable 20nm/16nm geometry. It’s also a solution in and of itself to voltages. As a result, GP104’s operating voltages are significantly lower than GM204’s. Idle voltage in particular is much lower; whereas GTX 980 idled at 0.856V, the GP104 cards get to do so at 0.625V. Load voltages are also reduced, as GM204’s 1.225V boost voltage is replaced with GP104’s 1.062V boost voltage.
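To put those voltage figures in perspective, the short Python sketch below works out the relative reductions. The four voltages come from the text; the helper names are hypothetical, and the power estimate relies on the textbook approximation that dynamic power scales with the square of voltage at a fixed clock and capacitance, which is an assumption added here rather than a measurement.

```python
def reduction(old, new):
    """Fractional drop going from old to new (0.25 means 25% lower)."""
    return 1.0 - new / old


def dynamic_power_ratio(v_old, v_new):
    """Approximate dynamic power ratio, assuming P scales with V^2 at fixed frequency."""
    return (v_new / v_old) ** 2


# Voltages quoted in the text, in volts.
idle_gtx980, idle_gp104 = 0.856, 0.625
boost_gm204, boost_gp104 = 1.225, 1.062

idle_drop = reduction(idle_gtx980, idle_gp104)
boost_drop = reduction(boost_gm204, boost_gp104)
boost_power = dynamic_power_ratio(boost_gm204, boost_gp104)

print(f"idle voltage:  {idle_drop:.1%} lower")
print(f"boost voltage: {boost_drop:.1%} lower")
print(f"dynamic power at boost voltage, same clock: {boost_power:.1%} of GM204's (V^2 approximation)")
```

The idle voltage drops by about 27% and the boost voltage by about 13%. Under the V^2 assumption, the boost voltage change alone would cut dynamic power by roughly a quarter at the same clockspeed, though in practice GP104 also runs at much higher clocks.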
It’s interesting to see that despite its lower rated clockspeeds, GTX 1070 actually averages a bin or two higher than GTX 1080. As our samples have identical maximum boost clocks (something I should note is not guaranteed, as the maximum boost clock varies from card to card), we get a slightly more apples-to-apples comparison here. GTX 1070 has a lower TDP, which can hurt its ability to run at its highest clocks, but at the same time it’s a partially disabled GPU, which can reduce power consumption. Meanwhile the GTX 1070’s cooler is a bit less sophisticated than the GTX 1080’s (losing the vapor chamber for heatpipes), but on the whole it’s still a very powerful cooler for a 150W card. As a result our GTX 1070 sample is able to get away with slightly better boosting than GTX 1080 in most situations. This means that the cards’ on-paper clockspeed differences are generally nullified and aren’t a factor in how the cards’ overall performance differs.
As a percentage of the maximum boost clock, the average clockspeeds of the GTX 1080 and GTX 1070 both drop more significantly than with GTX 980, where the latter only drops a few percent from its maximum. This is due to a combination of the temperature compensation effect we discussed earlier and both cards hitting 83C (though so does GTX 980). Either way both cards are still happily running in the 1700MHz range, and the averages for both cards remain north of NVIDIA’s official boost clock. Though this does give us a good idea as to why the official boost clock is so much lower than the cards’ maximum boost clocks.
Performance & Recommendations: By The Numbers