The D3D problem - tying the CPU up
AMD has made a big deal about Mantle - a graphics application programming interface (API), co-developed with DICE, that addresses what the company sees as shortcomings in DirectX 11. You see, while DirectX is great as a vendor-agnostic API, AMD is at pains to point out that, right now, significant CPU overhead is incurred by having the driver translate commands from the API into ones the GPU can understand and execute.
And it's clearly not just about the driver. Modern games ask the GPU to render complex scenes that require the CPU to issue lots of draw calls (or commands) per frame. Each call tells the GPU to draw an object or to carry out some other new work. The parallel power of cutting-edge GPUs is such that thousands of draw calls are required to keep them busy and efficient, putting the onus on the CPU to mete them out.
Each call is passed from the application to the API and then on to the graphics driver. Running this path through DirectX adds inefficiencies and extra processing time along the way, so performance can end up bound by the CPU rather than by the intrinsic power of the GPU.
DirectX has improved markedly in this respect (DirectX 9 and earlier APIs were far worse), but the current mechanism by which draw calls are submitted to and interpreted for the GPU still prevents mainstream CPUs from delivering adequate performance to quality graphics cards. The problem is most pronounced with high-end cards, and the real-world upshot is stifled performance that, in theory, a more efficient API for passing draw calls could recover.
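To make the CPU-bound argument concrete, here is a deliberately simplified model, not AMD's or DICE's own analysis. It assumes the CPU must submit every draw call for a frame, that CPU and GPU work overlap, and therefore that the slower of the two stages sets the frame time. The function names, the per-call costs and the draw-call count below are all hypothetical, chosen only to show the shape of the effect.

```python
# Simplified bottleneck model: frame time is set by the slower of the CPU
# (draw-call submission plus other game work) and the GPU (rendering).
# Every number used here is hypothetical and for illustration only.

def frame_time_ms(draw_calls: int, us_per_call: float,
                  cpu_other_ms: float, gpu_ms: float) -> float:
    """Return the frame time in milliseconds for one frame."""
    # Submission cost grows linearly with the number of draw calls.
    cpu_ms = cpu_other_ms + draw_calls * us_per_call / 1000.0
    # The stages overlap, so the slower one is the bottleneck.
    return max(cpu_ms, gpu_ms)


def fps(frame_ms: float) -> float:
    """Convert a frame time in milliseconds to frames per second."""
    return 1000.0 / frame_ms


if __name__ == "__main__":
    DRAW_CALLS = 5000   # hypothetical draw calls per frame
    GPU_MS = 12.0       # hypothetical GPU render time per frame
    # (label, hypothetical per-call overhead in microseconds, other CPU work in ms)
    scenarios = [
        ("slower CPU, high-overhead API", 4.0, 5.0),
        ("slower CPU, low-overhead API", 1.0, 5.0),
        ("faster CPU, high-overhead API", 2.0, 3.0),
        ("faster CPU, low-overhead API", 0.5, 3.0),
    ]
    for label, us_per_call, cpu_other_ms in scenarios:
        ft = frame_time_ms(DRAW_CALLS, us_per_call, cpu_other_ms, GPU_MS)
        print(f"{label}: {ft:.1f} ms per frame, {fps(ft):.1f} fps")
```

With these made-up figures, cutting per-call overhead roughly doubles the slower CPU's frame rate because it stops being the bottleneck, while the faster CPU gains only a few frames per second because it was already close to GPU-bound.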
What's needed, therefore, is a console-like API that gives the CPU lower-level, lower-overhead access to the GPU, handing the GPU its work in a form that is cheap to submit and execute so that it can sustain high frame rates. In a nutshell, this is part of what Mantle is, and it has been implemented in the latest patch of Battlefield 4.
Being an AMD technology, Mantle works with the company's GCN-based products, that is, discrete Radeon R9/R7/HD 7000/HD 8000 cards and the integrated graphics in the latest Kaveri APUs. That said, the publicly-available driver is only optimised for the Radeon R9 290-series of GPUs.
How it's run
Once the Catalyst 14.1 beta 6 Mantle-supporting drivers are installed and Battlefield 4 has been patched to the latest build (as of February 4), switching between the DX11 and Mantle APIs is a simple matter of changing one setting in the video section, exiting the game and then loading it back up.
The most commonly-used application for measuring frame rates is FRAPS. It works by collecting data from the DirectX pipeline and logging it to files that provide a number of performance variables. Newer, more-advanced tools, such as Nvidia's FCAT, strive for greater accuracy, but for a single-GPU system, the FRAPS output is as good as anything else available.
FRAPS, however, isn't compatible with the Mantle API - it is designed for DirectX, after all - so the folks over at DICE have added console commands that enable logging. Assuming the Mantle patch and driver are installed, all you need to do to try it yourself is bring up the console and type 'PerfOverlay.FrameFileLogEnable 1' to start logging and then 'PerfOverlay.FrameFileLogEnable 0' to finish. BF4 writes out log files from which the effective frames-per-second figure can be calculated. Cursory examination reveals no obvious image-quality differences between the two code paths.
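Turning a frame-time log into an effective frames-per-second figure can be sketched as below. The exact layout of BF4's log files isn't described here, so the parser is hypothetical: it assumes the first comma-separated field on each line is a frame time in milliseconds and skips anything that isn't a number, such as a header row. Adjust it to the real file format before relying on it.

```python
from typing import Iterable, List


def parse_frame_times(lines: Iterable[str]) -> List[float]:
    """Hypothetical parser: take the first comma-separated field of each line
    as a frame time in milliseconds, skipping headers and blank lines."""
    times: List[float] = []
    for line in lines:
        field = line.split(",")[0].strip()
        try:
            times.append(float(field))
        except ValueError:
            continue  # not a number, e.g. a column header or empty line
    return times


def effective_fps(frame_times_ms: List[float]) -> float:
    """Frames rendered divided by total elapsed seconds."""
    if not frame_times_ms:
        raise ValueError("no frames logged")
    total_seconds = sum(frame_times_ms) / 1000.0
    return len(frame_times_ms) / total_seconds


if __name__ == "__main__":
    sample_log = ["frame_ms", "16.0", "20.0", "24.0", ""]  # made-up sample
    print(f"{effective_fps(parse_frame_times(sample_log)):.1f} fps")
```

Dividing total frames by total time, rather than averaging the instantaneous rate of each frame, avoids over-weighting fast frames: for frames of 16, 20 and 24 ms the sketch gives 50 fps, whereas averaging the per-frame rates would give about 51.4 fps.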
The obvious expectation is for the Mantle API to run faster than DirectX 11 when a mainstream (draw-call-limited) CPU is paired with a high-end card. AMD cites examples such as an A10-7700K APU paired with Radeon R9 290X graphics, though it's doubtful that most enthusiasts would consider such a combination. Switch over to a faster, better CPU, such as the Intel Core i7-4770K, and the potential performance uplift is limited, because the CPU's innate power is enough to overcome the DX11-induced hurdles. AMD's own testing shows, as expected, only modest gains when switching APIs on such a system.
We chose to test Battlefield 4 with two combinations. On the one hand we have an AMD A10-7850K Kaveri APU paired with a Radeon R9 290X, and on the other, an Intel Core i5-4670K with the same card. The Intel system is also run with a reference GeForce GTX 780 Ti, to see how Nvidia's card fares against the Mantle-run R9 290X. Testing is done on Windows 8.1, 64-bit, and save for differences in CPU and motherboard, both systems are otherwise identical.
Our map uses an outdoor scene that's considered CPU-limited for the most part. Baseline performance is the A10-7850K and Radeon R9 290X running via good ol' DX11. The score is actually decent for 1,920x1,080 at ultra-quality settings. Invoking the Mantle path increases performance by around 10 per cent.
What's more telling is that the Core i5-4670K running regular DirectX is faster than the A10-7850K running Mantle. On the Intel system, Mantle has a smaller positive effect, most likely because of the chip's greater processing and draw-call throughput, but there's still a repeatable increase.
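An uplift such as "around 10 per cent" is a relative change over the DX11 baseline. The helper below shows that arithmetic, plus the equivalent saving in frame time, which is often a more intuitive way to read small frame-rate gains. The function names and the 50-to-55 fps figures are hypothetical and are not the results from our testing.

```python
def uplift_percent(baseline_fps: float, new_fps: float) -> float:
    """Relative frame-rate change over the baseline, in per cent."""
    if baseline_fps <= 0:
        raise ValueError("baseline frame rate must be positive")
    return (new_fps - baseline_fps) / baseline_fps * 100.0


def frame_time_saving_ms(baseline_fps: float, new_fps: float) -> float:
    """Milliseconds shaved off each frame when moving to new_fps."""
    return 1000.0 / baseline_fps - 1000.0 / new_fps


if __name__ == "__main__":
    # Hypothetical figures: 50 fps under DX11, 55 fps under Mantle.
    print(f"{uplift_percent(50.0, 55.0):.1f}% faster, "
          f"{frame_time_saving_ms(50.0, 55.0):.2f} ms saved per frame")
```

A 10 per cent frame-rate gain corresponds to a slightly smaller (about 9 per cent) cut in frame time, because frame time is the reciprocal of frame rate.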
Nvidia fans will point to the fact that a regular GTX 780 Ti is comfortably faster than the Mantle-driven Radeon R9 290X in our scene. Sure, we could artificially engineer situations where Mantle performance looks better by finding particularly CPU-limited scenes that are rife with large structures collapsing, as AMD has likely done in its benchmarking notes, but doing so would be disingenuous.
Battlefield 4 is the poster-child for AMD's Mantle technology. Though the theory surrounding the new API makes a lot of inherent sense, in practice genuine performance increases require pairing a mainstream CPU with an expensive graphics card. Outside this eclectic combination, gains are unlikely to be perceivable at what we'd term high-quality video settings.
Perhaps the biggest stumbling block for AMD right now is that a roughly-comparable CPU from Intel has enough power to run the same high-end graphics card at a faster pace, via DirectX, than AMD can manage through Mantle.
Mantle has to demonstrate significant speed-ups over DirectX if it is to be taken up by a broad range of developers and game engines. Coding for Mantle compatibility requires additional resources that smaller studios are likely unable to bear. Mantle has some good ideas on how to reduce resource overhead and enable a better gaming experience, so it may be incumbent on Microsoft to learn a few lessons from this vendor-specific API and roll them into the next iteration of DirectX. Such a move would work on all fronts, giving AMD credit for invigorating the industry and all gamers extra performance for free.
We'll be looking into Mantle performance across a larger number of CPUs and GPUs in the near future. AMD says that Mantle is very much a work in progress, but from what we have seen thus far, there's potential to elevate performance for mass-market PCs... which is as good a reason to have Mantle as any.