SDL Wiki

CategoryGPU

The GPU API offers a cross-platform way for apps to talk to modern graphics hardware. It offers both 3D graphics and compute support, in the style of Metal, Vulkan, and Direct3D 12.

A basic workflow might be something like this:

The app creates a GPU device with SDL_CreateGPUDevice(), and assigns it to a window with SDL_ClaimWindowForGPUDevice()--although strictly speaking you can render offscreen entirely, perhaps for image processing, and not use a window at all.

Next the app prepares static data (things that are created once and used over and over). For example:

To render, the app creates one or more command buffers, with SDL_AcquireGPUCommandBuffer(). Command buffers collect rendering instructions that will be submitted to the GPU in batch. Complex scenes can use multiple command buffers, maybe configured across multiple threads in parallel, as long as they are submitted in the correct order, but many apps will just need one command buffer per frame.

Rendering can happen to a texture (what other APIs call a "render target") or it can happen to the swapchain texture (which is just a special texture that represents a window's contents). The app can use SDL_WaitAndAcquireGPUSwapchainTexture() to render to the window.

Rendering actually happens in a Render Pass, which is encoded into a command buffer. One can encode multiple render passes (or alternate between render and compute passes) in a single command buffer, but many apps might simply need a single render pass in a single command buffer. Render Passes can render to up to four color textures and one depth texture simultaneously. If the set of textures being rendered to needs to change, the Render Pass must be ended and a new one must be begun.

The app calls SDL_BeginGPURenderPass(). Then it sets states it needs for each draw:

Then, make the actual draw commands with these states:

After all the drawing commands for a pass are complete, the app should call SDL_EndGPURenderPass(). Once a render pass ends all render-related state is reset.

The app can begin new Render Passes and make new draws in the same command buffer until the entire scene is rendered.

Once all of the render commands for the scene are complete, the app calls SDL_SubmitGPUCommandBuffer() to send it to the GPU for processing.

If the app needs to read back data from texture or buffers, the API has an efficient way of doing this, provided that the app is willing to tolerate some latency. When the app uses SDL_DownloadFromGPUTexture() or SDL_DownloadFromGPUBuffer(), submitting the command buffer with SDL_SubmitGPUCommandBufferAndAcquireFence() will return a fence handle that the app can poll or wait on in a thread. Once the fence indicates that the command buffer is done processing, it is safe to read the downloaded data. Make sure to call SDL_ReleaseGPUFence() when done with the fence.

The API also has "compute" support. The app calls SDL_BeginGPUComputePass() with compute-writeable textures and/or buffers, which can be written to in a compute shader. Then it sets states it needs for the compute dispatches:

Then, dispatch compute work:

For advanced users, this opens up powerful GPU-driven workflows.

Graphics and compute pipelines require the use of shaders, which as mentioned above are small programs executed on the GPU. Each backend (Vulkan, Metal, D3D12) requires a different shader format. When the app creates the GPU device, the app lets the device know which shader formats the app can provide. It will then select the appropriate backend depending on the available shader formats and the backends available on the platform. When creating shaders, the app must provide the correct shader format for the selected backend. If you would like to learn more about why the API works this way, there is a detailed blog post explaining this situation.

It is optimal for apps to pre-compile the shader formats they might use, but for ease of use SDL provides a separate project, [SDL_shadercross](https://github.com/libsdl-org/SDL_shadercross) , for performing runtime shader cross-compilation. It also has a CLI interface for offline precompilation as well.

This is an extremely quick overview that leaves out several important details. Already, though, one can see that GPU programming can be quite complex! If you just need simple 2D graphics, the Render API is much easier to use but still hardware-accelerated. That said, even for 2D applications the performance benefits and expressiveness of the GPU API are significant.

The GPU API targets a feature set with a wide range of hardware support and ease of portability. It is designed so that the app won't have to branch itself by querying feature support. If you need cutting-edge features with limited hardware support, this API is probably not for you.

Examples demonstrating proper usage of this API can be found here .

Performance considerations

Here are some basic tips for maximizing your rendering performance.

In general try to remember the golden rule of performance: doing things is more expensive than not doing things. Don't Touch The Driver!

FAQ

Question: When are you adding more advanced features, like ray tracing or mesh shaders?

Answer: We don't have immediate plans to add more bleeding-edge features, but we certainly might in the future, when these features prove worthwhile, and reasonable to implement across several platforms and underlying APIs. So while these things are not in the "never" category, they are definitely not "near future" items either.

Question: Why is my shader not working?

Answer: A common oversight when using shaders is not properly laying out the shader resources/registers correctly. The GPU API is very strict with how it wants resources to be laid out and it's difficult for the API to automatically validate shaders to see if they have a compatible layout. See the documentation for SDL_CreateGPUShader() and SDL_CreateGPUComputePipeline() for information on the expected layout.

Another common issue is not setting the correct number of samplers, textures, and buffers in SDL_GPUShaderCreateInfo. If possible use shader reflection to extract the required information from the shader automatically instead of manually filling in the struct's values.

Question: My application isn't performing very well. Is this the GPU API's fault?

Answer: No. Long answer: The GPU API is a relatively thin layer over the underlying graphics API. While it's possible that we have done something inefficiently, it's very unlikely especially if you are relatively inexperienced with GPU rendering. Please see the performance tips above and make sure you are following them. Additionally, tools like RenderDoc can be very helpful for diagnosing incorrect behavior and performance issues.

System Requirements

Vulkan: Supported on Windows, Linux, Nintendo Switch, and certain Android devices. Requires Vulkan 1.0 with the following extensions and device features:

D3D12: Supported on Windows 10 or newer, Xbox One (GDK), and Xbox Series X|S (GDK). Requires a GPU that supports DirectX 12 Feature Level 11_1.

Metal: Supported on macOS 10.14+ and iOS/tvOS 13.0+. Hardware requirements vary by operating system:

Uniform Data

Uniforms are for passing data to shaders. The uniform data will be constant across all executions of the shader.

There are 4 available uniform slots per shader stage (where the stages are vertex, fragment, and compute). Uniform data pushed to a slot on a stage keeps its value throughout the command buffer until you call the relevant Push function on that slot again.

For example, you could write your vertex shaders to read a camera matrix from uniform binding slot 0, push the camera matrix at the start of the command buffer, and that data will be used for every subsequent draw call.

It is valid to push uniform data during a render or compute pass.

Uniforms are best for pushing small amounts of data. If you are pushing more than a matrix or two per call you should consider using a storage buffer instead.

A Note On Cycling

When using a command buffer, operations do not occur immediately - they occur some time after the command buffer is submitted.

When a resource is used in a pending or active command buffer, it is considered to be "bound". When a resource is no longer used in any pending or active command buffers, it is considered to be "unbound".

If data resources are bound, it is unspecified when that data will be unbound unless you acquire a fence when submitting the command buffer and wait on it. However, this doesn't mean you need to track resource usage manually.

All of the functions and structs that involve writing to a resource have a "cycle" bool. SDL_GPUTransferBuffer, SDL_GPUBuffer, and SDL_GPUTexture all effectively function as ring buffers on internal resources. When cycle is true, if the resource is bound, the cycle rotates to the next unbound internal resource, or if none are available, a new one is created. This means you don't have to worry about complex state tracking and synchronization as long as cycling is correctly employed.

For example: you can call SDL_MapGPUTransferBuffer(), write texture data, SDL_UnmapGPUTransferBuffer(), and then SDL_UploadToGPUTexture(). The next time you write texture data to the transfer buffer, if you set the cycle param to true, you don't have to worry about overwriting any data that is not yet uploaded.

Another example: If you are using a texture in a render pass every frame, this can cause a data dependency between frames. If you set cycle to true in the SDL_GPUColorTargetInfo struct, you can prevent this data dependency.

Cycling will never undefine already bound data. When cycling, all data in the resource is considered to be undefined for subsequent commands until that data is written again. You must take care not to read undefined data.

Note that when cycling a texture, the entire texture will be cycled, even if only part of the texture is used in the call, so you must consider the entire texture to contain undefined data after cycling.

You must also take care not to overwrite a section of data that has been referenced in a command without cycling first. It is OK to overwrite unreferenced data in a bound resource without cycling, but overwriting a section of data that has already been referenced will produce unexpected results.

Functions

Datatypes

Structs

Enums

Macros


CategoryAPICategory


[ edit | delete | history | feedback | raw ]

[ front page | index | search | recent changes | git repo | offline html ]

All wiki content is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
Wiki powered by ghwikipp.