Asus GeForce GTX, Nvidia CUDA and H.264 hardware acceleration

Xeoma supports CUDA hardware acceleration on Nvidia graphics cards. As of Xeoma version 23.12.7, hardware acceleration is available for AMD Radeon graphics cards as well.
Hardware acceleration of decoding can be applied on Xeoma’s server side or on the client side; the conditions and requirements differ between the two. Let’s look at each option in detail:

Server

We often get questions about which graphics card to choose so that a video surveillance system performs well. That’s why we have put together a list of models recommended by our technical specialists.

Asus GeForce GTX 950 (or higher, but it won’t affect H.264 decoding speed). The disadvantage of this option is that there may not be enough video memory: depending on the resolution, you may get only 250 fps instead of 510 fps.

Asus GeForce GTX 960 with exactly 4096(!) MB of video memory.

Asus GeForce GTX 1060 with at least 6144 MB of video memory – 1.5 times faster than the GTX 960.

Asus GeForce GTX 1070 with 8192 MB of video memory.

To summarize, we would like to highlight the following models and their advantages.

The best option in terms of price and quality is the Asus GeForce GTX 1060 with at least 6144 MB of video memory. It can handle up to 25 Full-HD video streams at 30 fps, or up to 75 cameras at 10 fps (but you have to check that there is enough video memory).

If your budget allows, it’s better to choose the Asus GeForce GTX 1070 with 8192 MB of video memory. In this case you will be able to connect more cameras, because some variants of the 1060 are limited by the size of their video memory. In theory, it delivers about 750 fps in Full-HD, i.e. it is 1.5 times faster than the 900 series (which reaches only 510 fps in Full-HD).

The GTX 950 can handle 8 (possibly up to 16-17) Full-HD video streams at 30 fps.

The GTX 960 4GB can handle up to 16-17 Full-HD video streams at 30 fps.

If you decrease the resolution or fps, the number of cameras increases proportionally. However, the resolution must not be lower than HD; otherwise, hardware acceleration does not apply.
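The stream counts above follow from dividing a card's total Full-HD decoding throughput by the frame rate of each camera. Here is a minimal Python sketch of that arithmetic, using the throughput figures quoted in this article. The dictionary and function names are illustrative assumptions and are not part of Xeoma.

```python
# Approximate Full-HD H.264 decoding throughput in frames per second,
# taken from the figures quoted in this article (theoretical values).
DECODE_FPS_FULL_HD = {
    "GTX 950 (memory-limited)": 250,
    "GTX 960 4GB": 510,
    "GTX 1060 6GB": 750,
    "GTX 1070 8GB": 750,
}


def max_streams(decode_fps: int, camera_fps: int) -> int:
    """Return how many camera streams fit into the decoding budget."""
    if camera_fps <= 0:
        raise ValueError("camera_fps must be positive")
    # Each camera consumes camera_fps frames of the card's per-second budget.
    return decode_fps // camera_fps


if __name__ == "__main__":
    for card, fps in DECODE_FPS_FULL_HD.items():
        print(card, "at 30 fps:", max_streams(fps, 30), "streams")
```

This estimate covers decoding speed only. The available video memory may lower the real number of cameras, so it is worth checking it separately.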

Please note that not all graphics cards allow users to use the hardware acceleration feature in Xeoma. The exact requirements depend on 3 factors: OS type, camera stream type, GPU architecture. Here is the breakdown:

OS	Stream	Minimal Architecture
Windows	H.264	Fermi
Windows	H.265	Pascal
Linux	H.264	Maxwell
Linux	H.265	Pascal
macOS	H.264	Maxwell
macOS	H.265	Pascal

Here “minimal architecture” refers to the GPU’s own architecture; each card model has one indicated in its specifications. You need a card with the architecture shown in the table or a newer one. Here are the architecture names in ascending order (valid as of February 2024):

Tesla
Fermi
Kepler
Maxwell
Pascal
Volta
Turing
Ampere
Ada Lovelace
Hopper
Blackwell

For example, on a Linux system you can use video cards with Kepler architecture to display the client part on screen, but that architecture would not be suitable for hardware acceleration.
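The check described above can be expressed as a lookup in the requirements table followed by a comparison of positions in the architecture list. The following Python sketch assumes the table and the list exactly as given in this article; the names `MINIMUM_ARCHITECTURE` and `supports_hw_decoding` are hypothetical and only serve the illustration.

```python
# Architecture names in ascending order, as listed in this article.
ARCHITECTURES = [
    "Tesla", "Fermi", "Kepler", "Maxwell", "Pascal", "Volta",
    "Turing", "Ampere", "Ada Lovelace", "Hopper", "Blackwell",
]

# Minimum architecture per (OS, stream type), copied from the table above.
MINIMUM_ARCHITECTURE = {
    ("Windows", "H.264"): "Fermi",
    ("Windows", "H.265"): "Pascal",
    ("Linux", "H.264"): "Maxwell",
    ("Linux", "H.265"): "Pascal",
    ("macOS", "H.264"): "Maxwell",
    ("macOS", "H.265"): "Pascal",
}


def supports_hw_decoding(os_name: str, stream: str, architecture: str) -> bool:
    """Return True if the architecture meets the minimum for this OS and stream."""
    minimum = MINIMUM_ARCHITECTURE[(os_name, stream)]
    # A card qualifies when it is at the same position or later in the list.
    return ARCHITECTURES.index(architecture) >= ARCHITECTURES.index(minimum)


if __name__ == "__main__":
    print(supports_hw_decoding("Linux", "H.264", "Kepler"))   # Kepler is below Maxwell
    print(supports_hw_decoding("Windows", "H.264", "Kepler")) # Kepler is above Fermi
```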

Your computer should have enough RAM as well, since RAM is also consumed during decoding via CUDA (approximately 140-200 MB per Full-HD stream). It is desirable to have at least 16 GB. In theory, 6 GB will be enough for 40-42 cameras, and 8 GB for 55-57 cameras (at a slightly lower fps). Otherwise, the video card will not be able to work at full speed.
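As a rough illustration of the per-stream figure above, the memory consumed by decoding can be estimated by multiplying the number of Full-HD streams by 140 MB and by 200 MB. The Python sketch below assumes memory use grows linearly with the number of streams, which is a simplification; the function name is hypothetical.

```python
from typing import Tuple


def decoding_ram_mb(streams: int, low_mb: int = 140, high_mb: int = 200) -> Tuple[int, int]:
    """Return a (low, high) estimate in MB of RAM used for CUDA decoding.

    The defaults are the approximate per-stream figures for Full-HD
    quoted in this article; linear scaling is an assumption.
    """
    if streams < 0:
        raise ValueError("streams must not be negative")
    return streams * low_mb, streams * high_mb


if __name__ == "__main__":
    low, high = decoding_ram_mb(25)
    print("25 Full-HD streams: about", low, "to", high, "MB")
```

The result covers decoding only. The operating system and the rest of Xeoma need memory on top of it.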

As of Xeoma version 22.3.16, some modules can take advantage of CUDA as well. The minimum requirement is the same for all of them on all OS types: Pascal. The modules are:

Object Recognizer
Sports Tracking
Cross-Line Detector
Smoke Detector

Client

The client machine may handle the video decoding process, if the server hasn’t done it on its end. It is generally recommended to have things set up this way, as it reduces the overall load on both the server and the network connecting the server and the client (see “Forced video decoding on the client (for all users)” in the User permissions editing dialog). Note that this applies only to the cameras that provide their video in H.264 or H.265 encoding; the vast majority of modern cameras do that.

While handling the decoding, the client may take advantage of hardware acceleration as well, which can be managed by the client machine’s graphics card(s) or CPU. Unlike the server, the client depends far less on specific GPU models and architectures when it comes to hardware acceleration. Instead, the graphics drivers become the key factor in making it possible. We highly recommend keeping your graphics drivers up to date.

This hardware independence is possible because of a set of technologies (APIs), supported by both the GPU/CPU manufacturers and the OS types. Please see the table below for the full breakdown:

OS	API	Description
Windows	Intel Quick Sync	Intel Quick Sync Video works on Intel CPUs.
	NVIDIA CUDA	Compute Unified Device Architecture works on Nvidia GPUs.
	DXVA2	DirectX Video Acceleration 2.0 works with most kinds of GPUs.
	D3D11VA	Direct3D 11 Video Acceleration works with most kinds of GPUs. Modern alternative to DXVA2.
	Vulkan	Vulkan works with most kinds of GPUs. Modern alternative to OpenGL.
Linux	Intel Quick Sync	Intel Quick Sync Video works on Intel CPUs.
	NVIDIA CUDA	Compute Unified Device Architecture works on Nvidia GPUs.
	VAAPI	Video Acceleration API works with most kinds of GPUs.
	VDPAU	Video Decode and Presentation API for Unix works with most kinds of GPUs.
	Vulkan	Vulkan works with most kinds of GPUs. Modern alternative to OpenGL.
	V4L2M2M	Exclusive to ARM! Video4Linux works with most kinds of GPUs.
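The table above can also be read as a simple lookup: given the client's OS, list the APIs that may be available, leaving out the ARM-only entry on other hardware. The Python sketch below only restates the table. It does not query Xeoma or the drivers, and the names `CLIENT_APIS` and `candidate_apis` are assumptions made for this example.

```python
from typing import List

# Decoding APIs per client OS, copied from the table above.
CLIENT_APIS = {
    "Windows": ["Intel Quick Sync", "NVIDIA CUDA", "DXVA2", "D3D11VA", "Vulkan"],
    "Linux": ["Intel Quick Sync", "NVIDIA CUDA", "VAAPI", "VDPAU", "Vulkan", "V4L2M2M"],
}

# APIs that the table marks as exclusive to ARM devices.
ARM_ONLY = {"V4L2M2M"}


def candidate_apis(os_name: str, is_arm: bool = False) -> List[str]:
    """Return the APIs from the table that may apply to this client."""
    apis = CLIENT_APIS[os_name]
    # Keep ARM-only APIs only when the client runs on an ARM device.
    return [api for api in apis if is_arm or api not in ARM_ONLY]


if __name__ == "__main__":
    print(candidate_apis("Linux"))
    print(candidate_apis("Linux", is_arm=True))
```

Which of these APIs actually works on a given machine still depends on the hardware and the installed graphics drivers.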

We hope that this article was helpful for you.

Updated on December 11, 2023