DeepStack Case Study: Performance from CPU to GPU version

jaydeel · Aug 30, 2021

I'm posting this thread for anyone considering making the transition from the DeepStack CPU version to the GPU version.

Hopefully this will give you an impression of what you can expect in terms of enhanced performance.

Please note some of my observations are system-specific. My system specs:

i7-4770 processor
16 GB RAM
500 GB SSD
8 TB Purple hard drive
PNY NVIDIA Quadro P400 V2
Headless
9 x 2MP cameras, all continuously dual streaming
5/9 cameras using DeepStack (all Dahua and triggered via ONVIF using IVS tripwires)
EDIT: DeepStack default object detection only - no face detection, no custom models
This server is used only for Blue Iris AND a PHP server (used only for home automation)

The next 6 screenshots show DeepStack processing times data for 5 cameras over a period of 3 weeks. Two images are shown for each week:

the 1st (left) shows all the data (full-scale);
the 2nd (right) shows a subset of the data on an expanded scale (0 to 600 msec).

1) Period 1: the week before the P400 was installed.

2) Period 2: the week during which the P400 and DS GPU version were installed (this event took place before midnight on day 2)

3) Period 3: the week after the P400 card was installed.

The next 2 images compare the DeepStack processing time statistics for Periods 1 & 3 (7 days on the CPU version vs 7 days on the GPU version). Please note that the statistical analyses exclude the long-duration event 'outliers' and use only the data points between 0 and 1000 msec. This approach was taken to provide a 'cleaner', more meaningful comparison when the system is running 'normally' (not stressed).
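For anyone who wants to run the same kind of comparison on their own logs, here is a minimal sketch of the filtering step described above. It assumes the processing times are already available as a list of msec values; the `summarize` helper and the sample numbers are hypothetical and are not data from this thread. Whether the original analysis used the sample or the population standard deviation, and whether the 0 and 1000 msec limits were inclusive, is not stated, so the sample form and inclusive limits are assumptions here.

```python
from statistics import mean, stdev


def summarize(times_ms, low=0.0, high=1000.0):
    """Return (count, mean, stdev) for samples within [low, high] msec.

    Samples outside the window are treated as long-duration outliers
    and dropped before any statistics are computed.
    """
    kept = [t for t in times_ms if low <= t <= high]
    return len(kept), mean(kept), stdev(kept)


# Hypothetical samples for illustration only (one long-duration outlier).
sample_ms = [120.0, 135.0, 150.0, 141.0, 4200.0, 129.0]

count, avg, spread = summarize(sample_ms)
print(f"{count} events kept, mean {avg:.0f} msec, stdev {spread:.1f} msec")
```

Running the same function on the CPU-week and GPU-week data separately gives the two rows needed for a side-by-side comparison.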

This last screenshot is a mark-up of the expanded scale data for Period 2.

Observations:

Overall: as expected, the GPU version yields much faster and less noisy results... and the frequency and severity of very long events is greatly reduced.
The worst processing times for the GPU version are rarely slower than the best processing times for the CPU version.
Statistics: The GPU version is ~3.4X faster (139 msec mean vs 469 msec) and ~2.8X less dispersed (43 msec stdev vs 122 msec).

Note also that Periods 1 & 3 contain a similar number of events (1030 vs 1075) and a similar ratio of confirmed:total events (0.41 vs 0.38). The latter is consistent with the motion detection schemes being unchanged over the 3-week duration of this experiment.
DeepStack 'Confirmed' events have the same statistics as 'Cancelled' events, regardless of the version. I'm not sure I expected this for the CPU version, but the data is convincing.
Using the settings 'Use main stream if available' and 'Save DeepStack analysis details' increased the GPU version processing time by roughly 20%. (Please note that I conducted this experiment for a little over a day only, and I have not yet performed an independent evaluation of the two settings, so one of them may be dominating the apparent difference. If so, my bet is on the former.)

Please note that observations 2, 3 & 5 are applicable to my system only. If you have a less powerful CPU, the ratios should be higher. Conversely, if you have a more powerful CPU, you may need a better NVIDIA card to observe a similar improvement.
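The ratios quoted in the observations follow directly from the reported means and standard deviations. A short sketch of the arithmetic, using only the figures given in the post (the variable names are just for illustration):

```python
# Figures reported in the post (Period 1 = CPU version, Period 3 = GPU version).
cpu_mean_ms, gpu_mean_ms = 469.0, 139.0
cpu_stdev_ms, gpu_stdev_ms = 122.0, 43.0

# How many times faster the GPU version is on average.
speedup = cpu_mean_ms / gpu_mean_ms

# How many times tighter the spread of processing times is.
dispersion_ratio = cpu_stdev_ms / gpu_stdev_ms

# The same speedup expressed as a percentage reduction in mean time.
time_saved_pct = 100.0 * (1.0 - gpu_mean_ms / cpu_mean_ms)

print(f"speedup: {speedup:.1f}x")
print(f"dispersion ratio: {dispersion_ratio:.1f}x")
print(f"mean processing time reduced by about {time_saved_pct:.0f}%")
```

Plugging in your own before/after numbers gives a quick way to compare against these system-specific results.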

jaydeel · Aug 31, 2021

I'm reposting 3 images; this time they all have the same y-axis range (0-1000 msec).

The stats in the previous post apply to screenshots 1 & 3.

1) Period 1: DeepStack CPU version performance (1,030 events)

2) Period 2: Transition (1,162 events)

3) Period 3: DeepStack GPU version performance (1,075 events)

kc8tmv · Aug 31, 2021

Impressive documentation. You would not happen to have a "step by step / how to" for moving from the CPU version to the GPU version would you?

sebastiantombs · Aug 31, 2021

Step 1 - install Nvidia CUDA capable card, preferably one with a large number of CUDA cores.
Step 2 - Follow the installation instructions for the GPU version as posted on the DS forum/page. You can skip the last step in those instructions regarding Visual Studio.
Step 3 - You're good to go.
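One way to confirm that step 3 really holds is to send a single image to the detection endpoint and time the round trip. The sketch below is a minimal example using only the Python standard library. It assumes DeepStack is listening on localhost port 80 (change this to whatever port you configured) and uses the `/v1/vision/detection` endpoint with an `image` form field, as described in the DeepStack documentation. The function names and the `snapshot.jpg` file name are hypothetical. Note that the measured time includes HTTP overhead, so it will not match the processing times Blue Iris reports exactly.

```python
import json
import time
import urllib.request
import uuid


def build_multipart(field, filename, payload):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def time_detection(image_path, url="http://localhost:80/v1/vision/detection"):
    """POST one image to DeepStack; return (elapsed_msec, parsed_json).

    The default URL is an assumption; adjust host and port to your setup.
    """
    with open(image_path, "rb") as fh:
        body, content_type = build_multipart("image", "snapshot.jpg", fh.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=30) as response:
        result = json.loads(response.read())
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, result


# Example usage (requires a running DeepStack server and a local image file):
#   elapsed_ms, result = time_detection("snapshot.jpg")
#   print(f"{elapsed_ms:.0f} msec", result.get("success"))
```

Repeating the call a few times and ignoring the first result is sensible, since the first request after startup is often slower than the rest.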

kc8tmv · Aug 31, 2021

sebastiantombs said:
Step 1 - install Nvidia CUDA capable card, preferably one with a large number of CUDA cores.
Step 2 - Follow the installation instructions for the GPU version as posted on the DS forum/page. You can skip the last step in those instructions regarding Visual Studio.
Step 3 - You're good to go.

No uninstalling of the CPU version?

Rob2020 · Aug 31, 2021

Interesting, thank you for taking the time to crunch the numbers and post the results.

jaydeel · Aug 31, 2021

kc8tmv said:
You would not happen to have a "step by step / how to" for moving from the CPU version to the GPU version would you?

Same as @sebastiantombs said... I kept notes on the links so I'll add those:

Note: #3 has a prerequisite -- you must register for the NVIDIA Developer Program. I did not understand all the jargon, but I managed nonetheless.

As for uninstalling the CPU version first, I cannot recall explicitly doing this. The GPU package installer may have taken care of this.

A few more details...
I started the conversion at about 9:30 pm and was done by 11:00 pm. This included dragging the PC out of the closet, attaching a USB keyboard/mouse, installing the P400 card, attaching a monitor, then (arghh) futzing around getting the display to work (because I had the PC set up to use an extended dual display, and the attached display was the #2 monitor vs the HDMI plug as the #1 monitor). I think I spent no more than 40 minutes installing the GPU version and getting it set up in Blue Iris. Quite surprisingly to me, everything worked the first time! I hit the sack at 11:30.

tech101 · Aug 31, 2021

Thank you for posting this, jaydeel. This is very useful info.

sebastiantombs · Aug 31, 2021

The GPU version will overwrite the CPU version with the files it needs. I didn't uninstall and there was no problem.

bp2008 · Aug 31, 2021

That is quite an improvement from a very low-end GPU.

jaydeel · Aug 31, 2021

One last chart...

This chart shows CPU usage before and after enabling the settings 'Use main stream if available' and 'Save DeepStack analysis details'.
Both settings were disabled before the leftmost vertical purple line; ditto after the rightmost purple line.
These data were collected every 10 minutes. The chart has 675 data points.

Observations:

Enabling 'Use main stream if available' on 5/5 DeepStack-enabled cameras increased the CPU usage just a few percent.
Also enabling 'Save DeepStack analysis details' on 5/5 DeepStack-enabled cameras increased the CPU usage from ~20% to ~35%.

Bottom line... continuously saving DeepStack analysis details has a measurable impact, but perhaps not a huge one if you've got CPU cycles to spare.
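To put the chart numbers in perspective, here is a small sketch of the arithmetic, using only the figures quoted above (10-minute sampling, 675 data points, and CPU usage going from roughly 20% to roughly 35%). The variable names are illustrative, and the CPU percentages are the approximate values from the post, not exact measurements.

```python
SAMPLE_INTERVAL_MIN = 10   # one CPU reading every 10 minutes
N_POINTS = 675             # number of data points on the chart

# N points span N - 1 sampling intervals.
span_minutes = (N_POINTS - 1) * SAMPLE_INTERVAL_MIN
span_days = span_minutes / (60 * 24)

# Approximate CPU usage before and after enabling both settings.
cpu_before_pct, cpu_after_pct = 20.0, 35.0
increase_points = cpu_after_pct - cpu_before_pct
relative_increase_pct = 100.0 * increase_points / cpu_before_pct

print(f"chart covers about {span_days:.1f} days")
print(f"CPU usage up {increase_points:.0f} points "
      f"({relative_increase_pct:.0f}% relative increase)")
```

The distinction between percentage points and relative increase matters here: 15 points sounds small, but it is a large relative jump from a 20% baseline.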

Flintstone61 · Aug 31, 2021

Maybe it’s just me, but I can’t help but hear Biden as I read sebastiantombs' posts.

Flintstone61 · Aug 31, 2021

No offense. It’s just my brain.

samplenhold · Aug 31, 2021

Quick question: If your MB/CPU combination has built-in graphics, and you also install a separate graphics card, can you use both? On my system, for example, I have many cams using NVIDIA NVDEC for HA, but I could not get all cams to use it; there seems to be a limit on the number of cams. So the rest are using Intel for HA. Is there a way to get the onboard GPU to also be used by BI for HA? Can you specify which GPU to use for DeepStack? What if I added a second graphics card? Could that be used also?

wittaj · Aug 31, 2021

Yep, you can go into each individual camera and select the GPU number and the type. You can use multiple graphics cards if you have them.

You can install the GPU version of DeepStack instead of the CPU version for Windows.

kklee · Sep 1, 2021

bp2008 said:
That is quite an improvement from a very low-end GPU.

I'm running the exact same video card and have similar results. It's reasonably priced and has much lower power consumption compared to mainstream gaming cards.

wittaj · Sep 1, 2021

So, I wanted to get a GPU to offload OpenALPR to it. The documentation says it can, but I couldn't get it to work and OpenALPR took a look at it and couldn't figure out what to do other than get a bigger computer and GPU card LOL.

So before I take it back, I thought I would try it with Blue Iris and DeepStack.

I am seeing improvements similar to those posted here and to what @sebastiantombs and others had indicated elsewhere. The GPU is looking to be about 8 times faster than the CPU version.

jaydeel · Sep 13, 2021

FYI... if you've been waiting for a price drop...
PNY Quadro P400 Graphic Card

CCTVCam · Sep 14, 2021

Just a quick question, what's the difference between a Quadro and a gaming card with the same number of CUDA cores? I see a gaming card for 1/2 the price of the above Quadro with 384 CUDA cores.

Will it be faster having more CUDA cores and more memory or is there another factor?

whoami ™ · Sep 14, 2021

Nvidia hand-picks the best silicon for Quadro; the chips for gaming cards come from the same batches. Quadro cards are enterprise-grade and come with different memory timings and clock speeds. Quadro cards allow for things like VM GPU passthrough, while consumer gaming cards have driver limitations placed on them and require a hack.

Applications can be core-heavy or memory-intensive, so it depends on the application. Core clock and memory timings are also a factor, so cards with the same resources aren't necessarily equal.

DeepStack is not a heavy workload. You would need to run multiple instances of DS computing in parallel to even attempt to place a heavy load on a Quadro P400. If you were using something like an old GTX 1070, it'd be such overkill you wouldn't even realize it was working.

If you also want the card to decode video on more than 4-6 cams, memory will be the limiting factor. I would be looking at the Quadro T600 with 4 GB GDDR6 @ 160 GB/s then.
