TL;DR: Weird problems caused by an odd hardware failure led to a rebuild on a new PC.
I thought I'd share my tale of what I found when my BI system started to behave strangely.
My system is an i7-7700K with 16GB RAM, an SSD system drive, and two surveillance disks (4TB WD and 8TB Skyhawk). It has been running for years and was very stable, running Deepstack (DS) as well with decent CPU utilization (around 30% when idle). There are 10 cameras, a mix of 2MP, 4MP, and 8MP. Most of the cameras are configured for motion trigger with AI analysis via DS. About 75% of the cameras also have clones that record continuously from the main stream.
Over the past month or so, DS seemed to consume more and more CPU, which I thought was related to enabling the dark model. Idle CPU would fluctuate between 40% and 60%, and hit 100% when DS was running.
I then added a Quadro P400 GPU and configured DS for GPU, which seemed to help with the CPU usage, but it was still peaking around 80% during DS analysis, which was odd.
After a couple of days of fiddling around with BI configurations, the CPU issue got even worse. Eventually, DS kept timing out, returning error 100. Disabling AI didn't seem to make a difference. CPU usage would fluctuate wildly without any motion detections, spiking to 100% every few seconds and then dropping back down to around 40%.
I ran a hardware monitoring utility, and it showed that the CPU was running hot (90°C) and was current throttled. I also tried running the Intel XTU tuning utility, and it showed the same thing.
Problem found: whenever CPU usage went above 40%, throttling would kick in due to temperature, bottlenecking the system. It looks like I have a hardware failure of some kind, probably made worse by the heat we've been having here. I pulled the big Noctua heatsink off and checked the thermal paste in case it had dried out, but it was okay.
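For anyone chasing something similar, here is a rough Python sketch of the check I was effectively doing by eye: given logged samples of CPU load, temperature, and a throttle flag, find the lowest load at which throttling shows up. The `Sample` layout, the function names, the 90°C limit, the 50% load ceiling, and the numbers in the example log are all assumptions made for illustration. They are not the output format of any particular monitoring utility, so you would have to adapt this to whatever your tool exports.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Sample:
    """One reading from a monitoring log (hypothetical layout)."""
    cpu_percent: float  # overall CPU utilization, 0-100
    temp_c: float       # CPU package temperature in Celsius
    throttled: bool     # True if the tool reported throttling


def throttle_onset(samples: Iterable[Sample]) -> Optional[float]:
    """Return the lowest CPU load seen while throttled, or None if never throttled."""
    loads = [s.cpu_percent for s in samples if s.throttled]
    return min(loads) if loads else None


def looks_thermally_limited(
    samples: Iterable[Sample],
    temp_limit_c: float = 90.0,   # assumed threshold, adjust for your CPU
    load_ceiling: float = 50.0,   # "should not be throttling below this load"
) -> bool:
    """True if any sample is hot and throttled while load is still moderate."""
    return any(
        s.throttled and s.temp_c >= temp_limit_c and s.cpu_percent < load_ceiling
        for s in samples
    )


# Made-up readings, only to show how the functions are called.
example_log: List[Sample] = [
    Sample(cpu_percent=30.0, temp_c=70.0, throttled=False),
    Sample(cpu_percent=42.0, temp_c=91.0, throttled=True),
    Sample(cpu_percent=100.0, temp_c=95.0, throttled=True),
    Sample(cpu_percent=38.0, temp_c=80.0, throttled=False),
]

if __name__ == "__main__":
    print("Throttling first seen at load:", throttle_onset(example_log))
    print("Looks thermally limited:", looks_thermally_limited(example_log))
```

If throttling shows up at moderate load with the temperature already at the limit, that points at cooling or hardware rather than at the BI or DS configuration.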
Fortunately, I had another CPU/Motherboard handy, so I built a new BI server using an i7-8700K with the Quadro P400 GPU.
I did a fresh installation of Windows on a new NVMe drive and moved the two spinning disks to the new system.
I installed BI and DS for GPU and restored a copy of the BI config from the original machine.
CPU still fluctuates quite a bit when idle on the new system, but idles as low as 20% now. I deleted all the cameras and recreated everything just in case, but BI utilization still jumps around quite a bit. Still a major improvement over the old system. I guess it's just the way I have things configured. Everything's optimized: HA, substreams, direct to disk, etc.
Everything is working great now, and I have a little headroom if I add cameras or upgrade to higher-resolution ones.