Finally Completed - New EPYC build & Limit Test - (Extremely Unnecessary Builds)

Joined
Apr 26, 2016
Messages
1,049
Reaction score
746
Location
Colorado
Before the comments flood in, this build was done as a project/for-science build.
This is not an endorsement of this type of build for a reasonable person, or (really) anyone tbh.
This is simply a shameless attempt to try and get the IPCT contributor badge ;)


Once a year I get bonus "mad money" and I do some project that might not otherwise make sense:
  • 2018 was pfSense & first big pile of IP cameras from Andy.
  • 2019 was Ubiquiti network build out.
  • this is 2020's project. The EPYC server build.
I believe there is potentially some benefit to knowing what certain builds might be capable of before spending the money (or when shopping used). @bp2008 kind of led the way with his exploration of Ryzen 3950x.

TLDR - Executive Summary : Performs great, but still buy a used desktop. Server motherboards do not have most consumer motherboard creature comforts.

This build set out to answer some nagging questions probably only I had.
  1. Is CPU memory bandwidth a factor in performance (for Blue Iris) beyond standard consumer-grade dual-channel memory?
  2. How exactly does an EPYC chip perform compared to the extreme AMD consumer chips like Ryzen 9-3950x, Threadripper 3960x, Threadripper 3990x?
  3. Server chips are meant to run 24x7, and in data centers. Does that equate to them being more or less energy efficient for 24x7 operation?
  4. How much can a graphics card offload from the CPU, and what is the trade-off in power usage?
  5. Do GPUs offload the CPU equally between consumer and business/creator versions of the same product?
  6. Are there limits to NVDEC that impact the consumer cards (GTX970, RTX2060) that do not apply to business/creator cards (Quadro, Tesla)? <- won't be testing a $6k Tesla card unfortunately.
  7. Could AMD systems someday make sense, or will Intel chips with integrated GPUs always reign supreme for this use case?
  8. How efficient/cost effective would a new 7nm server chip be compared to my extremely old i7-2600k, which is pulling Blue Iris duty and runs high CPU 24x7?
BI Stats (Blue Iris Update Helper) is short on comparison systems at the extremely high end, so I set out to build a system that was not represented at all in 2020. My initial premise was that memory bandwidth was the sole problem that needed solving, but a fellow over at Level1Techs provided a useful link to research & compare AMD chip architectures: https://www.amd.com/en/products/specifications/processors/ and I got some solid recommendations from them, since I've never bought a server before.

I budgeted $600 for an AMD EPYC 7262 server CPU (8C/16T) because it supports octa-channel memory, unlike the cost-optimized 7232P with a crippled memory controller (article here: AMD EPYC 7002 Rome CPUs with Half Memory Bandwidth). The closest consumer variant would be the Ryzen 9 3950x, which is 16C/32T, dual-memory-channel. The Threadripper 3960x does offer quad-channel memory, but it is also 24C/48T and the CPU itself is nearly $2000 USD, so it wouldn't be a fair comparison imho.

While this build is completely unnecessary for simple NVR computer work, AMD R7, R9, Threadripper & EPYC CPUs will (someday) show up on the secondhand market, so I figured it would help the community to know where they stand against their Intel counterparts.
It's also important to know you could just as easily buy 3-4 standard desktops and support similar numbers of cameras, and likely spend less to buy those extra systems than cramming everything onto a single system.

This EPYC server will finally replace my i7-2600k pulling Blue Iris duty and consistently drawing 120-125 watts (with 140 watt peak).
 
Cost & Components (including notes & changes).

I originally planned to purchase an AMD EPYC 7262 server CPU (8C/16T) for about $600. It was a nice jump in cores from the i7-2600k (4C/8T) I would be replacing, and it was on the newest AMD 7nm node for best performance/efficiency. However, I found a forum article at ServeTheHome.com indicating you could get these chips at a discount via an upgrade option, so I ended up going that route. I SHOULD have used that upgrade option to get the 8C/16T EPYC 7262 for $420, but instead opted for the EPYC 7302P for $718.55 after tax & shipping. I also ended up with a higher quality power supply than I had intended: as the price kept climbing on everything else, I didn't want to throw a cheap power supply in and risk burning anything up. It also happened to have enough power cables/connectors to support all the GPUs, hard drives & fans I would eventually be connecting to the system.

So here's the end result of about 3 months of research.
1592148401551.png

I had decided to reuse an old full-tower computer case (AZZA Hurrican 2000, circa 2010) to "save money", which makes me laugh a little now, but I already had it sitting in a closet and it was definitely large enough to accommodate the motherboard and the expected number of hard disk drives. However, that necessitated getting some more disk drives, which thankfully I obtained on sale and shucked successfully (my first time doing that). I also didn't want to roll the dice with 10 year old (included! translucent blue) fans caked in dust, so I replaced them all (well, mostly all). At this point I had blown my budget, but I was still trying not to make bad choices that might cause component failures over the next few years. I also had to select intake fans with enough CFM to create a positive pressure environment (more inflow than exhaust) to prevent the system from sucking dust in from the mechanical room environment. A side benefit of the Noctua iPPC fans is that they are "dust rated", so even though I have filters on all the intake fans, I'm hoping they run 10 years without dust grinding them to a halt.

I think the lesson from this was I should have gone with the "upgrade" 8C/16T and waited on the hard drives to land almost exactly on budget.
1592149236797.png
 
Build Log Pictures

Motherboard - TYAN S8030GM4NE-2T (Specs Link) Notes: (2) 10GbE + (2) GbE + Dedicated IPMI. Supports (5) PCIE4.0 x16 slots. Supports 14 SATA drives + 6 NVME.
S8030_b.jpg20200526_MB Parts.jpg20200526_4094 Pins.jpg

CPU - AMD EPYC 7302P - 16 core/32 thread @3.0 GHz. Supports up to 128 PCIE4.0 lanes. Octa-Channel Memory controller (8), 128MB L3 cache.
20200526_CPU_MEM_Stock HS.jpg20200526_CPU Pins.jpg20200526_CPU Mounted.jpg

Comparable/Similar CPU's.
CPU Comparison.PNG

CPU Cooler (Stock provided, was good enough to post). Result of contact (none) after Stock Paste post is also shown. This heatsink was bouncing around in the CPU box.

20200526_Stock Heatsink.jpg20200526_FirstPost.jpg20200527_CPU_stockpaste.jpg20200526_FirstPostMem.jpg20200526_FirstPostCPU.jpg20200526_FirstPostAllMem.jpg

CPU Cooler Upgraded.
20200526_CPU Cooler.jpg20200526_CPU Cooler Parts.jpg

Power Supply - Seasonic Prime Titanium TX-1000 : Selected for quality and number of supported accessory cables.
20200526_PSU.jpg20200526_PSU Pack.jpg20200526_PSU Rating.jpg20200526_PSU Connections.jpg20200526_PSU Cabling.jpg

Misc Odds & Ends. SSD's, HDDs, RAM, cabling etc.

20200526_SSD& Misc Cables.jpg20200526_RAM.jpg20200527_HDD Drives.jpg

Turns out it was hard to get over-sized replacement (these are exhaust) fans for a 10 year old case. So had to drill and mount some closely sized ones.
20200527_111320.jpg20200527_TopFans.jpg

Fresh CPU Thermal Paste job before mounting the Noctua CPU cooler.
20200528_CPU_Pasted.jpg
 
Memory Testing -
I will keep this short, because my original assumption that I just needed more memory bandwidth to leap past the critical hurdle everyone hits in Blue Iris is probably not true (in some further testing maybe quad channel helps, but 8-channel is possibly overkill). Further, sub-streams were implemented AFTER I ordered this server but BEFORE I finished building it, which makes this a less significant contribution. But it took hours swapping modules in and out, then figuring out they had to go in a specific order and redoing the whole thing. Maybe it saves someone else this work in the future.
PassMark - CPU Test EPYC.GIFPassMark - Memory Test.GIF


I initially selected the NPS4OC configuration because it had a tiny bit more memory bandwidth, but got stuck at about 1400 MP/s. I then switched to NPS1/AUTO, which pushed this system up to 2400 MP/s.
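For context on the 1/2/4/8-channel runs, here is a rough sketch of the theoretical peak bandwidth per channel count. It assumes DDR4-3200, the fastest speed the EPYC 7002 family officially supports; the actual DIMM speed in this build isn't stated above, so treat the numbers as an upper bound, not a measurement. The function name is just illustrative.

```python
# Theoretical peak DRAM bandwidth for a given number of memory channels.
# ASSUMPTION: DDR4-3200 (3200 MT/s). The DIMM speed actually used in this
# build is not stated in the post, so this is an upper-bound illustration.
BYTES_PER_TRANSFER = 8  # one DDR4 channel has a 64-bit (8-byte) data bus


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int = 3200) -> float:
    """Return theoretical peak bandwidth in GB/s (decimal, 1 GB = 1e9 bytes)."""
    return channels * mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


for channels in (1, 2, 4, 8):
    print(f"{channels} channel(s): {peak_bandwidth_gbs(channels):.1f} GB/s")
```

Measured AIDA64/MemoryMark results will land below these figures, and the NPS setting changes how the channels are interleaved rather than the theoretical ceiling.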
[Attached screenshots: AIDA64 memory benchmark and PassMark MemoryMark results for 1-channel, 2-channel (C0D0), 4-channel (C0D0G0H0) and 8-channel configurations; the 8-channel runs cover NPS0, NPS2, NPS4, NPS4OC and NPS AUTO.]
 
Baseline

For testing, I used 3 cameras (4.1 MP, 5.0 MP, 7.4 MP) and added 10 "test feed" accounts/ONVIF accounts to each camera.

I configured the main stream at max resolution and max FPS:
  • Camera 1: 4.1MP @ 20fps, H.264H, CBR, 13312 bitrate.
  • Camera 2: 5.0MP @ 30fps, H.264H, CBR, 8192 bitrate.
  • Camera 3: 7.4MP @ 20fps, H.264H, CBR, 10240 bitrate.
  • the maximum bitrate from the camera UI was selected for all cameras.
Then I proceeded to "Add New Camera", specifying for each a different account to use:
  • Direct-To-Disc recording was selected for all cameras.
  • Direct-To-Disc recording was assigned round-robin to each of (3) 12TB WD white label drives.
  • Limit Decoding was disabled on all cameras
  • "Clone Master" was selected on the "General" tab for every camera. (thanks to @fenderman for that guidance)
  • Hardware Accelerated Decode was disabled in Blue Iris settings.
  • Camera setting "hardware accelerated decode" was left at default for every camera (for non-GPU) tests.
  • Live Preview Rate baseline was capped at 1 FPS to arrive at the initial stable load maximum.
  • The built-in graphics is actually provided by the BMC chip (AST2500) onboard that provides very limited graphics & resolution. The EPYC processor actually has no onboard graphics built into it, which is different than most Intel consumer line CPU's.
  • With this multi-stream-per-camera approach I was able to simulate MP/s loads well past the baseline CPU limit with just 3 cameras. I frequently observed MP/s loads that would run for 1m, 5m, 10m but eventually Blue Iris "Status" would turn red and MP/s would crash.
Blue Iris testing no GPU Acceleration - will probably put this in a table.
  • Lo-Rez (1024x768) AST2500 built-in graphics. Stable. 25500 kB/s - 2400 MP/s @ 75% CPU (no SMT). Pulling 19 camera streams. Stable for 4 hours. Network usage shows 220 Mbps
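For anyone wanting to reproduce the MP/s arithmetic, a small sketch follows. It uses the three camera settings listed above; the helper names are made up for illustration, and the kB/s to Mbps conversion assumes decimal units (1 kB = 1000 bytes), which may not be exactly how Blue Iris reports it.

```python
# (megapixels, frames per second) for the three test cameras listed above.
CAMERAS = {
    "Camera 1": (4.1, 20),
    "Camera 2": (5.0, 30),
    "Camera 3": (7.4, 20),
}


def mp_per_second(megapixels: float, fps: float, streams: int = 1) -> float:
    """Decode load in megapixels per second for N identical streams."""
    return megapixels * fps * streams


def kilobytes_to_megabits(kb_per_s: float) -> float:
    """Convert kB/s to Mbps. ASSUMPTION: decimal units (1 kB = 1000 bytes)."""
    return kb_per_s * 8 / 1000


for name, (megapixels, fps) in CAMERAS.items():
    print(f"{name}: {mp_per_second(megapixels, fps):.0f} MP/s per stream")

print(f"25500 kB/s is about {kilobytes_to_megabits(25500):.0f} Mbps")
```

Per stream that works out to 82, 150 and 148 MP/s, so it only takes a handful of cloned streams per camera to reach the 2400 MP/s range. The 25500 kB/s figure converts to about 204 Mbps, which is in the same ballpark as the 220 Mbps network reading.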
 
Blue Iris testing GPU & Non-GPU Acceleration (NVDEC, Quadro M4000)
1592434108915.png

*BMC Beep just indicates the CPU was bouncing up to 100% often enough to trigger BMC warning, might indicate long-term instability, even though I did not observe any frame drops.
 

Reserved - Summary, Things learned along the way

Just some points and headaches I encountered on this journey.
  • Server motherboards do not offer much to make your life easy, as compared to Consumer/Prosumer boards. For example, mine only had 2 USB3.1gen1 ports which made hooking up a Mouse, Keyboard, USB Stick with windows etc more challenging than I expected. I happened to have an old add-in USB_3.0 card that fit a connector on the motherboard and a keyboard that integrated a USB_2.0 port on it for the mouse dongle. However, comparing to my Z370-H-GAMING system (with (4) USB3.1gen1, (2) USB3.1gen2, (2) USB2.0 PLUS ADDITIONAL internal connectors for addl (4) USB3.1gen1 & (4) USB2.0 ports) the server motherboard was significantly short on USB ports.
  • My Server motherboard didn't have RGB, and it didn't have Fan curves. By default, the fans ran at 30% PWM, so I had to select some fixed percentage or Full Speed. Since this will be in a mechanical closet (and to test most things without burning up the CPU) I had to select Full Speed. At full speed it sounds like a small jet starting up.
  • Creator graphics cards have the power connectors in a different place, I didn't see the low profile connector on the end at first and assumed I had purchased a bad graphics card on ebay. Nvidia online chat wasn't a big help but I eventually discovered my error.
  • Even in a dual NIC situation, with over 250 IP addresses to select from, ensure you don't accidentally put two cameras on the same IP, because weird stuff happens and then you make silly complaints to @EMPIRETECANDY before you realize you're a dimwit.
  • The EPYC Processor & Motherboard selected offer 5 PCIE 4.0x16 slots. These are full speed slots for graphics cards that don't exist yet, but could easily accommodate 5 legitimate graphics (Quadro, RTX2060) cards, provided the power supply could handle the load.
  • Choosing to "offload" decoding of some cameras to the Quadro M4000, the GTX970 or BOTH was not efficient. I was able to reduce CPU usage by about 1-2% PER CAMERA, provided I kept the GPU load around 90% (allowing some buffer for spikes). But the power consumption trade-off was poor (the KillaWatt reading was 60-70 watts higher compared to running the same load decoding directly on the CPU). In fairness, a more modern GPU might have tested better, but these legacy 28nm fab GPUs are big power hogs while offering no real improvement per watt. When I upgrade my gaming desktop to a GTX 1660 SUPER I can retest to see if it offers any offload improvement.
  • I've hit a 2400 MP/s "wall" now on this CPU. I tested with multiple network interfaces sharing the load, and with 1 and 2 graphics cards sharing the load, without making any improvement. OK, I have moved past the 2400 MP/s wall by turning off a bunch of memory security features in the BIOS, things like: DRAM Scrub, Poison Scrubber Control, Redirect Scrubber Control, TSME, Data Scramble, Data Poisoning. I am not entirely sure which BIOS options were important and still need to read up on them, but I figured I didn't need memory security features. The new maximum is 3200-3300 MP/s at 68-75% CPU with no GPU offload, stable for almost 5 hours.
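To put the 60-70 watt GPU offload penalty in perspective for a box that runs 24x7, here is a quick back-of-the-envelope sketch. The electricity price is a hypothetical placeholder (it is not from my bill), so plug in your own rate.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours of 24x7 operation


def annual_kwh(extra_watts: float) -> float:
    """Extra energy per year (kWh) for a constant extra power draw."""
    return extra_watts * HOURS_PER_YEAR / 1000


def annual_cost(extra_watts: float, price_per_kwh: float) -> float:
    """Extra cost per year at the given electricity price."""
    return annual_kwh(extra_watts) * price_per_kwh


# HYPOTHETICAL rate in $/kWh, for illustration only.
PRICE_PER_KWH = 0.12

for watts in (60, 70):
    print(
        f"{watts} W extra: {annual_kwh(watts):.1f} kWh/year, "
        f"about ${annual_cost(watts, PRICE_PER_KWH):.2f}/year"
    )
```

That is roughly 526 to 613 kWh per year of extra consumption for a 1-2% per camera CPU reduction, which is why the older cards looked like a bad trade.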
 

ctgoldwing

Getting comfortable
Joined
Nov 8, 2019
Messages
392
Reaction score
574
Location
Beacon, NY
well??? this is a great topic and sounds like you have the build and results well documented! I am very anxious to see the results. I have been playing mind games (and putting together a build on pcpartpicker.com) for a 3990X.
Looking forward to installment 2,3, etc :)
 
Joined
Apr 26, 2016
Messages
1,049
Reaction score
746
Location
Colorado
Limits I encountered using the "Clone Master" approach. Different cameras allowed me to pull different numbers of "clone master" video streams before dropping FPS.

IPC-HDW2531RP-ZS : 6 stream limit. (61Mbps)
IPC-B5442E-Z4E : 11 stream limit. (96Mbps)
DH-IPC-PFW8802-A180 : 30 stream limit (max 20 accounts limit in interface). (458Mbps)
IPC-K35A 3MP : 4 streams (48Mbps)
SD29204S-GN (old black ptz) : 2 streams (22Mbps)
IPC-HUM8230P-E1 : Can't get substream working
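A quick way to compare these limits is to divide the total by the stream count. This sketch assumes the Mbps figure in parentheses is the aggregate throughput at the stream limit, which is how I read the list but isn't spelled out; the dictionary and function names are just for illustration.

```python
# model: (stream limit, total Mbps at that limit), taken from the list above.
# ASSUMPTION: the Mbps value is the aggregate across all streams at the limit.
CLONE_LIMITS = {
    "IPC-HDW2531RP-ZS": (6, 61),
    "IPC-B5442E-Z4E": (11, 96),
    "DH-IPC-PFW8802-A180": (30, 458),
    "IPC-K35A 3MP": (4, 48),
    "SD29204S-GN": (2, 22),
}


def mbps_per_stream(streams: int, total_mbps: float) -> float:
    """Average bitrate per cloned stream."""
    return total_mbps / streams


for model, (streams, total_mbps) in CLONE_LIMITS.items():
    average = mbps_per_stream(streams, total_mbps)
    print(f"{model}: {streams} streams, {average:.1f} Mbps each")
```

Under that assumption the per-stream rate stays between roughly 9 and 15 Mbps, while the total each camera can push varies a lot more (22 to 458 Mbps).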
 
Here's where I gave up after switching on substreams, and adding 53 cameras on Blue Iris 5.2.9.18 x64 (running as service). None of these are being offloaded to NVDEC on the Quadro M4000 because of energy efficiency being low with that card. Power consumption 156 watts. If I did all my math right this is recording 6451 MP/s, but thanks to substreams decoding the much smaller resolution streams I am not hitting the same software decoding wall as before.

1592771584914.png
1592771619891.png

Monday I will test with two GTX 1660 Super (just to validate how much NVDEC offload Turing provides before putting those in our gaming PCs).
 
Here we go with TWO GTX1660 SUPER cards installed and each one is decoding ( 10 * 7.4MP * 20FPS ) streams [top line in red box]. Those 20 streams have sub stream disabled to place substantial load on each card. Total KillaWatt Power Draw: 262 Watts.

1592967455683.pngDual1660S-3300MPs.GIFDual1660S-3300MPs_p2.GIF

Next I reverted back to sub streams [bottom line in red box] and started adding more streams to each card. Then something strange happens if I try to add a 25th stream (regardless of which card I add it to). Further, I saw the same strange behavior when I tried removing a stream from one card and placing it onto the other (thereby keeping 24 streams, but with a 13/11 split). This clearly isn't hitting Video Decode/CUDA capacity, and isn't quite maxing out memory either. I suspect this illustrates the consumer graphics driver limit imposed by Nvidia. When this happens, all the streams fall back to the CPU until you restart the Blue Iris service, and the last one I changed reverts to "Hardware Acceleration: None" after that service restart.
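Both failing cases above (a 25th stream, and the 13/11 split) put 13 streams on one card, so the behavior is at least consistent with a cap of 12 hardware-decoded streams per card. That is only an inference from this one test, not a documented Nvidia figure. Under that assumption, a small planning sketch (a standalone helper, not anything Blue Iris provides) could decide which streams get NVDEC and which stay on the CPU:

```python
# ASSUMPTION: 12 hardware-decoded streams per card, inferred from the
# 12/12-works, 13/11-fails and 13/12-fails observations in this post.
OBSERVED_PER_CARD_CAP = 12


def plan_decode(stream_names, gpu_count, per_card_cap=OBSERVED_PER_CARD_CAP):
    """Fill the least-loaded GPU first; overflow streams stay on the CPU."""
    plan = {f"gpu{i}": [] for i in range(gpu_count)}
    plan["cpu"] = []
    for name in stream_names:
        # GPUs that still have room under the assumed per-card cap.
        candidates = [
            key for key in plan
            if key != "cpu" and len(plan[key]) < per_card_cap
        ]
        if candidates:
            target = min(candidates, key=lambda key: len(plan[key]))
        else:
            target = "cpu"
        plan[target].append(name)
    return plan


streams = [f"cam{n:02d}" for n in range(1, 31)]  # 30 hypothetical streams
plan = plan_decode(streams, gpu_count=2)
print({device: len(assigned) for device, assigned in plan.items()})
```

With 30 streams and two cards this keeps 12 on each card and leaves 6 on the CPU, which avoids tripping the failure mode where every stream falls back at once.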

1592966449483.png

I'll unfortunately have to wait a while to test with a Turing Quadro card, as they are quite expensive at $2k (Quadro RTX 5000) to $4k (Quadro RTX 6000).
 

bp2008

Staff member
Joined
Mar 10, 2014
Messages
10,953
Reaction score
9,566
Location
USA
Very insightful. Thanks for sharing. That is a pretty annoying failure mode for the nvidia cards, hitting some arbitrary limit and then shutting down HWVA across the board.
 
Joined
Apr 26, 2016
Messages
1,049
Reaction score
746
Location
Colorado
This build set out to answer some nagging questions probably only I had.

1. Is CPU memory bandwidth a factor in performance (for Blue Iris) beyond standard consumer-grade dual-channel memory?
A: Yes, definitely at least up to quad-channel memory, unless using substreams. However, I found the software decoder limit and my own BIOS config limited me more than memory channels for a long while.
2. How exactly does an EPYC chip perform compared to the extreme AMD consumer chips like Ryzen 9-3950x, Threadripper 3960x, Threadripper 3990x?
A: Much lower clocks; boost in particular is significantly lower. Overall lower power consumption is a side benefit. It operated between 95w (<5% CPU) and 195w (95%+ CPU).
3. Server chips are meant to run 24x7, and in data centers. Does that equate to them being more or less energy efficient for 24x7 operation?
A: I don't think I was able to measure efficiency, but I was a little startled that just having the system ON was consuming 95 watts. Also, due to the MB BIOS, my only fan options were fixed-percent PWM or 100%. Software that normally allows fan control, like Speedfan & HWMonitor, doesn't seem to have access to the fans on this server motherboard. Once again, something you might take for granted on a consumer motherboard doesn't appear to be possible on this MB with the V1.00 BIOS.
4. How much can a graphics card offload from the CPU, and what is the trade-off in power usage?
A: For the old graphics cards (GTX970, M4000), yes, they offload 600-700 MP/s per card, but it is a bad trade because of the significantly higher power consumption. With the modern GTX1660s (I believe from reading that NVDEC capability is primarily based on generation & VRAM), I was able to offload significantly with a modest power consumption increase (50-100w depending on load), considering I was running TWO graphics cards.
5. Do GPUs offload the CPU equally between consumer and business/creator versions of the same product?
A: I believe so, but only to a point. I definitely hit a software limit on the GTX1660 (that I assume doesn't exist on the P2000), and the (older) M4000 and GTX970 weren't very good either way, so nothing stood out there. If I really HAD to do more MP/s (or couldn't use substreams for some reason, like it being critical), I'd want to test-drive a newer Quadro with enough VRAM to handle the stream volume without having to worry about the software limits. There are "patches" out in the wild to supposedly address this on the consumer driver, but I just can't allow myself to run random Russian/Polish EXE files and hope they are safe hehe.
6. Are there limits to NVDEC that impact the consumer cards (GTX970, RTX2060) that do not apply to business/creator cards (Quadro, Tesla)? <- won't be testing a $6k Tesla card unfortunately.
A: I believe so, but I was unable to prove it, as the M4000 vs GTX970 comparison was inconclusive. I will wait until the newer cards are 1-2 generations old and pick them up on the cheap.
7. Could AMD systems someday make sense, or will Intel chips with integrated GPUs always reign supreme for this use case?
A: I think they will do fine, but I think they are at a disadvantage because most Intel chips will have the iGPU, which provides offload capability on-chip. While I think the efficiency/heat benefit is a positive, the implementation of sub-streams will further lower the threshold for what it takes to run Blue Iris (to the benefit of most hobbyists). I wouldn't sweat running an 8 core AMD with iGPU, as I suspect it will be lower power and still efficient.
8. How efficient/cost effective would a new 7nm server chip be compared to my extremely old i7-2600k, which is pulling Blue Iris duty and runs high CPU 24x7?
A: At idle they aren't far enough apart in killawatts, but at much higher CPU loads (and considering the EPYC can handle massive workloads compared to this i7-2600k) the EPYC is a clear winner. Unfortunately, I don't have an i9-7980XE; I'm sure it would test pretty well against the EPYC (but it's also an $1800 CPU). For the money, people should seriously consider sub-streams or running TWO smaller systems rather than trying to cram everything onto one imho.
 

DanDenver

Getting the hang of it
Joined
May 3, 2021
Messages
48
Reaction score
52
Location
Denver Colorado
This is very helpful. I am building a much more modest system, but what you learned will help me tune my system for sure. I custom ordered from Action Computers (south Denver) an i7-10700 with 16GB RAM in a dual memory channel setup. I will only be running about 12 cams total; currently I only have 8. Right now those 8 are running on my craigslist computer, which is an i7-2600. The CPU hums along around 40-60% with spikes to 90+, so I don't have much headroom left. Looking forward to the 10700.
 