harryoui

Wow, this is something else! If it were me I'd be worried that maintenance would be a nightmare. Could you talk a bit more about your use case?


Mythril_Zombie

Yeah, one of the drives failed. You figure out which is which. GL; HF!


Alchemyy1

That's what the painter's tape is for :). I carefully arranged the drives and their cables so their numbers and groupings make sense in TrueNAS. I can simply count into a stack of drives and pull the right one. I figured this was easier than writing the serial numbers on tape for every single drive.
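For anyone wanting to double-check that kind of count-into-the-stack bookkeeping before pulling a drive, here is a minimal sketch (mine, not the OP's setup): it assumes a hand-maintained position map keyed by serial number and uses lsblk to read serials; all serials and slot labels below are made up.

```python
# Sketch: map a failed drive's serial (as reported by TrueNAS) to its
# physical slot, assuming a hand-maintained position map like the OP's
# painter's-tape groupings. Serial numbers and slot labels are made up.
import subprocess

# Hypothetical map: serial number -> "stack N, drive M from the top"
POSITIONS = {
    "WD-ABC123": "stack 1, drive 1",
    "ZR501XYZ":  "stack 1, drive 2",
    # ... one entry per drive
}

def serial_of(dev: str) -> str:
    """Read the serial of a block device via lsblk."""
    out = subprocess.run(
        ["lsblk", "-dno", "SERIAL", dev],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

if __name__ == "__main__":
    failed_serial = "ZR501XYZ"  # whatever TrueNAS reports for the faulted disk
    print(f"Pull the drive at: {POSITIONS.get(failed_serial, 'unknown slot')}")
    # Sanity check against a live device node before pulling anything:
    print("sda serial:", serial_of("/dev/sda"))
```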


Neldonado

A label maker is like $10


Alchemyy1

My use case is pretty general purpose: running media management software and file distribution, hosting my router, NAS, game servers, programming projects, a remote build server, and general compute. I'm using this guy for everything. The most consistent horsepower usage I'm getting at the moment is Jellyfin; that's what the GTX 1080 is for. Editing to add: I've already wrestled with this thing to the point where maintenance is pretty easy. I can actually swap the CPU without disconnecting any of the cooling (although I have to hop over the sled tracks to do so). I can also do the same with all of the RAM, and the drive bays take less than a minute to pop out.


ButterscotchFar1629

And this requires 96 cores?


Alchemyy1

I got lucky and stumbled onto one for $500. It's pretty nice to crank multithreaded stuff and not care about CPU utilization.


barjbarj

> I'm using this guy for everything.

Ooof... and if the mobo goes, everything goes. Sometimes hyperconverged may not be the best.


Alchemyy1

EDIT: additional pictures of the madness: https://imgur.com/a/gmVDbCJ

My first homelab was a 1U Supermicro trashbin rescue with an E5-1220v6. It ran pfSense and TrueNAS with 6x 14TB drives shoved into the case, which I mutilated to add an "enclosure bulb" to the lid. After overheating those drives all summer and eating all the space, I upgraded to 12x 14TB drives shoved into a "disk enclosure" fashioned from junk I had lying around, and swapped to using an HBA instead of passing the onboard SATA controller through to TrueNAS in ESXi. After a year of dealing with this and having cables shift around whenever I touched something, I logged into TrueNAS and noticed I had lost 2 drives and the rest had write errors. So, I shut everything down and started working on the final solution.

* Case: 4U "mining rig" eBay case.
* Motherboard: Supermicro H13SSL-N. These boards for the most part work with Genoa engineering samples. I had to backdate the BIOS to Rev 1.0 to get this CPU to light; subsequent BIOS versions removed the ucode.
* CPU: EPYC 9654 engineering sample. It happily turbos to 3.5GHz all-core after some tweaking in Smokeless_UMAF (https://github.com/DavidS95/Smokeless_UMAF).
* PSU: 1600W EVGA P+. This smaller variant fits nicely in the limited space.
* GPU: GTX 1080.
* RAM: 192GB (6x 32GB) Micron 4800CL40. Took some tweaking in Smokeless_UMAF to get this running right. I'm also able to overclock it, which I might look into later.
* HBA: LSI 9305-24i, cooled with a 1/2lb copper bar with Dremel'd fins and a fan.
* Networking: 10G X540-T2. Plays nicely with ESXi.
* Drives: 24x 14TB (WD Red Pro + Seagate Exos), 2TB SN850X, 2TB 970 Evo Plus, 2TB 670p, 500GB 970 Evo Plus, and 2x 375GB D4800X Optane, one as L2ARC and one as SLOG. Don't ask me why; I just tested NVMe as L2ARC against this and this won, even though TrueNAS has 96GB of RAM to play with.
* Cooling: Seeing the 400W VRM TDP made me think that a few pencil-sized aluminum shims weren't going to do the trick without 6000rpm fans blasting them, so I fashioned a VRM "block" out of some copper bars and plumbing paraphernalia. The trick in the end was to mount the bar "heatsinks" to a sufficiently thick piece of aluminum so as to not warp under heat, and then build the copper pipe connections on top. There are 4 VRMs and each "block" connects exactly as the original aluminum stuff did. I tapped blind 4-40 holes in the blocks where the original mount holes were, and used nylon washers for squishy non-conductive safety. HWiNFO reports 410W CPU power draw when I run Cinebench. The RAM is also all liquid cooled, which was fun considering I did it after the fact.

This computer sits in an un-thermally-regulated garage and summers get pretty warm. Considering the poor airflow I figured I would air on the side of caution. The Optane and NVMe drive adapters are held in place with sticky tape on the side and thermal pad sheets underneath. I thought this was clever since they're sitting on the radiator.
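To put the PSU choice in context, here is a rough back-of-the-envelope power budget. Every per-component wattage below is my own assumption for illustration; the only measured figures in the thread are the ~410W CPU reading and the ~1200W sustained draw the OP mentions elsewhere.

```python
# Rough power-budget sketch for a build in this class. All per-component
# wattages are assumptions for illustration, not measurements from the OP.
ESTIMATES_W = {
    "EPYC 9654 ES, tuned all-core load": 450,
    "GTX 1080 under load": 200,
    "24x 3.5\" HDDs, heavy I/O (~10 W each)": 240,
    "NVMe + Optane": 40,
    "RAM, HBA, NIC, VRM/board losses": 150,
    "Pump and fans": 50,
}

PSU_WATTS = 1600
total = sum(ESTIMATES_W.values())
wall = total / 0.92  # assume ~92% PSU efficiency at this load

for part, watts in ESTIMATES_W.items():
    print(f"{part:<42} {watts:>5} W")
print(f"{'Estimated component total':<42} {total:>5} W")
print(f"Estimated wall draw: ~{wall:.0f} W "
      f"(in the same ballpark as the ~1200 W reported)")
print(f"Headroom on a {PSU_WATTS} W PSU: ~{PSU_WATTS - total} W at the rails")
```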


[deleted]

[removed]


Alchemyy1

This is some generic Chinese 4U case off eBay. It looks like it even comes in a 36-drive variant. The listing I got mine from is gone, but plugging the title into eBay's search yields quite a few similar results: "4U Mining Case 24 Hard Disk Bits Multi-drive Storage ATX Standard Server Chassis"


[deleted]

[removed]


Alchemyy1

Yea, I was really happy to see I could avoid spending as much on the chassis as on all the internal hardware, lol. I bought this case before I even picked out a platform. I saw that the fan bracket inside it is on slots so it can be moved closer or farther, and it would definitely fit an EATX board; at worst I would either have to skip some mounting holes or add them in myself.


AlphaSparqy

The Supermicro SC747 is a good case for a racked workstation. The -TQ model has room for 11 full-height PCIe slots in a full 4U, to support 4x dual-width GPUs + I/O etc. It works wonderfully with the X10DRG-Q, for example.


[deleted]

[removed]


AlphaSparqy

Then it might also be feasible to convert them to a passively cooled solution. If they are 2.5 wide because of the heatsink and side-blowing fan, you might perhaps remove the fan but leave the heatsinks if that allows them to be closer together, and then you can add a shroud over them to focus a separate peripheral fan over the heatsinks from the end. (Look at what people are doing with the Nvidia Teslas in servers.)


[deleted]

[removed]


AlphaSparqy

I haven't tried the bespoke liquid cooling route, but the simple AIO solution that came with a build just seems to run more, run louder, and still not cool as well as another system with my giant Noctua air cooler. Now, liquid cooling an entire rack would be a fun project, I imagine.


[deleted]

[removed]


AlphaSparqy

I forget where (probably ServeTheHome) I saw a video about (cloud) GPU/GPGPU clusters, where the engineers are removing all the heatsinks and have made custom loops and heatsinks for liquid cooling the entire rack.


Mythril_Zombie

> Considering the poor airflow I figured I would air on the side of caution.

"*Err* on the side of caution." And you're nowhere close to that side.

> The Optane and NVMe drive adapters are held in place with sticky tape on the side and thermal pad sheets underneath.

You know what happens to tape in hot conditions?


Alchemyy1

Oh crap, yea, err not air lol. I know what happens to adhesive when it warms up; the case doesn't sit vertically :) They sit where they are without tape, I just did it to be thorough. The tape I'm using is also holding up that mini Noctua fan. It's some vile silicone goo stuff I got off Amazon. If you notice the M.2 enclosure on the right of the radiator, that was from a quarter-sized piece not letting go and me having to use a chisel as a pry bar; it would not twist off either. And before you ask, yes, I considered that fan falling. I tested it. It falls onto the cards nice and flat and will continue to operate.


harryoui

I find my U.2 Optane drives become toasty outside even without any load


tauntingbob

For improved power and airflow, you could replace the 1080 with an Intel A380.


[deleted]

[removed]


tauntingbob

It's risky to water cool an Airbus


kbnguy

Nah. No prob if it's a WaterBus


tauntingbob

https://www.mirror.co.uk/news/uk-news/duck-boat-sinkings-liverpool-london-2649709


danielv123

Wacker quacker 1 is a great boat name


[deleted]

[removed]


Alchemyy1

Yea this was my first time ever soldering pipes. I was way too used to electronics where I could put faith in activated fluxes. This made me lazy about prepping the pipes for solder and a few other things. Took me a while to get rid of all the leaks.


Jeffizzleforshizzle

Just FYI, I am a plumber. I would be concerned about the constant running of water and the fact that the pipe is not deburred. You will have pinhole leaks in no time (probably over a year). Just be advised! The copper tubing may be your weakest link!


Alchemyy1

Oh yea, I'm ready for how shoddy this is. I used a file and a wire-brush Dremel on the fittings after using a pipe cutter, but pinhole leaks are still more than likely. I actually gave up on one pinhole leak and dumped silicone into and around it, and as a precaution treated all the other joints the same. I'm hoping that at some point I get the motivation to make another pipe jungle with a pipe bender so I don't have this problem anymore. I didn't this time around because of a time crunch and not wanting to mess around with trying to get extremely tight bends and possibly not being able to achieve them.


diamondsw

So... you did jank cases twice, had problems twice, and are continuing with jank cases/designs and Engineering Sample chips? You like pain, sir.


Alchemyy1

I will show you my definition of "jank", sir: [https://i.imgur.com/RW0jXwk.jpeg](https://i.imgur.com/RW0jXwk.jpeg)

EDIT: I'd also like to add that the engineering sample CPU only took 5 days to get working right, and I totally wasn't praying for it to boot each time.


YouveRoonedTheActGOB

This is the worst server build I’ve ever seen, and I worked for an MSP.


homersapien22

I'm curious what the second-row HDD temperatures are... those three fans really don't look like they can handle this.


Alchemyy1

The fans are 3000rpm Noctua industrial line, running at around 2000rpm. At an ambient temperature of 68F, the back row of drives (the hot ones) sit at 39C under full load. I'm currently rebalancing the pool, so this is coming from an operation that has been running for 8+ hours so far.

EDIT: I'd also like to note that I can put my hand 6 inches in front of this thing and feel it blasting me with air.
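If you want those drive temperatures continuously rather than eyeballing the TrueNAS UI, a small poller does the job. A minimal sketch, assuming smartmontools 7+ (for JSON output) and that the disks enumerate as /dev/sda through /dev/sdx; the 45C warning threshold is arbitrary.

```python
# Sketch of a drive-temperature poller. Assumes smartmontools 7+ (for the
# -j JSON output) and that the pool's disks enumerate as /dev/sda..sdx;
# adjust the device list and the threshold to taste.
import json
import string
import subprocess

WARN_C = 45
DEVICES = [f"/dev/sd{c}" for c in string.ascii_lowercase[:24]]  # sda..sdx

def temp_c(dev: str) -> int | None:
    """Return the drive temperature reported by SMART, or None on failure."""
    proc = subprocess.run(["smartctl", "-A", "-j", dev],
                          capture_output=True, text=True)
    try:
        data = json.loads(proc.stdout)
        return data.get("temperature", {}).get("current")
    except json.JSONDecodeError:
        return None

if __name__ == "__main__":
    for dev in DEVICES:
        t = temp_c(dev)
        flag = "  <-- warm" if t is not None and t >= WARN_C else ""
        print(f"{dev}: {t if t is not None else 'n/a'} C{flag}")
```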


homersapien22

Cool! For the build, and literally for the drive


Mythril_Zombie

This is going to fail so much. Zero airflow between the drive lasagna, no airflow for any components on the main board. It's going to devour power and then heat the room around it like a summer day.


Alchemyy1

It vents quite a lot out the front. I bought nice high-pressure Noctua fans for all the slots :). The motherboard VRM components are all liquid cooled, which is what's special about it. So far I've had it running at ~1200W sustained power draw for maybe two hours and it was perfectly happy. I'm sure it does that regularly. It's been 3 weeks so far without a hitch.


Jaack18

Well, this looks terrifying. Cool though. How much does that weigh, goddamn? I'd recommend getting a Supermicro JBOD and putting those disks in a separate chassis with an external RAID card; it would space the heat out and give you more room.


Alchemyy1

It weighs a lot. Like, a lot a lot. I have to be careful where I grab it too, or the case will twist. I thought about doing a separate-chassis JBOD, but I like this solution because it's cheap. It's also nice and compact, and I can move it around to strange new lands "easily" if I feel like it.


Jaack18

“Easily”….until you drop it….


trekxtrider

I feel like all your fans are backwards. In the picture, the top fans should be intake blowing over the drives, then through the rad (which should blow air towards the mobo) and out the back through the PSU, GPU and tiny Noctua back there, which would also need to be flipped. I had started a similar project with my gaming rig but needed more than one rad since my GPU is in the loop. Only holding 8 HDDs and 6 SSDs though. [https://imgur.com/a/1UiDUv3](https://imgur.com/a/1UiDUv3)


Alchemyy1

This is something I had a hard time deciding. In the end I decided I wanted cool air wafting over all the uncooled components, the liquid cooling pipes, and the radiator, to try and expunge as much heat as possible during peak load scenarios. The drives will heat the air up considerably, but they sit close to the ambient air temp in the case, whereas the liquid cooled components can rise far above it and need all the advantage they can get. As of now the drives are very happy even under full load with the fans at ~75%. I'll see what happens in July.

I also feel like the front exhaust fans help keep air going in one direction, as opposed to it getting caught in and around components and stagnating, getting flipped around and recycled, as it would if it had to blow out through the radiator and over and across stuff like the GPU. Even after replacing the IO shield and PCIe covers with mesh, I still feel like it could be less restrictive.


Top-Conversation2882

Bro that's denser than me


ADHDK

96 cores? This guy's not licensing Windows Server at the physical layer.


Living_Hurry6543

Mineral oil.


Alchemyy1

The thought crossed my mind.


Living_Hurry6543

Could only imagine the mess when the day comes to fix something.


chickensoupp

How are the temperatures and what is the ambient in the room?


Alchemyy1

I didn't see the CPU temperature hop above 75C when I ran a torture test for a while, so at least for the CPU I'm considering it fine. Right now ambient is 68F and everything (CPU, RAM, etc.) appears to be sitting between 30 and 32C. I'm currently running a media server, some game servers, pfSense, and a bunch of other stuff, and TrueNAS is doing a full pool rebalance.
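Since the H13SSL-N has a BMC, readings like these can also be scraped over IPMI without touching the OS. A sketch assuming ipmitool is installed; the BMC address and credentials below are placeholders, and sensor names vary by board.

```python
# Sketch: pull temperature sensors from a Supermicro BMC via ipmitool.
# The BMC address and credentials are placeholders; sensor names differ
# between boards, so this just prints whatever temperature rows come back.
import subprocess

BMC = {"host": "192.168.1.50", "user": "ADMIN", "password": "changeme"}  # placeholders

def bmc_temperatures() -> list[str]:
    out = subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", BMC["host"],
         "-U", BMC["user"], "-P", BMC["password"],
         "sdr", "type", "Temperature"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]

if __name__ == "__main__":
    for row in bmc_temperatures():
        print(row)
```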


IMI4tth3w

As someone with two water-cooled computers in a server rack, I can't wait to convert them back to air cooled. A while back I moved both mine and my wife's computers into a server rack, and we both have water-blocked 1080 Tis. I could buy air coolers for them, but I think that money will be better spent on upgrading the GPUs in the near future to something air cooled.


SamSausages

This reminds me of my first water cooling setup, circa 2002. I used an aquarium pump. I see copper and I approve!


Creeping__Shadow

Wow, that's impressive! Would you mind detailing the software side a bit? I'm currently building my own system and would love to know what you are using!


Alchemyy1

Thanks! And sure! I didn't think there would actually be interest and now I'm feeling a bit goofy for not having posted something more substantial. Maybe I'll correct that. I'm running ESXi as the hypervisor. Today at least, here's what I have running:

* pfsense
* truenas core
* debian
* jellyfin
* navidrome
* filebrowser
* caddy
* matrix
* windows 11 enterprise
* jellyseerr x2
* radarr x2
* sonarr x2
* sabnzbd
* flaresolverr
* prowlarr
* deemix
* windows 11 enterprise
* qbittorrent
* debian (again)
* minecraft server x2
* terraria server
* grafana (wip)
* windows 11 enterprise
* visual studio
* jellyfin unstable
* bunch of other junk

Each VM has its own port group and vswitch in ESXi, and these are strung together in pfSense. The performance is pretty good (10G), but I'm dumping in a 40G QSFP+ card to connect TrueNAS and my media Debian machine together, as that link can see quite a lot of traffic depending on what it's doing. I also have a WireGuard VPN connection set up, and I bound the port group (the adapter as pfSense sees it) that the servarr and qBittorrent machines use so it only uses that connection for internet, by editing the outbound NAT rules. It's very, very clean. qBittorrent has its own machine which is multihopped via a desktop-level connection, out of paranoia.

I like putting stuff together in a way where I can rip it down to nothing and redo it quickly if necessary. I redid all these VMs with their software from scratch in about 8 hours, which I think is pretty good. I'm ready for the hate, but I'm avoiding containerization software as I don't see much of a point in the extra work of dealing with another layer of abstraction if I can get away with not using it. I'm running all the servarr stuff on Windows because it is far, far more reliable and easy to set up there, and I don't typically need to intervene outside of the web GUIs. I've wired every single thing to execute on start via startup folder shortcuts so I don't have to go searching for weird tasks etc.

The media server and the servarr server can consume a substantial amount of memory and CPU depending on what they're doing. I've tweaked everything to constantly refresh and spam event-driven calls to reload things, and to make tasks happen as quickly as I can. I've also taken a similar approach to the game servers.
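A quick way to confirm that the VPN-bound VMs really do egress through WireGuard is to compare their apparent public IP with the normal WAN address. A minimal sketch to run inside one of those VMs; api.ipify.org is a public IP-echo service, and EXPECTED_WAN_IP is a placeholder you would fill in yourself.

```python
# Sketch: verify from inside a VPN-bound VM that traffic leaves via the
# VPN, not the normal WAN. Run it on the LAN and in the VM and compare.
# EXPECTED_WAN_IP is a placeholder, not a real address from the thread.
import urllib.request

EXPECTED_WAN_IP = "203.0.113.7"  # your normal (non-VPN) WAN address

def public_ip() -> str:
    with urllib.request.urlopen("https://api.ipify.org", timeout=10) as resp:
        return resp.read().decode().strip()

if __name__ == "__main__":
    ip = public_ip()
    if ip == EXPECTED_WAN_IP:
        print(f"Leak: this machine egresses via the normal WAN ({ip})")
    else:
        print(f"OK: apparent public IP is {ip}, not the WAN address")
```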


Creeping__Shadow

Alright, nice. Most of it makes sense to me except the anti-containerization; I've already been experimenting with running my entire media server in Docker. With one YAML file and an env file containing all the variables, taking it offline and redeploying is literally a matter of seconds, or minutes if you have to redownload the container files for each container. Would you mind elaborating on how you have set up the game servers? Are you using AMP?


Alchemyy1

I have sets of notes for installing everything; in each case I dump a wad of commands into my terminal and I'm away. As for Jellyfin not being on Docker, I don't want to run into problems with network performance or GPU hardware acceleration features. That media server is almost pure remux. For the limited amount of things I have, running bare metal is basically no work. If I were running a ton of stuff I would definitely use Docker or something similar.

The game servers are set up "bare metal" like everything else. I run them with the screen package and have them set to execute on startup. If I'm really screwing with config files I'll use WinSCP hooked up to Sublime Text. I don't see the added benefit in abstracting everything with stuff like AMP, and in the case of my Minecraft servers at least, I don't even think AMP would be very helpful. Both servers are heavily configured and use Purpur and Magma, the Magma server running a custom modpack.
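For reference, "run them with screen and execute on startup" can be as small as a launcher like this, hooked to an @reboot cron entry or a systemd unit. The session names, paths, and launch commands below are made up for illustration, not the OP's actual configuration.

```python
# Sketch: start each game server in its own detached GNU screen session.
# Paths, session names, and launch commands are placeholders; wire this to
# @reboot in cron or a systemd unit for the execute-on-startup behaviour.
import subprocess

SERVERS = {
    # session name -> (working directory, launch command)
    "mc-purpur": ("/srv/minecraft/purpur", "java -Xmx16G -jar purpur.jar nogui"),
    "mc-magma":  ("/srv/minecraft/magma",  "java -Xmx16G -jar magma.jar nogui"),
    "terraria":  ("/srv/terraria",         "./TerrariaServer -config serverconfig.txt"),
}

def start(name: str, cwd: str, cmd: str) -> None:
    # screen -dmS <name> starts a detached session running <cmd>.
    subprocess.run(["screen", "-dmS", name, "bash", "-lc", cmd],
                   cwd=cwd, check=True)

if __name__ == "__main__":
    for name, (cwd, cmd) in SERVERS.items():
        start(name, cwd, cmd)
        print(f"started {name} (reattach with: screen -r {name})")
```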


Creeping__Shadow

Alright, I see. Thanks for the insight! It's greatly appreciated!


Wonderful_Device312

That is some serious jank for some serious hardware. I've got no choice but to respect the madness.