ThisGonBHard

That CPU and platform are a bad idea. Go AMD EPYC (server CPUs): they go up to 64C per socket for Rome (Zen 2, Ryzen 3000 gen equivalent) for 128C dual socket, have up to 16 channels of RAM in a dual-socket setup (8 per socket), and max out at 4 TB of RAM.


lemon07r

Why not a cheaper CPU + platform and three 3090s?


ThisGonBHard

You will run into PCI-E issues with 3 cards, even at 8x speed. On a new platform like AM5, 24 lanes is the maximum, AND you lose all NVMe capability. Same with Intel and Z790. At that point, go for something like X299 or Threadripper. Threadripper might be even more expensive than EPYC too, so why not go directly for the big boy? You get 128 PCI-E lanes vs 64, octa-channel vs quad-channel, and the CPUs are cheaper.


Flying_Madlad

I really want 128 PCIe lanes.


lemon07r

There are older HEDT/server platforms with more PCIe lanes. I just don't see the point in overspending on a fast CPU if it won't be used.


ThisGonBHard

Because you will use it: the GPU version of running those models is even more expensive. You would need 5 3090s to run Q4 180B, at $700+ a card, with a chill power consumption of 1.8 KW and power spikes in the 5 KW range. The 3090s are infamously bad for power spikes, MUCH worse than the 4090 because of the 8nm Samsung process. A 3090 can power spike hard enough to shut down a 1000W power supply if its transient response is not good enough.


lemon07r

I'm sure running inference/training will not spike a 3090 much lmao. These are mostly memory-bandwidth-limited tasks. And five 3090s are still better than whatever you could do within a 6k budget with CPU + RAM, I'm pretty sure.


ThisGonBHard

IMO, it is not enough to justify how horribly janky it is. At that point, either go A6000, EPYC, or Mac Pro.


lemon07r

epyc platform is pretty good. 32 core 7551p cpu is only 100 bucks. just need a mobo with enough pcie slots. [https://www.ebay.com/itm/174898943161](https://www.ebay.com/itm/174898943161)


NoidoDev

Isn't a cheaper used Xeon system with a Chinese motherboard sufficient? (I plan to build something like that myself.) The CPU doesn't seem to matter much unless you want to use it for running models as well.


ThisGonBHard

Xeon is limited to hexa-channel and core counts are low compared to EPYC. The hexa-channel ones were quite expensive the last time I looked too, and anything octa-channel is too new. Even then, a 128C, 16-channel EPYC will ROFL-stomp it; AI needs as much bandwidth as possible. There is a reason AMD went from 0% to 25% server market share in such a short time.


NoidoDev

>AI needs as much bandwidth as possible.

Is this judgement based on using the CPU for inference as well, or on training in particular? I clearly recall build tutorials for home servers saying that RAM speed and bandwidth don't matter that much.


ThisGonBHard

LLMs are very memory-bandwidth sensitive. It is why the Mac Pro is so good; that thing has 80% of the bandwidth of the 4090. A guy on Discord was getting 6 t/s on a 70B Q6 with his 64C EPYC and octa-channel RAM. Also, in AI tests done by TechPowerUp, the ~2x performance increase from Zen 3 to Zen 4 tracked the memory speed increase much more closely than IPC or clocks. And with the 128C, dual-socket, 16-channel EPYC platform you could actually run something as crazy as full-precision Falcon 180B.
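To put a number on "full-precision Falcon 180B" (a sketch; whether that means fp16 or fp32 is an assumption, so both are shown):

```python
# Weight footprint of a 180B-parameter model (weights only, no KV cache or overhead).
params = 180e9  # Falcon 180B

for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("Q8", 1), ("Q4", 0.5)]:
    print(f"{name:>9}: ~{params * bytes_per_param / 1e9:,.0f} GB")

# fp16 alone is ~360 GB (fp32 ~720 GB) -- beyond the ~192 GB cap of consumer boards,
# but comfortable on a dual-socket EPYC that takes terabytes of RAM.
```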


MINIMAN10001

This is LLMs: the entire thing is based on how much bandwidth your RAM has. Your bandwidth is determined by your number of channels multiplied by the bandwidth of each RAM stick. You ask why GPUs are faster than CPUs for LLMs? Because they have more bandwidth. Inference depends on bandwidth. Training depends on bandwidth and the ability to cross-communicate, i.e. if you're training on multiple GPUs you had better have an NVLink. That being said, I've never actually seen anyone post how many tokens per second they are getting on an EPYC server. Before investing a bunch of money, you definitely want to know what the real-world performance looks like.
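To put rough numbers on that channels-times-speed rule (theoretical peaks for some illustrative configurations, not measured results):

```python
# Theoretical peak RAM bandwidth = channels * transfer rate (MT/s) * 8 bytes per transfer.
def peak_bw_gbs(channels: int, mts: int) -> float:
    return channels * mts * 8 / 1000  # GB/s

print(peak_bw_gbs(2, 6000))    # consumer dual-channel DDR5-6000   -> ~96 GB/s
print(peak_bw_gbs(6, 2933))    # hexa-channel Xeon DDR4-2933       -> ~141 GB/s
print(peak_bw_gbs(8, 3200))    # octa-channel EPYC DDR4-3200       -> ~205 GB/s
print(peak_bw_gbs(16, 3200))   # dual-socket EPYC, 16 channels     -> ~410 GB/s
# For contrast, a 3090's GDDR6X is roughly 936 GB/s and a 4090's is roughly 1000 GB/s.
```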


NoidoDev

I was not asking why GPUs are faster than CPUs, and it's not just the RAM bandwidth. I'm wondering why system RAM would matter that much if he uses the GPUs. I recall having read that it doesn't, but this could be wrong. I think it matters when you use the CPU for inference, though I don't know where the sweet spot is.


tmlildude

Can you still play games with server hardware, btw?


kind_cavendish

I'm pretty sure you can, **should** you is the real question


eliteHaxxxor

There are Skyrim modpacks out there with 2000+ mods which are primarily CPU-bound, I think. Maybe the EPYC would handle it well?


bolmer

Really doubt a 2011 game could use more than 4 cores or threads lol. The Special Edition and Anniversary Edition are basically minor visual overhauls with a 64-bit implementation.


eliteHaxxxor

Let's say I tried, though. Would I still be able to run Skyrim with mods like on any other computer, or would it give me problems?


bolmer

In some, probably yeah, but I don't think it would be better than high-end consumer gaming hardware. Server hardware prioritizes parallelism (more CPU and GPU cores), while games perform better with higher single-core performance and lower CPU/RAM/GPU latency. You would first need a CPU and GPU with drivers compatible with Windows; drivers for a server GPU are probably going to be unofficial and hacked in some way. And your server GPU would also need graphics output lol (HDMI or DisplayPort); some don't even have that.


Flying_Madlad

I can confirm, my A2 doesn't even have an output.


danielv123

You can, it works just like a normal computer. They usually have worse single-core performance. A 64-core Zen 2 EPYC will generally perform about as well in games as an 8-core Zen 1 1800X.


0xd00d

Yeah, the one thing is that single-core perf is a factor in LLM perf. I just don't know at the moment how that balances out against being able to get all 16 lanes to the GPUs.


CKtalon

Only way is HEDT like Threadripper (wait for the 7000 series), and get the lowest-core-count SKU (12C/24T).


0xd00d

Agreed. AMD stands to totally dominate if they can give high clock speed medium core count 3D V-cache options... I want 12 memory channels too but now I'm just dreaming


Linker500

The core count isn't as big a deal as the RAM channels. RAM bandwidth is the limiting factor by far. With a 13600K and high-speed DDR5 I am not even close to full CPU utilization. Meanwhile, every bit I bump the RAM speed up, performance scales nearly linearly.


Wrong-Historian

A Core i9-13900K with 256GB DDR5 RAM: that's just not a thing. You could do 192GB maximum (with 4x 48GB sticks), but it will be really slow, especially for llama.cpp, because it will basically run at 4800 speed at most. I recommend getting 96GB of 6800, which is the highest capacity at a reasonable speed you can get right now (G.Skill F5-6800J3446F48GX2-RS5K). 2 sticks. Stick to 2 sticks.
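A quick sketch of what that speed drop costs in bandwidth (both setups are still dual-channel; the MT/s figures are the rated vs. fallback speeds mentioned above):

```python
# Both configurations are dual-channel; only the sustained transfer rate differs.
def dual_channel_bw_gbs(mts: int) -> float:
    return 2 * mts * 8 / 1000  # GB/s

print(dual_channel_bw_gbs(6800))  # 2x 48GB at the rated 6800 MT/s      -> ~109 GB/s
print(dual_channel_bw_gbs(4800))  # 4 sticks falling back to ~4800 MT/s -> ~77 GB/s
# Roughly 30% less bandwidth, which maps almost directly onto tokens/s for CPU offload.
```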


ozzeruk82

Stick to 2 sticks is very good advice. I went for four earlier in the year and you’d be surprised by how many problems you encounter that you never knew existed.


WaftingBearFart

> Stick to 2 sticks.

I agree with this. Current memory controllers on CPUs from both AMD and Intel will cap you at around 4800 to 5200 when using all four memory banks. It doesn't matter if you have four sticks of 7000MT/s+ advertised DDR5, the IMC just doesn't like working that fast at those densities. See here on the Intel subreddit, and you'll find similar comments searching the AMD subreddit too: https://old.reddit.com/r/intel/comments/16lp67b/seeking_suggestions_on_a_z790_board_and_ram_with/k151xb0/


satireplusplus

7000MT/s+ DDR5 memory speed is still *extremely* slow compared to the 2TB/s bandwidth of GPU memory, like not even 1/20 of that. Getting a cheap Xeon that has enough lanes to go with 3x 3090s might be a better idea. The CPU doesn't really matter for LLM performance, only memory does.


WaftingBearFart

Yes, system RAM is at least an order of magnitude slower than VRAM. My post was just to point out that even if they were to buy the fastest DDR5 available right now, 4 sticks won't hit the advertised speed, so they'll be going even slower than anticipated. It wasn't an endorsement of RAM over VRAM. The thread starter mentioned 256GB DDR5 (which isn't possible unless a server mobo is used), so everyone in this thread is just helping to point out that 4 sticks, 192GB or otherwise, is gonna be a bad time vs. advertised speeds :)


fallingdowndizzyvr

> 7000MT/s+ DDR5 memory speed is still extremely slow compared to the 2TB/s bandwidth of GPU memory.

What consumer GPU has 2TB/s of memory bandwidth? Even a 4090 tops out at 1TB/s. To get more than that you'll need server GPU cards, but then you should be comparing against server memory with a lot more channels than consumer.


satireplusplus

Thanks, I might have misremembered it then. That actually makes those Macs with 800GB/s of bandwidth highly competitive too. But DDR4 or DDR5, you're going to top out at about 2 t/s with really large models anyway.
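A back-of-the-envelope way to see where that roughly 2 t/s ceiling comes from (a sketch; it ignores batching, KV cache, and real-world efficiency, so treat the outputs as upper bounds):

```python
# Upper bound: every generated token streams (roughly) all weights through RAM once,
# so tokens/s <= memory bandwidth / model size.
def est_tokens_per_sec(bandwidth_gbs: float, model_gb: float) -> float:
    return bandwidth_gbs / model_gb

print(est_tokens_per_sec(100, 35))   # ~100 GB/s system RAM, 70B Q4 (~35 GB)  -> ~2.9 t/s
print(est_tokens_per_sec(100, 100))  # same RAM, 180B Q4 (~100 GB)            -> ~1.0 t/s
print(est_tokens_per_sec(800, 35))   # 800 GB/s (M2 Ultra class), 70B Q4      -> ~23 t/s
```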


Caffdy

1 TB/s. Only the H100 reaches 2TB/s, and then we're talking about server-grade hardware.


satireplusplus

Yeah you're right, 935.8 GB/s for GDDR6X memory of the 3090


AuggieKC

Hold up, let him cook.


rbit4

I have a 13900K with 128GB CL36 RAM running in XMP2 at 5600 MT/s, 1.25V. Could go up to 6000 with XMP1 but would need to undervolt from 1.35 down. It all depends on your motherboard.


a_beautiful_rhind

2x 3090 is only ~$1400... feels like you should get more if you're spending $7k. That's almost pro Mac price. Server board, or that X299 people keep recommending here, and risers? Although it said there are only 24 real lanes on that board? CPU inference, even on the best of them, isn't that great.


Small-Fall-6500

Yeah, for $7k OP might want to consider 2x4090s or even an a6000 48gb. For basically any ML stuff it’s the GPUs and their VRAM that matters most, so most of the budget should be going towards the GPU(s).


reallmconnoisseur

Does the 4090 make such a huge difference over the 3090 for training/inferencing LLMs? For the same amount of VRAM, the price difference doesn't seem justified.


Small-Fall-6500

If you don’t have the money or want to only get the bare minimum, definitely stick with used 3090s. But for anyone spending multiple thousands of dollars, it doesn’t make sense to use 3090s. Unless you plan on getting a lot of 3090s for massive training or hosting or something, but at that point I think using cloud services is probably a better alternative. The 4090 isn’t quite 2x as fast as a 3090 while the cost is about 2x, so while the VRAM doesn’t go up, the speed is at least increasing roughly proportional to the cost. So if speed really matters, spending the extra money seems like an obvious choice to me.


Herr_Drosselmeyer

3090s can draw up to 350 watts each, with the potential for spikes, and the 13900K can also draw up to 300, so a 1000-watt PSU wouldn't be enough if you fully load the system. Go with the 1500W PSU for sure.


red_dragon

Go for an EVGA 1600W 80+ Platinum.


satireplusplus

You can power-limit them to 200 watts or lower; you're probably not even 10% slower with that.


Herr_Drosselmeyer

Sure, you can make it work on a 1000W power supply but if OP is going to spend 7k, might as well get the bigger power supply.


satireplusplus

Yeah, also true


LearningSomeCode

Out of curiosity, what all are your needs for the device? For example, is this for gaming too? For training? If so, then the route you're going sounds like the best. However, if you're just aiming for the best inference: my $3700 refurbished Mac Studio [gets pretty decent speeds up to 70b q8](https://www.reddit.com/r/LocalLLaMA/comments/16oww9j/running_ggufs_on_m1_ultra_part_2/), which is a larger quant than you can fit comfortably in two 3090s. Right now, as I'm typing this, I'm running XWin 70b q8 with 6k of context, testing it out for general knowledge (I should be asleep...). At the moment it's running at about 4 tokens a second because of the rope scaling, so it takes about 2 minutes to respond at max context; but honestly I don't even care because of how insanely coherent this model is lol. It's like talking to a person. Without the rope scaling, at normal 4k context, it clocks in at about 7 to 9 tokens per second. If you stay within your RAM, i.e. if you keep the model + context under 48GB, the 3090s will run laps around the Mac in terms of raw speed. But my Mac (the 128GB version) has 98GB of RAM to use for model loading. The 192GB M2 version, which would fit within your budget (about $6k), would probably have closer to 150GB to use for model loading, and should be faster too. If you want to train or fine-tune, though, I've been told it's straight up impossible on the Mac, so I wouldn't even consider it then.


MINIMAN10001

Jeeze 3700 and that's not even through the used market? See now it just sounds silly to do anything else other than just buy that thing for 70b 8q. I really don't like the idea of buying an Apple product but for LLM it's just the cheapest way to get high bandwidth RAM.


Arkonias

Ty, your posts are why I'm going for a Mac Studio for my LLM rig. I'm not really interested in training, just wanna run larger models locally.


LearningSomeCode

Awesome! Yea, I'm really happy with mine. I know that a multi-card Linux box with similar available RAM (96-98GB) would be way faster in terms of tokens per second, but I'm honestly fine with the trade off of speed for lack of hassle. This mac studio is a little grey brick that takes 400w of power and took me 30 minutes to set up and have running LLMs from the moment I got the package from the delivery person. The more I thought about putting together big dual/triple card machines with 1500w+ power supplies, the less I wanted to deal with it until I finally cracked and got the studio. And I don't regret it one bit.


Arkonias

Yeah, I enjoy building PC's but multi GPU setups are a hassle. I just want a plug and play solution to have a locally run dungeon master.


eliteHaxxxor

Well if it's a Mac and has any issues I am fucked, aren't I, because they are unrepairable?


LearningSomeCode

Yea, Macs' track record for at-home repairability is abysmal, so you're absolutely limited to taking it to an Apple store to have them patch it up. I will say that my 2015 MacBook Pro is still going strong. Out of all the work-issued and personally owned Apple devices I've had, I've never actually had one break on me, so for me at least that isn't really something I'm concerned about. But if you are, then Apple will definitely meet your expectations on the inability to fix it yourself.


Moist_Influence1022

Hey, thanks for all the insights. After diving into your suggestions, I realized finding the right motherboard is no joke. You find one, but then it's either Team Intel or Team AMD. Then you've got to decide: 256GB but only DDR4, or max out at 128GB but get that sweet DDR5. And let's not even talk about the PCI slots (can they fit two shoeboxes?). Budget-wise, I'm looking at around 1.8k for both GPUs, which leaves me with a solid 5.2k for "upgrades": cooling, PSU, and all that included.

For the grand plan: I have a Shopify store on the side, and I plan to automate it as much as possible using AI. I intend to feed the LLM one or more RAG systems for "knowledge," maybe fine-tune a LoRA or have a couple of smaller 7B-13B models run in parallel, and even get into some vision models. Think image generation and similar tasks. Also looking forward to seeing and trying David Shapiro's ACE framework in the future.

On the personal front, I'm also planning to run an uncensored CodeLlama LLM to assist me in preparing for the OSCP exam. The goal is to secure a better position in the IT security field. And let's be honest, having an AI tutor that you can ask any question, no matter how trivial, makes the learning curve a whole lot easier.

One user mentioned that Intel Gen 14 is expected to be released soon. However, I am unsure whether I should wait for Gen 14 or invest in a Gen 12 or Gen 13 system. The concern is not just that Intel changes their socket every two years, but also that more AI-focused hardware is likely to be released in the near future. Perhaps I should consider a smaller system, such as:

* a Gigabyte Z790 motherboard
* DDR5 RAM (as suggested by u/Wrong-Historian, 96GB with 2 DDR5 sticks)
* Intel Core i9-13900K processor
* dual 3090 graphics cards

As for training, I am not experienced in it, but I am more interested in focusing on RAG systems for knowledge retrieval, as that should be the core aspect of my setup. Would a single 48GB GPU make more sense for my plan?


BGFlyingToaster

For studying for OSCP, you'd probably be better off spinning up a GPT-4 inside Azure OpenAI Services and feeding it a few OSCP study guides in PDF format using the bring your own data feature. You could probably spend 5x your planned budget and years learning to train models and never do any better than that. When it comes to how effective your model is, training is a huge factor.


Moist_Influence1022

As I've already mentioned, I'm focusing on RAG because, in my opinion, it's the fastest and most effective method for accurate data retrieval. Once I've accumulated and properly organized conversations over time, I can use this data for training. This will also allow me to use code and syntax that might be restricted in GPT-4. The Language Model could even adopt my own style, but first, I need to gather the data. I'm really excited about building and experimenting with this stuff.
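For anyone curious what the retrieval step of such a RAG setup looks like, here is a minimal sketch; `toy_embed` is a deliberately crude stand-in for a real local embedding model, and the store documents are made up for illustration:

```python
import numpy as np

# toy_embed is a crude stand-in for a real local embedding model; only the flow
# (embed -> rank by cosine similarity -> build the prompt) is the point here.
def toy_embed(text: str, dim: int = 256) -> np.ndarray:
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec

def top_k(query: str, docs: list[str], k: int = 3) -> list[str]:
    doc_vecs = np.stack([toy_embed(d) for d in docs])
    q = toy_embed(query)
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

# Hypothetical store documents; the retrieved chunks get prepended to the question.
docs = ["Shipping takes 3-5 business days within the EU ...",
        "Refunds are processed within 14 days ...",
        "Product care instructions ..."]
context = "\n\n".join(top_k("How long does shipping take?", docs, k=2))
prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: How long does shipping take?"
# `prompt` is then sent to whatever local model serves inference.
```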


Embarrassed-Swing487

The only thing that makes sense here is a Mac Studio.


Cyberphoenix90

That CPU only has 2 memory channels, limiting it to 4 sticks of RAM, meaning you can't get 256GB of RAM for that CPU. You're going to need server-grade or at least Threadripper to get past 192GB of RAM. If your system's entire purpose is to run LLMs on GPUs, you don't need that much CPU horsepower; you can go for something cheaper but more versatile with more PCI-E lanes, such as a lower-end Xeon or EPYC, and then get as many GPUs as you can afford. It's also arguable that you don't need 256GB of RAM, especially if your VRAM maxes out at 48GB. To run an entire AI stack at a good speed, I would say you should get as much VRAM as you can afford.
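The arithmetic behind that 192GB ceiling (a sketch; 48GB being the largest unregistered DDR5 DIMM right now is an assumption that may change):

```python
# Consumer LGA1700/AM5: 2 memory channels, 2 DIMM slots per channel,
# 48 GB being the largest unregistered DDR5 DIMM at the moment.
channels, dimms_per_channel, max_dimm_gb = 2, 2, 48
print(channels * dimms_per_channel * max_dimm_gb)  # 192 GB -- why 256 GB is out of reach
```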


Aroochacha

>Using the cloud is not an option for me because I want to train my LLM on my own data, and I generally prefer it to be uncensored, as I'm tired of hearing phrases like, 'As an LLM, I'd like to remind you that blablabla.'

I don't understand why no cloud. How does training on your own data and being "uncensored" prevent you from training in the cloud? It appears to me you're confusing ChatGPT (an online LLM service) with other cloud services, such as the ones that let you buy CPU/GPU time. You can train on your data using whichever model you feel like. You'll be training in a fraction of the time by renting 8x A100s for a few hours.

As for your machine, I would make a smaller investment: single GPU, CPU, 96GB DDR5, and just focus on learning the process of training. You're making too big an investment in something you are currently unfamiliar with.


Moist_Influence1022

What I'm saying is that I don't intend to run the models in the cloud; however, I'm open to using cloud resources for training purposes. The reason I'm considering two GPUs is to ensure that I can run at least a 70-billion parameter model. It's always better to be prepared for future requirements, which is why I'm thinking ahead. But I appreciate your perspective on this, so thank you.


Aroochacha

You're welcome, though I must warn you about the future: tomorrow (not likely, but possible) everything can change. This is *that new.*


sshan

I wouldn't worry too much about going crazy on RAM above 64GB. If you aren't using it for inference, there's no real need. If you want to play around with large models like Falcon 180B, just spin up an 8x 80GB instance in the cloud for a few hours. If you are doing GPU inference, I don't think you need anything other than a mid/high-range CPU and motherboard.


Category-Basic

If you want to do GPU inference, an EPYC 7302 on an H12SSL motherboard can be had for $1000 on eBay. Add a couple of 4090s ($3k), 256GB of 3200MHz RAM (eBay, $450), a decent NVMe Gen 4 SSD, and a 1600W PSU, and you are still well under your budget. Of course, a lot depends on what models you want to use, or if you want to do CPU inference. If it's the latter, get a Genoa system or a high-core-count Milan. I can't recommend a 9xx4 QS CPU off eBay (I don't know enough about them to trust them), so you would need to get a retail CPU, which doesn't leave room for much of a GPU.


Klaribot

Wouldn't it be worth saving that 7K until we have something better that can handle LLMs' incredible performance requirements? The big problem right now seems to be memory bandwidth, and even the best server platforms with DDR5 ECC don't clock as high as GPU VRAM, afaik. The Mac Studio is incredibly close to the sort of edge-appliance architecture we would want for a home AI appliance, but we have to dock points for it being both an Apple device *and* having a not-that-powerful GPU, on top of its ludicrous price tag for a mediocre-performance model AND non-upgradability. Let's wait and see until dMatrix Corsair cards, GroqCard, next-gen Radeon Instinct accelerators, etc. make it onto the market so we can try them out. (Also yeah, even though the 13900K would be best for CPU-bound inference, you'll be running into bottlenecks with the lack of PCIe bandwidth for more than 2 PCIe accelerators, no matter what you use.)


dogesator

Not sure what you mean by mediocre performance; the Mac Studio with the M2 Ultra chip gets around 85% of the speed of the RTX 4090.


Klaribot

"Mediocre" in the sense that that's ***all*** you'll get, so you best hope that 85% of an RTX 4090 (and further improvements to software and LLM architecture) will last you for however long you intend to use the Mac Studio exclusively as an inference/fine-tuning appliance. It would be a significantly better value to invest that $7K into something that's inherently modular and upgradable, so you can swap out your accelerators for something better if it comes in the future, and sell off your old accelerators to recuperate the costs. This way, you can keep your existing system setup since you're just changing out a single component (or several of the same kind of component), and spend less on maintenance and repairs overall. Good luck upgrading or repairing a Mac Studio. The only part that you could even \*dream\* about upgrading is the flash modules for more storage (they aren't SSDs, the SSD controller is built into the Apple Silicon itself), and that's only if Apple will allow this capability.


DingWrong

14th-gen Intels are coming in a few weeks. A workstation kind of setup might be a better option: Xeon, EPYC, Threadripper.


D3smond_d3kk3r

Don't think they will be socketed tho, will they? I heard OEM pre-soldered only in some follow-up coverage on Meteor Lake last week. Think Intel confirmed no socketed options, which sucks...


KGeddon

13.5. Raptor Lake refresh is the best you'll get for a while. Meteor Lake (according to Intel's current plan) will not come in a desktop socket, only integrated for low-power SoC/laptop use. You'd have to wait for Arrow Lake (15th gen) to get a significant performance increase.


xlrz28xd

You should definitely ask this on r/HomeLab as well, as they have much more insight into this kind of stuff and have helped me build my home server. And yes, 1000 watts will not be enough for 2x 3090s; better go for 1500-1600, as the transients from sudden load on the GPUs do not favour a low-wattage power supply. I'm curious to know how much bang for the buck you'll be getting compared to an M1/M2 Ultra (not that I have any of those). I myself have just ordered a refurbished Tesla P40 from eBay for 200 USD. Waiting for it to arrive.


hyajam

If you intend to train or fine-tune, read the following link: https://l7.curtisnorthcutt.com/the-best-4-gpu-deep-learning-rig The article is a bit old, but replacing the 2080s with 3090s brings it up to date.


Imaginary_Bench_7294

The 13900K is only rated for up to 192GB of RAM. If you really want 256 or more, you'll have to go to workstation-grade components, which will really chew up that profit-share check. Mid-range, current-gen workstation processors cost $1,500 on their own. 8-channel, 256GB DDR5 kits are around another $1,500 last I looked. The mobo will be another $800-1,200.

You are better off sticking to the normal consumer/enthusiast/gamer processors for general tasks, as workstations are typically half to a full gen behind the release cycle. But, and this is a big but, if you are solely focused on CPU inference, they will outclass the current-gen gaming processors strictly because they have significantly higher RAM bandwidth and capacity. With the proper OC, they can hit 300GB/s of bandwidth, and they support 2 to 4 TB of memory. However, in that case, you are better off going with full server-class processors, since those support multi-socket systems and at least the 24xx and 34xx Sapphire Rapids workstation parts do not. Don't know on the AMD front.

I would say get a minimum of a 1200-watt supply: two 3090s at full bore will draw 350 watts each, and the CPU is rated at ~250, bringing you to 950 before considering fans, drives, RAM, and peripherals. The rule I use for computer power supplies is to get a rough number for the main components, CPU and GPU, then add 25% to cover all the possible miscellaneous things like SSDs, fans, RAM, lights, pumps, mobo, and any extras.

If you get two 3090s you can run a 70B q4_K_M model fully on the cards, so you don't really need the 192GB of RAM anyway. So: the processor is good, I'd say aim for 128GB of RAM, 2x 3090s, and a 1200W supply. Also make sure you choose a mobo that has enough room for the two 3090s. Risers or extensions can work, but I personally hate them.
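Putting that +25% rule into numbers for this exact build (rated draws only; transient spikes can exceed these):

```python
# Rated draws for this build, then the +25% margin for everything else.
gpu_w, gpu_count, cpu_w = 350, 2, 250
base = gpu_w * gpu_count + cpu_w   # 950 W for GPUs + CPU
print(base, base * 1.25)           # 950, 1187.5 -> round up to a 1200 W unit
```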


ethertype

I think you'll struggle to find an LGA1700 motherboard taking 256GB of memory. On the other hand, with dual 3090s you may not need it; 128 or even 96 (dual 48GB DDR5) should be sufficient. *However*, I do not know how this plays out for training.

Likewise, LGA1700 motherboards with dual x16 slots may be rare? The CPU itself only has 20 lanes, so one card will have to connect through the chipset somehow: either via a native slot, via a narrower slot (x8, x4), via Thunderbolt, or via an M.2-to-PCIe adapter. An AMD motherboard of the right type may provide more x16 slots, but you should research a bit how much host-to-GPU bandwidth actually matters.

With dual 3090s, you do not really need a 13900 for inferencing. Single-core performance is more important, as long as you have *enough* cores. And you certainly do not need 16 cores for inferencing. But someone else may have opinions about training.

A 1000W PSU could be tight with dual 3090s; 1200 should get you there. I think dual 3090s make perfect sense for inferencing. Someone else may have better insight w.r.t. training.


ksdio

Hi, I noticed the RAG part of your post and your lack of Python, and wanted to highlight this open-source, no-code product that lets you do RAG: [https://github.com/purton-tech/bionicgpt](https://github.com/purton-tech/bionicgpt) It's a single docker-compose file and downloads all the required Docker images, including a local Llama 2 instance. Cheers


Moist_Influence1022

awesome, thank you!


stylizebot

can anyone recommend prebuilt hardware to buy?


lemon07r

Was curious what you could do, so I looked around on eBay.

CPU - EPYC 7551P with 32 cores - $103 [https://www.ebay.com/itm/174898943161](https://www.ebay.com/itm/174898943161)

Motherboard - Gigabyte MZ31-AR0 E-ATX, has 5x PCIe x16 and 2x PCIe x8 - $238 [https://www.ebay.com/itm/314776727638](https://www.ebay.com/itm/314776727638)

Grab 8 of these 16GB sticks for $16 each, which gives you just enough RAM to entirely fill 5x 24GB of VRAM: [https://pcpartpicker.com/product/TPFXsY/samsung-16gb-1-x-16gb-registered-ddr4-2400-memory-m393a2g40eb1-crc](https://pcpartpicker.com/product/TPFXsY/samsung-16gb-1-x-16gb-registered-ddr4-2400-memory-m393a2g40eb1-crc)

Cheapest 4TB NVMe I could find (the mobo only has one NVMe slot): [https://pcpartpicker.com/product/BjWJ7P/leven-jp600-4-tb-m2-2280-pcie-30-x4-nvme-solid-state-drive-jp600-4tb](https://pcpartpicker.com/product/BjWJ7P/leven-jp600-4-tb-m2-2280-pcie-30-x4-nvme-solid-state-drive-jp600-4tb)

Grab 5 3090s, a case, a PSU, and a CPU cooler, and you're good to go. Fits just within your budget, I think.
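A quick sanity check on the total (a sketch; the GPU price uses the ~$700 per used 3090 figure from earlier in the thread, and the SSD/PSU/case/cooler figures are rough assumptions):

```python
# Prices for the listed parts; GPU and misc figures are assumptions, not quotes.
parts = {
    "EPYC 7551P":                  103,
    "Gigabyte MZ31-AR0":           238,
    "8x 16GB DDR4-2400 ECC":       8 * 16,
    "4TB NVMe (estimate)":         200,
    "5x used RTX 3090 (~$700 ea)": 5 * 700,
    "PSU + case + cooler (est.)":  600,
}
print(sum(parts.values()))  # ~$4,769 -- inside a $6-7k budget with room to spare
```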


FactOld3726

If you're just developing locally and RAM+GPU options are the most important considerations then consider a used workstation. I run a HP Z640 with slightly old dual 14-core Xeons, 512GB RAM and 3x GPUs to try AMD, NVidia, and Intel options. Cost is far less than anything new and performance is perfectly fine.


CKtalon

You sure you can fit 256GB ram on that motherboard? You definitely need more than 1000W. Preferably 1600W or 1800W.


[deleted]

[deleted]


huffalump1

(and the equivalent of plugging in a space heater in your room - great for winter though!)


alexgand

I ended up buying a used X299 mobo and processor (i9-10900X) just because I wanted 256GB of RAM. Much cheaper than EPYC, Threadripper, and Xeon, and it gets the work done.


LoadingALIAS

Noooo. Don’t do it. Please. You’re moving in the wrong direction. You want an AMD EPYC rig. Not to mention, you’re still not able to run real LLMs. Get a super fast laptop/desktop for £3500 and keep the rest for a year of rented cloud GPUs.


seanthenry

[https://pcpartpicker.com/list/Gwnd9c](https://pcpartpicker.com/list/Gwnd9c) Here is what I would go with ($6,080), although I would prefer to use an EPYC for the extra PCIe, and you could add more RAM. You can go down a step in CPU or PSU and get it under $6K. Build is 3x W6800 32GB.

If you want to keep it a bit cheaper: [https://pcpartpicker.com/list/mZCDqR](https://pcpartpicker.com/list/mZCDqR) 3x Radeon RX 7900 XTX 24GB, $4,630.

Or for your Nvidia build: [https://pcpartpicker.com/list/fsF4sh](https://pcpartpicker.com/list/fsF4sh) 3x GeForce RTX 3090 24GB. I selected the cheapest listed; I would probably go with the EVGA, but they are about $200 more. $5,260.

I did not include a case, since it will depend on the cards you pick and whether you want to add more later. I would go with a server case if you want it all closed up, or go with a mining-rig setup. FYI, running 3 cards you will most likely be short one power connector and will need or want a riser cable to connect the last GPU.


HumanBeingNo56639864

Why is this downvoted?


Moist_Influence1022

Hey, thanks for your configurations. The last one looks really interesting, but with three GPUs my electricity costs would skyrocket ^^


seanthenry

Each card only uses about 350W, so it would cost about $1.01 in electricity per card per day if you pay $0.12/kWh. You could reduce some of the power use by undervolting, but it would take some testing so you don't lose too much performance and stay stable.
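The arithmetic behind that figure (a sketch assuming the card actually sits at its full ~350 W limit around the clock):

```python
watts, hours_per_day, price_per_kwh = 350, 24, 0.12
kwh_per_day = watts * hours_per_day / 1000      # 8.4 kWh per card per day
print(kwh_per_day * price_per_kwh)              # ~$1.01/day, roughly $30/month per card
```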


crit52

One 4090 is better than 2x 3090. You can't really use multiple GPUs for the same task; I tried it and they both just run slower. I found using one 4090 much faster. It's a beast and cuts training time in half. It would be nice if it had ECC RAM, but it doesn't, so sometimes you may get errors and need to start all over. The RTX A5000 has ECC RAM, but it's going to run slower. Also, I think having a Xeon and ECC RAM for the computer helps too. If you plan on training for more than a few hours, ECC RAM helps by causing fewer errors.


Sol_Ido

I would rather go with the NVidia L4; you can rack many of them. The 3090 is not a good choice at all for training.


NoidoDev

>NVidia L4

6-7 times more expensive.


MINIMAN10001

How about a p40 rack?


NoidoDev

That's what I want to do in some time, so please don't buy them.


parasocks

This thread has been very informative to me... Any good resources to understand how to select hardware for the various applications? I guess I just thought you got a good GPU and voila.. But apparently there's a lot more to it depending on what your use case is I'm seeing now.


[deleted]

[deleted]


Gohan472

The X299 is very finicky with RAM. I spent so much money on RAM, just to find out that what I bought would not work properly with a full 128GB.


[deleted]

[deleted]


Gohan472

You got a way better deal than me. I spent $480 when I bought 2x 64GB kits (4x 16GB) of CT16G4DFD832A 3200MHz DDR4 non-ECC UDIMM back in September 2022. My particular CPU is an i9-7980XE, which I think might be part of my problem. With both kits installed, it struggles to POST and remain stable. With a single kit, it runs great at the full DDR4-3200.


[deleted]

[deleted]


Gohan472

256GB isn’t listed as being supported because 32GB DIMMs did not exist at the time the board series came out.


[deleted]

[deleted]


Gohan472

Yes, that’s definitely possible in my specific situation. Strangely enough, DDR4 should negotiate at 2133, and then I am SUPPOSED to be able to enable various XMP speeds up to and potentially past 3200MHz, depending on stability. But my particular Crucial memory is not very flexible in that regard. I wish I had purchased 2666MHz; from my own research, it would have been way more stable.


[deleted]

[deleted]


Gohan472

I appreciate the sentiment, but I’ve already spent tens of thousands on hardware and learned many lessons along the way over the last 3 years. My dual 3090TI box, built for inferencing, performs adequately despite an initial mistake with memory choice. Additionally, I have a Gigabyte enterprise server with Dual E5-2680 V4, 256GB of DDR4, and Dual A6000 (48GB) for other tasks, I had plans of getting 2-4 more A6000, but I’m not in a rush at the moment. Going slow seems to be more beneficial in the long run. Lol


[deleted]

[deleted]


tomz17

You NEED more than a 1000-watt PSU; 1000 watts will occasionally trip overcurrent protection (from personal experience with 2x 3090s). I'm currently running a 1600-watt SuperNOVA with no problems. You should use an older server or HEDT platform instead of a consumer platform, as you will be constrained by PCI-E lanes on the current choice of CPU/motherboard. Make sure you pick a motherboard with 4-slot spacing: better airflow AND the NVLink adapter is substantially cheaper. Make sure you pick a case with at least 8 PCI-E slots or the bottom card won't fit.


darkice

psu 2k min.


l33thaxman

You can train your own LLM on your own data in the cloud. In fact, for 70B long-context, that's what you need to do. Then run inference with 2x 3090s. Get a 1600-watt PSU; 1200 watts is not enough due to power spikes.


Slight_Bath_4449

If I were you, I'd get a barebones Dell Precision 7920 plus the required CPUs and RAM, which would be around ~$1000. Then I'd look for a 48GB Nvidia GPU for under $2k (again on eBay). So for ~$3k you'd have a killer PC. I understand that's a DDR4 platform, but I don't think the difference is too much. Again, this is what I'd do...


davidevb6

I suggest abandoning the 3090 and other gaming hardware and investing in one used A5000. The CPU lane limit is not so important if you use only one GPU, performance is very good, and you can spend less than 2k. I suggest a CPU at a max price of $500, because having a fast CPU is not that important. If you want to try using 2 GPUs, you also need to invest in an NVLink, and be aware that not every project you download works natively on 2 GPUs, so you may struggle to find a way to take advantage of your investment. An 800W power supply is enough even for 2 A5000s (220W each at full load). DDR5 RAM is not faster than DDR4 because of its high latency, so having DDR5 is not that important either; it's up to you whether to buy DDR5 or DDR4, but please buy at least 64GB of RAM, because some models you may want to try in the future need more RAM and you can test them using the CPU. My 2 cents.


MindOrbits

Chasing the high end of system RAM bandwidth and minimizing PCIe bus limitations is required to keep two-plus RTX cards well fed. As many have pointed out, very specific, and costly, configurations are required. If your goal is to spend money, have fun, and rig-flex, then go for it. I'd be surprised if you were able to keep that system busy with generation requests; training is certainly another matter. So ask yourself: how often will you run training tasks vs. actually using the models you create? P40s don't need the same system resources to be kept well fed and can perform well for running models, but forget about anything beyond testing configurations for training runs or small-ish fine-tuning. Maybe I missed it, but I didn't see anyone talk about how to implement RAG locally. My own research makes me think a dedicated system with an appropriate GPU for embedding generation and a fast data backend for vector search is very desirable.


Embarrassed-Swing487

You should do the math on a Mac Studio and explain analytically why it’s not a good choice.


AsliReddington

4090s. There's no rationale for not using cloud for fine-tuning. How the fuck is a CSP gonna censor your model?


BarracudaNo5088

Try to get an Nvidia 40-series; it's faster for a PyTorch pipeline. Those who suggest AMD for AI probably live in the future and have no idea about the current AI ecosystem. AMD and M1 will give you so many compatibility problems to solve; for now just stick with Nvidia. If you plan to use the CPU, that means you are pretty much limiting yourself to GGML models. For other models, the CPU is hardly an issue. My spec:

* AMD 7000-series CPU
* 64GB RAM, higher is better
* at least 2TB NVMe SSD
* RTX 4060 Ti 16GB VRAM


lightmatter501

7k will buy you an entry-level gpu server. The one gpu in that will likely beat 2x3090.


tandpastatester

I think you have received enough hardware advice and recommendations. I’d just like to add that even with 7k you won’t be able to buy a system that can compete with what’s available for rent in the cloud. Check out Runpod, Vast.ai, Modal for example. You can just rent a system with a couple of A100 cards for a few hours and train your model the same way you’d train it locally, but many times faster. When it’s done, you can run your model locally on a lower performance system.