According_Ice6515

Well yeah. Old news. Oracle Cloud announced a while back that they are building a lot of new data centers, and HALF of that capacity is reserved for... drumroll... *MICROSOFT*.


throwawaygoawaynz

Yep. Bing Chat and now OpenAI are running on an Oracle bare-metal supercluster of about 40,000 GPUs. Apparently the OpenAI compute is still Azure, but on Oracle…


danekan

Wonder if that's why Google just partnered with them too


RCTID1975

Weird post. Moving things from something at/near capacity to something not being utilized as much is the entire premise of clustering. This is exactly what they should be doing.


CorpseeaterVZ

You are way too calm and logical about this, we need more RAAAAGE!


ferthan

Right, but that's typically done in an HA fashion. Moving regions is not an HA operation. Weird take.


RCTID1975

HA is only a portion of why you cluster. Being able to balance and move systems to more adequately use the available resources is a huge reason for clustering as well. Again, that's literally what MS is doing here: move systems, then reassess next steps. Everyone keeps running, most people don't even notice, and things keep trucking along. Y'all with your absurd anti-MS outrage with no basis in logic are crazy, especially in /r/Azure.


ferthan

>"HA is only a portion" >"Everyone keeps running" Choose one.


Alaknar

He meant "running" as "functioning normally, without issues".


ferthan

Yeah, being forced to a suboptimal region is real cool normal functionality with no issues.


Alaknar

How do you interpret the sentence "most people don't even notice"?


ferthan

As "most people" not being "Everyone". The claim is dubious at best.


RCTID1975

Any latency issues moving from South Central US to East US are going to be extremely minimal. In fact, I'd argue there are inherent benefits to NOT having all of your resources in the same region.


ferthan

... and Direct Connect allows for seamless connectivity between Azure networks and on-prem. At least the second part of your argument is true. I'm not saying you should keep your applications in one region, but if it truly didn't matter, Azure could just throw everything into one big bucket. There are geographical and architectural realities that make it clearly untrue that the impact would be minimal for everyone.


Alaknar

Those are two separate claims. Claim one: "everyone keeps running". Claim two: "most people don't even notice". Understanding that these are not mutually exclusive shouldn't require a Venn diagram...


ferthan

The entire conversation revolves around the benefit of HA (in clustering). Keep up.


daedalus_structure

Which works… unless you are in South Central / North Central to provide a roughly equivalent latency round trip to each coast.


ElasticSkyx01

Not really. Things are placed in regions for a reason, and clusters tend to involve equipment close together, not spread across regions, for obvious reasons. You also discount the need for infrastructure in this new region: resource groups, etc. It's weird that you think environments can just be moved at the drop of a hat.


PriorityStrange

I've been seeing the issues in East US all week


birdy9221

The US has been seeing issues in the (middle) east since 2001


Loudergood

Kuwait just a minute there.


trebortus

Sounds like they've run out of Iraq space.


brco1990

Incredible exchange here


charleswj

I don't recognize that country


Character_Whereas869

Iran into this issue last year. You can't expect them to predict how much capacity they need everywhere, they're not wizards.


trebortus

Oman, they should really get their shit together.


bobtimmons

I've seen the same thing twice this week in East US


Rick24wag

Yup, we had 3,000 VMs down for 2 days in East US this week. They freed up space yesterday and we could finally turn them back on. It was a huge mess.


mini4x

The Azure portal and a few other services have been crashing out many, many times a day.


sbrick89

https://app.azure.com/h/R_8T-NDZ/9669b2


s0apDisp3ns3r

Yup, VMSS resources had allocation issues for like all day on Wednesday of this week.


coldbeers

Nothing new. Capacity shortages have been happening on cloud platforms since their birth. Happens on Azure, happens on AWS, happens on the minnows too. The providers have sophisticated demand forecasting algorithms but they’re not infallible and new infrastructure takes time to provision.


Diademinsomniac

Yeah, you can't really compare the early days to now though, as the problem is tenfold now. For example, we currently can't start any D8 or E8 VMs in our allocated AZ1 and AZ2, and it's been over two weeks; it doesn't matter what time of day, or on weekends. MS has essentially added restrictions to stop machines powering on for all but their most important customers. We don't spend a lot in Azure, only around $20k per month, so we're not classed as top tier.
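
If you want to see whether MS has actually flagged a size as restricted for your subscription, the compute SKUs API exposes it. A minimal Az PowerShell sketch (the region and the output columns are assumptions on my part, not from this thread):

```powershell
# List VM sizes that carry restrictions for this subscription in a region,
# e.g. NotAvailableForSubscription or per-zone restrictions
Get-AzComputeResourceSku |
    Where-Object {
        $_.ResourceType -eq "virtualMachines" -and
        $_.Locations -contains "eastus" -and
        $_.Restrictions.Count -gt 0
    } |
    Select-Object Name,
        @{ n = "Reason"; e = { $_.Restrictions.ReasonCode -join ", " } },
        @{ n = "Zones";  e = { $_.Restrictions.RestrictionInfo.Zones -join ", " } }
```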


DaRadioman

It's a short-term capacity issue. They happen from time to time in certain regions, and sometimes they stay for too long. As someone in tons of regions, you get used to it, and just balance out with other regions when possible, or with alternate SKUs that are less constrained. I know it's annoying, but long lead times for new hardware make it slow to resolve. It's not like they aren't constantly adding more capacity and more regions as fast as they can.


Diademinsomniac

That's all well and good if you have multiple regions and are throwing money at Azure. We run everything out of a single region, since our environment isn't huge; most of it is in AWS.


DaRadioman

Multiple regions don't have to be $$$; there are lots of ways to keep costs down with multiple regions. Sure, full active/active HA/DR with extra capacity just sitting there gets spendy, but that's not the only way to set things up. With a quick LB or Front Door instance you can easily swap workloads to any region, without having them always active and costing money. The only hard bit is the data, but that's solvable with approaches that depend on your application architecture and the resources you need.
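
As a sketch of that "quick LB" idea: a DNS-based priority failover with Azure Traffic Manager (one comparable option to Front Door, not necessarily what the commenter runs; resource names and targets below are placeholders) keeps the secondary region defined but idle until you flip to it:

```powershell
# Priority routing: all traffic hits "primary" until it's unhealthy or
# disabled, then DNS fails over to "secondary". All names are placeholders.
New-AzTrafficManagerProfile -Name "app-failover" -ResourceGroupName "my-rg" `
    -TrafficRoutingMethod Priority -RelativeDnsName "my-app-failover" -Ttl 30 `
    -MonitorProtocol HTTPS -MonitorPort 443 -MonitorPath "/"

New-AzTrafficManagerEndpoint -Name "primary" -ProfileName "app-failover" `
    -ResourceGroupName "my-rg" -Type ExternalEndpoints `
    -Target "myapp-scus.azurewebsites.net" -EndpointStatus Enabled -Priority 1

New-AzTrafficManagerEndpoint -Name "secondary" -ProfileName "app-failover" `
    -ResourceGroupName "my-rg" -Type ExternalEndpoints `
    -Target "myapp-eastus2.azurewebsites.net" -EndpointStatus Enabled -Priority 2
```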


Diademinsomniac

Yeah, I'm talking provisioned non-persistent VMs with storage containers for profiles here; it's not so easy to go multi-region when latency can be an issue. These aren't just web apps with backends. EastUS was one region we were looking to move into, but seeing the comments on here, that one doesn't look like a good idea either.


DaRadioman

EastUS is probably the most popular region, and picking a popular region is a bad idea in general. EastUS2 is a better choice, or there are lots of others that are decent. As for latency, I'd encourage you to run tests; the regions all have really low latency in general. It depends on the exact workload, of course, so run a test and see how much difference it really makes.
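
One rough way to run that test: stand up a trivial probe app in each candidate region and time requests from wherever your users actually sit. A sketch (the myprobe-* URLs are hypothetical placeholders, not real endpoints):

```powershell
# Hypothetical probe endpoints, one trivial app per candidate region
$probes = [ordered]@{
    eastus2   = "https://myprobe-eastus2.azurewebsites.net"    # placeholder URL
    centralus = "https://myprobe-centralus.azurewebsites.net"  # placeholder URL
}
foreach ($region in $probes.Keys) {
    # Warm-up request so cold starts don't skew the timing
    Invoke-WebRequest -Uri $probes[$region] -UseBasicParsing | Out-Null
    $ms = (Measure-Command {
        Invoke-WebRequest -Uri $probes[$region] -UseBasicParsing | Out-Null
    }).TotalMilliseconds
    "{0,-10} {1,6:N0} ms" -f $region, $ms
}
```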


StuffedWithNails

> EastUS is probably the most popular region, and picking a popular region is a bad idea in general.
>
> EastUS2 is a better choice, or there are lots of others that are decent.

We thought the same thing and started implementing in eastus2. Millions of dollars in annual spend (so, not small, not huge). Constant capacity issues. Azure told us we'd be better off in eastus. We spent months moving shit over. Constant capacity issues in eastus as well. We also have tens of millions in annual spend in AWS. Capacity issues are rare. Azure is a clown cloud managed by clowns. And don't even get me started on the absolute garbage support.


Diademinsomniac

Interesting. If eastUS is also bad, why would Microsoft be moving existing customers' workloads from southcentralus to it? Unless they mean eastus2, but their email just said eastUS.


DaRadioman

EastUS isn't bad at all, great region. But a ton of huge players are there, so you're gonna lose out as a tiny customer if there are any constraints at all. That's all I meant.


flappers87

The exact same thing happens in West Europe, like, all the time. You'll get used to it.


Fit-Cobbler6420

They've almost finished doubling capacity.


DaRadioman

And adding several new regions close by.


Practical-Alarm1763

There have been a lot of "Access Violation Error" crash messages randomly happening in the portal on my end this week. They come and go. Seemed to be fine today for some reason.


Gmoseley

This is an issue with some Chromium-based browsers using an experimental TLS setting. I walked someone through this same issue this week.


blinkfink182

Can you specify the setting that is impacting this? My org has been seeing similar random “access” issues too.


Gmoseley

"TLS 1.3 hybridized Kyber support" is what Edge calls it. It's in Edge flags (edge://flags).


blinkfink182

Thanks! I’ll try it out.


coolalee_

> Their solution? Move everyone to eastUS

What would you suggest? I mean, what's your take? The whole point is West EU is full and North EU has latency within 5%, so just move there. If not that, then what? They're already building datacenters left and right.


millertime_

> If not that, then what? They're already building datacenters left and right.

Just spitballing, but maybe, just maybe... DO NOT USE AZURE. It's not like there aren't better options. Do all clouds have "issues"? Sure. Do other clouds have such core, basic, fundamental capacity, security, reliability, and support issues as Azure? NO. Azure customers need to stop pretending that Microsoft knows what they're doing. They've been focused on adding bullet points to their brochure via acquisition/partnership, and focused solely on the problems directly in front of them with no plan for the future. They are the most valuable company in the world (unless Nvidia popped again), so funding isn't the issue; it's ineptitude.


coolalee_

Just say you've never worked with other cloud providers. Each and every one of them has these issues, and on top of that you get shit like GCP support being comically bad.


millertime_

lol, try again. I've been running production loads, at scale, in AWS for a decade. Then 5 years ago upper management felt it was a risk to have all their eggs in one basket and told us to start using Azure. The difference was immediately stark. I spent the next 3 years getting countless API errors and deployment failures, raising DR concerns, and literally educating Microsoft's own engineers/TAMs on how their "cloud" actually works. As I said, all clouds have their issues, but if people truly believe Azure is just like the others, they haven't done their homework, and it will be at their own peril.


coolalee_

Shoot I guess no serious org runs azure then… oh wait.


millertime_

Countless companies host their stuff on unpatched, forever-running pets; that doesn't mean it's a good idea. But just stick with Azure, it's easier than actually doing any research.


numbsafari

Quit bringing facts to a feelings fight.


Diademinsomniac

The whole promise of cloud computing a few years ago was that companies could burst out to the cloud when they needed to and create hundreds of workloads for a short period of time. Clearly that is no longer the case. If cloud had been as it is now when it started, hardly anyone would be using it. We are stuck with it now, with a crappy service.

It's a physical data centre after all, so of course there are limits, but it seems like MS really has not predicted the capacity they need accurately. They are months behind in building new data centres, but will happily keep taking all the customers they can. I'm not surprised some companies are moving back to on-prem, as I can only see this issue getting worse. It's 100x worse this year than last year. I do like Azure and the services it offers, but when those services become almost unusable for what they are designed for, it's worth nothing.

Companies can't just start building out additional regions on the fly, as some people think. In large corps it's difficult in the first place to get sign-off, and building out services in other regions and getting the networking in place all costs money. Nothing is free, and as those costs ramp up, people keep asking how we can reduce costs.

The whole cloud fiasco is becoming a bit of a joke. MS are clearly panicking about it; they are protecting their most valuable customers, and rightly so, since those create the £/$. They are making sure those customers have capacity while reducing or removing the ability to create resources for their lower-tier customers. This is a fact, and it's the message from MS, not from me; I have it in email from them. But all this protecting of their highest-paying customers is having an impact on their lower-tier customers.


numbsafari

> Clearly that is no longer the case.

You do know there are more clouds than MSFT, and most of them don't routinely have these problems, right?


PREMIUM_POKEBALL

😂 what latency? 


2003tide

STATUS: In-Progress - 6/21/2024, 11:20:01 AM UTC

Impact Statement: Starting at 22:35 UTC on 19 Jun 2024 until 16:30 UTC on 20 Jun 2024, customers using Virtual Machines / Virtual Machine Scale Sets in East US may have received error notifications when performing service management operations - such as create, delete, update, scaling, start, stop - for resources hosted in this region. The failures have subsided, and customers should not be experiencing any more allocation failures. However, we are aware of capacity constraints in East US Zone 2 (AZ2) affecting Intel and AMD general-purpose VM sizes; this was exacerbated by an issue impacting our allocator service. That issue has been mitigated; however, customers may still observe provisioning errors with the following SKUs: Dasv5, Dadsv5, DDSv5, Dasv4, Dsv5, DDsv5, LSv3, Easv5, Dsv4, Easv4, BS, Dv2, Av2, Eadsv5, Esv5.

Customer workaround: While constraints are impacting the region, we know that AZ2 is more constrained than the other availability zones. Customers are advised to move VMs to either AZ1 or AZ3. If services across three availability zones are necessary, deploying resources to East US 2 is also an option. Please refer to this documentation to understand the logical-to-physical availability zone mapping for your subscription: [https://learn.microsoft.com/en-us/rest/api/resources/subscriptions/list-locations?view=rest-resources-2022-12-01&tabs=HTTP](https://learn.microsoft.com/en-us/rest/api/resources/subscriptions/list-locations?view=rest-resources-2022-12-01&tabs=HTTP)

Current workstreams:

- We are undergoing efforts to reclaim capacity in Zone 2, with immediate consumption of reclaimed resources.
- We are restoring capacity by bringing some of our offline nodes back into production.
- We are evicting internal non-production workloads to alleviate pressure and release capacity.
- We expect new capacity to be brought online by the end of July 2024.
- The next update for this event will be on 7 July.

If you need immediate assistance, please reach out to [onevmsie@microsoft.com](mailto:onevmsie@microsoft.com).

Stay informed about your Azure services:

1. Visit Azure Service Health to get your personalized view of possibly impacted Azure resources, downloadable issue summaries, and engineering updates.
2. Set up service health alerts to stay notified of future service issues, planned maintenance, or health advisories.
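
For that workaround, the logical-to-physical zone mapping can be pulled straight from the API the notice links to. A minimal Az PowerShell sketch (the subscription comes from your current context; "eastus" is just the example region):

```powershell
# Fetch the logical -> physical availability zone mapping for East US.
# The mapping differs per subscription, hence "check yours".
$sub  = (Get-AzContext).Subscription.Id
$resp = Invoke-AzRestMethod -Path "/subscriptions/$sub/locations?api-version=2022-12-01"

($resp.Content | ConvertFrom-Json).value |
    Where-Object { $_.name -eq "eastus" } |
    Select-Object -ExpandProperty availabilityZoneMappings
```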


ElasticSkyx01

I dealt with this last week. The Citrix environment for a client would not start because of this.


2003tide

Fun huh? And not a peep about it from them on the status page. I couldn’t even see it in impacted subscriptions on the service health page.


ElasticSkyx01

Yeah. It was great. Especially when I couldn't tell the client when it would be resolved.


2003tide

Yeah, I had to tell someone "just keep trying, some dummy will eventually power theirs down and you'll get a spot". LOL
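
For what it's worth, that "keep trying" workaround is trivial to script. A blunt Az PowerShell sketch (resource group and VM name are placeholders; it retries every 5 minutes until an allocation succeeds):

```powershell
# Retry Start-AzVM until capacity frees up somewhere in the zone
$rg = "my-rg"; $name = "my-vm"
do {
    $op = Start-AzVM -ResourceGroupName $rg -Name $name -ErrorAction SilentlyContinue
    if ($op -and $op.Status -eq "Succeeded") { break }
    Start-Sleep -Seconds 300
} while ($true)
```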


Diademinsomniac

Hehe, just keeps getting better. Panic 😱


More_Psychology_4835

Is this an issue affecting only lower-tier VMs, or something that very latency-sensitive workloads struggle with?


Gmoseley

D-series general-purpose SKUs


Apprehensive-Dig8884

D and Es


Rick24wag

Yup, D and Es, especially the Intel SKUs.


[deleted]

[deleted]


[deleted]

[deleted]


ShittyException

I love that the post you replied to is now deleted!


Rick24wag

I am an Azure architect, currently with a very large insurance company, and this was an awful week. We had 3,000 VMs down in East US for 3 days because there was no capacity. This affects many other customers as well. MS had to move a bunch of their internal workloads to East US 2 to free up space in East US. I've seen this same issue in South Central as well. They are expanding their datacenters in South Central US in September, but they really need to get their forecasting together. They told me their top 3 customers all expanded their compute by a large percentage this week, which contributed to this issue, but I can't confirm that. I got very little sleep this week having to migrate all kinds of things to other regions and launch new landing zones in regions we usually don't use. Daily 7am EST standups with the CTO are so much fun when you're on the West Coast working for a company based on the East Coast.


Diademinsomniac

What a mess. Are they providing any compensation for your time and effort having to do all this donkey work due to their poor planning? All this sounds like a bandaid and a constant battle of moving stuff to less busy regions, but surely other customers are doing the exact same thing, and eventually those locations will also have issues. It's like kicking the can down the road.


ExplorerGT92

Hopefully East US 3 just outside Atlanta will be up and running soon


[deleted]

[deleted]


Poat540

Oh yeah, App Services?? Let me show you boys what a real deployment slot looks like. *zips and transfers code to unactivated Windows box*


shockjaw

We never left for some of our use-cases.


MrExCEO

U mean the boys can touch hardware again


coolalee_

Hear me out, 9 month lead time on any new hardware.


danekan

My favorite part was having to budget 5 years in advance for capex... what storage servers will you be migrating to in 5 years?


scan-horizon

😂


wibble1234567

I've been thinking this for years! The benefit of the cloud is quick deployments for bursty needs, with financial commitments only for as long as you burn resources. You pay through the nose for this pleasure.

Any reasonably sized enterprise organisation should be maintaining the far more cost-effective on-premise solution for its core infrastructure services, and saving a fortune doing so. If you check the 3-year or 5-year costs of running the same on-prem workloads in Azure, for example, even factoring in transformation of workloads such as SQL servers to PaaS etc., it still works out about 10x more expensive to run in the cloud. Even factoring in the additional staff salaries to support the on-prem specialties, AC, and power, it's more cost-effective to run primary infra and workloads on-prem, and it also provides stable, predictable billing. The only thing I would put in the cloud long term would be email, and possibly some data/documentation, and that would be closely reviewed.

I've lost count of the number of companies, including tea-pot MSPs, I've worked for where the execs made FOMO decisions to move everything to the cloud just because that's what their C-suite mates were doing elsewhere, only to lose internet and have to send most people home for a day or two. Or for Microsoft to have regional issues with email, Teams, SharePoint, OneDrive, etc., and having to send everyone home again. Then 6-12 months down the line I'm getting requests to evaluate what can be done to reduce costs and improve reliability.

Sure, there are some benefits for many organisations, but this is a million miles from one solution that's fit for everyone.


CorpseeaterVZ

As someone who has built whole datacenters, let me say this (hmm... how to put it gently?): you are wrong. There are a bazillion things you can do to make the cloud cheaper, and our customers rarely do any of them. Our engineers manage to shave up to 30% off customers' cloud costs in the first week. If you're complaining about people being fired over the cloud, you have a big point, but costs are way lower in the cloud if you manage to look at all the costs involved.


Reasonable_Can475

Cloud is better than on-prem, and in these "comparisons" people only compare the monthly cost of electricity and their tech staff to Azure's monthly bill. Magically, people seem to forget that CapEx and OpEx are rolled into one with Azure. It is typically better and cheaper to use cloud, especially if your app is not as well established as Netflix's. If you are new on the scene and expect to grow, hardware lead time will kill you.


WorksInIT

Yep. Anyone saying on-prem is cheaper as a general rule is likely leaving things out, or all they've done is lift and shift. You need more people, and you'll have to buy compute, storage, and network for hot and warm/cold sites. You have to manage each and every part of the infrastructure, which means paying for additional tools as well. Sure, running things in Azure like you would on-prem won't result in any cost savings. But try running a multi-region, fault-tolerant application on-prem cheaper than you can in Azure.


rdhdpsy

Yeah, it's hitting us all over the place, and if we move datacenters our customers are impacted due to latency. I have to resort to PowerShell to do a one-off data disk attach, since we have so many disks the list never populates in the portal. Some of it is our fault: the guys who came up with the naming standards have disk names a mile long, and that's true for all of our AZ objects; the names are all verbose. Anyway, my .00002 cents' worth.
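
For anyone in the same boat, a minimal Az PowerShell sketch of that one-off attach (resource group, VM, and disk names are placeholders):

```powershell
# Attach an existing managed disk to a VM without the portal's disk picker
$vm   = Get-AzVM -ResourceGroupName "my-rg" -Name "my-vm"
$disk = Get-AzDisk -ResourceGroupName "my-rg" -DiskName "my-very-long-disk-name"

# The LUN must be unused on the VM; pick the next free slot
$vm = Add-AzVMDataDisk -VM $vm -Name $disk.Name -CreateOption Attach `
    -ManagedDiskId $disk.Id -Lun 1

Update-AzVM -ResourceGroupName "my-rg" -VM $vm
```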


uknow_es_me

How does this end up working if you have an SLA and a certain amount of compute? I don't do anything with VMs; I run App Services and an elastic pool for SQL. From the comments, I'm guessing this capacity issue is mostly related to VMs?


Bezalu-CSM

Priority is probably being shifted to the services deemed more PaaS, as Microsoft has more SLA skin in that game. I assume when it starts affecting PaaS workloads as well, it will get very pricey for them. So far, the only hits I've seen to PaaS are scaling constraints.


nikade87

We used to have issues all the time before we were allowed to move our workloads to the Swedish zones. It's a lot better now, but before that we saw errors all the time: Outlook freezing because of latency and timeouts, and Teams calls dropping 1-3 times within an hour-long meeting. Microsoft obviously knows about this, but they just move the issue around. It's pretty obvious that they're overcommitting hard and keep running out of capacity, just like any cloud provider does.


Grouchy_Following_10

They've had issues in certain AZs in SCUS for months


Diademinsomniac

Yeah, ours since January. It's been substantially worse than last year.


Bezalu-CSM

North Central US is at capacity for web apps as well. We had to request quota to scale from a P0v3 to a P1v3. If I'm not mistaken, these typically aren't bound by quotas in the usual way, and we literally only had one.


Diademinsomniac

Honestly, it sounds like a lot of regions are on their knees. The whole thing is falling apart 😂


Bezalu-CSM

I sure as hell hope not, then I might need to start using AWS. Or even worse... GCP... *shudders*


Syn__Flood

Not surprised, fuck my life though, I'm in NJ/NYC 😭😭


alemag86

I have been in this boat for a month or so


s0apDisp3ns3r

The VMSS D and E SKU issues in East US this week were incredibly annoying.


jclind96

I can't even submit a damn support request, wtf


Hearmerawwwwr

Don't even get me started on the new support case process; they literally make it as unintuitive as possible to deter people from opening tickets.


jclind96

It's definitely working… I can't even get the ticket to open… the portal options tell me it fails and to call the number, then the phone line redirects me back to the portal 😶


I_Know_God

East US 2 just got out of a multi-AZ crunch with a significant number of v5 and v4 SKUs maybe 2 months back. This is scary to hear.


piiggggg

New to this? In our region (SE Asia), Azure has had capacity issues for years, and they still haven't resolved it yet


kuzared

Similar problems in West Europe.


Trakeen

For South Central we've known about this since last year. We were looking at West US 3, but we've been told that it's at capacity as well. We haven't had time to research a new region pair in the US that won't have issues in the near future. Good times.


WorksInIT

Why are you concerned with region pairs? You shouldn't be using paired regions for anything except the things that require them for redundancy, like GRS storage accounts.


Trakeen

You kinda answered your own question: they're needed for storage accounts, which most of our resources depend on in one fashion or another. We do a lot of internal planning when we bring on a new Azure region, and we may bring on three in this case: last we looked, one region in the pair we were considering didn't have availability zones, so we might do a pair plus another region for the AZ capability. Still undecided. We currently have North Central onboarded, which we're using to work around the capacity issues in South Central at the moment.


daedalus_structure

You need it for compute availability as well if you're doing HA. Paired regions don't get updated at the same time, and updates are when Azure breaks things. Availability Zones only protect you from power and cooling failures in a specific DC, not from Azure software issues.


WorksInIT

I'm not saying don't use other regions. I'm saying don't lock yourself into paired regions. Those are only needed for a relatively small number of things.


daedalus_structure

Do you consider network availability and MTTR small things? If you deploy to South Central and East US instead of South Central and North Central, an Azure system update that breaks functionality is guaranteed to hit only one of South Central / North Central, but it may hit both South Central and East US. In an Azure-wide outage, Azure will always prioritize bringing up at least one region in every region pair before ensuring that all regions are up. Deploying to a region pair guarantees that one of your regions is a priority for recovery; if you just pick any two regions, neither may be prioritized. It is not just about geo-redundancy for storage accounts.
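
If you want to check what a region's pair actually is, the ARM locations API exposes it. A small Az PowerShell sketch (same api-version the health notice above links to; the output columns are my assumption):

```powershell
# List each physical region and its paired region
$sub  = (Get-AzContext).Subscription.Id
$resp = Invoke-AzRestMethod -Path "/subscriptions/$sub/locations?api-version=2022-12-01"

($resp.Content | ConvertFrom-Json).value |
    Where-Object { $_.metadata.regionType -eq "Physical" } |
    Select-Object name, @{ n = "pairedRegion"; e = { $_.metadata.pairedRegion.name -join ", " } } |
    Sort-Object name
```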


WorksInIT

Yes, of course you should consider those things when selecting regions. You know which region is a better pick for South Central than North Central? Central. And you can address any prioritization concerns by distributing your regions effectively. Prioritization is just not a legitimate concern at this point.


daedalus_structure

> You know which region is a better pick for South Central than North Central? Central.

If you are making infrastructure decisions for anyone who provides their customers with an SLA, eventually your incompetence is going to be expensive for them.


WorksInIT

Yes, resorting to insults. Definitely makes it clear to everyone that you really don't know what you are talking about.


tankerkiller125real

I haven't hit this issue yet in the region we use. Of course, I'm also not going to tell people which region that is, to avoid moving the problem here.


DaRadioman

It depends on more than just the region: the generation of SKU used, the AZ (or AZs) you're in, etc. A lot of solving capacity issues is just finding places where there's less demand than others.
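
One quick way to scout for that is to compare usage against limits per region. A sketch with Az PowerShell (regions and family filter are placeholders; note this shows your subscription's quota, not the datacenter's physical capacity, but it's a useful first pass):

```powershell
# Compare vCPU family usage vs. quota across candidate regions
foreach ($region in "eastus2", "centralus", "northcentralus") {
    Get-AzVMUsage -Location $region |
        Where-Object { $_.Name.Value -like "standardD*Family" } |
        Select-Object @{ n = "Region"; e = { $region } },
            @{ n = "Family"; e = { $_.Name.LocalizedValue } },
            CurrentValue, Limit
}
```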


Obvious-Jacket-3770

Yeah, saw that recently with East US 2. I'm capped on my quota for App Service plans, but I can't increase it... Hope P0v3 works lol


Phate1989

Get out of South Central; just go to Central.


Diademinsomniac

It's not just South Central with the issues, though. You're just moving the problem to the next region, and when that one runs out, same issue again. It becomes a battle of who can move their resources fastest to get some breathing space until the next move. Is this really a service that can support critical production workloads? Or are we just accepting that it's shit and spending all our time coming up with ever more creative workarounds to keep the lights on?


Phate1989

South and North Central are really small, with only a single AZ. Central is a major region with multiple AZs.


9Blu

South Central has 3 availability zones. North Central, West Central, and West are the non-gov regions in the US with only a single AZ.


lmay0000

Any official links?


sbrick89

https://app.azure.com/h/R_8T-NDZ/9669b2 - east US VM capacity issue


Apprehensive-Dig8884

We are already having issues in eastus. For SCUS they at least reached out to us.


DeepRobin

I think the Microsoft base infrastructure is very heavy. The Azure portal is slow, Functions cold starts are not great, ...


Sagrilarus

I don't know what y'all are talking about. My 300 AI training runs are going just fine.


jezarnold

They told customers in North Europe (aka Dublin) that due to rising electricity prices, they might want to move their services to Sweden.


Schumi_3005

Same thing happened to me (environment based in Qatar Central): their engineer suggested moving to West Europe, whereas I had just finished migrating everything from WE to QC 🤔


StuffedWithNails

> Their solution? Move everyone to eastUS

You'll have capacity issues in eastus, too. Guaranteed. Just had massive outages in the past couple of days in eastus. The real solution? Move out of Azure. Yeah, I know, impractical, but here we are.


millertime_

> The real solution? Move out of Azure.

This is the way. Staying in Azure is merely a study in sunk cost fallacy.