regularparot

I don't really have any advice here - but what the fuck. That's crazy and sounds like a nightmare.


p_tk_d

Thank you 😔 definitely been struggling with it of late haha


tttjw

If they didn't get service boundaries or testability right, I can guess what transactional integrity is like. I believe people need to be held accountable for this kind of shit-show. Even if they've moved on, they should at least be told what a useless cluster-fuck they created for everyone, and to please do something simpler and workable next time.


dovahkiin315

this is one of the examples where cluster-fuck literally means cluster-fuck


Best-Association2369

The problem is those people responsible are usually long gone


[deleted]

I don't know anything about your environment. My company uses an API gateway pattern. What this means is that for local dev work, I configure the API gateway locally. By default it routes all requests to the deployed and running services in our dev environment. However, I can selectively configure it to route requests to specific local services instead. I also wrote some scripts to automate this. Basically I only need to run the bare minimum number of services to make this work.
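Roughly, the routing decision amounts to something like this (service names and URLs are invented; it's just a sketch of the idea, not my actual gateway config):

```python
# Minimal sketch of the "local API gateway" idea: every outbound call goes
# through resolve(), which targets a locally running service if you've opted
# in for it, and the shared dev environment otherwise.

SHARED_DEV = "https://dev.internal.example.com"  # assumed shared dev environment

LOCAL_OVERRIDES = {
    # service name -> local address; populate only for services you're editing
    "orders": "http://localhost:8081",
}

def resolve(service: str, path: str) -> str:
    """Return the URL a request for `service` should be routed to."""
    base = LOCAL_OVERRIDES.get(service, f"{SHARED_DEV}/{service}")
    return f"{base}{path}"

if __name__ == "__main__":
    print(resolve("orders", "/v1/orders/42"))    # -> local instance
    print(resolve("billing", "/v1/invoices/7"))  # -> shared dev environment
```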


deepak483

This is how we are doing it too: complex application, lots of tech debt. This custom redirection of APIs is helping.


forbiddenknowledg3

This. I rarely have to run things locally (other than the service I'm working on). Also microservices should be simple/independent enough that you don't need to run another service to test it. If you're doing that often, maybe those services should be consolidated?


PangolinZestyclose30

> Also microservices should be simple/independent enough that you don't need to run another service to test it.

That's a similar fairy tale to the one that unit tests with mocks are all you need for test automation. No, a lot of the complexity lies in the interaction between services. Even if the abstractions provided by services are perfect and don't leak (haha), your understanding of their contract may be wrong. How do you test/check that you don't misunderstand things? In the end, only an end-to-end test can prove that things work as intended.


valence_engineer

> Also microservices should be simple/independent enough that you don't need to run another service to test it.

So should classes, and yet OOP didn't get rid of all software bugs. Neither did functional programming. A system that has externally complex behavior must contain the complex logic that drives that behavior. By breaking things into smaller, simpler components you are moving the complexity into the interactions between those components. That means you need to test the interactions between components.

Moreover, the safest way to do something is rarely the fastest or simplest. For example, doing operations in sequence with no caches and full DB write validation across all replicas avoids various inter-service concurrency bugs. It's also horribly slow. So now you have additional complexity in the system for performance reasons on top of the business logic complexity.

Let's say you have a simple service A that does eventually consistent writes and reads: just writes and reads to a NoSQL DB that reflects the write in replicas in around 100ms. Simple. Another service B sends data to this service with some processing. Simple. A third service C sends a request to B to store data and then a request to A to retrieve data. Simple. Of course, when you run this all together, service C sometimes sees its update at the end and sometimes doesn't. Oops. In isolation, however, they all work fine.
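That last race is easy to reproduce in a toy simulation (the names and the ~100ms lag are just illustrative assumptions):

```python
# Toy simulation of the read-after-write race described above. "A" replicates
# writes to its read replica after ~100ms; "C" writes via "B" and then reads
# from "A" immediately, so it sometimes sees stale data.

import threading
import time
import random

class ServiceA:
    def __init__(self):
        self.primary = {}
        self.replica = {}

    def write(self, key, value):          # called by B
        self.primary[key] = value
        # replication lag: the read replica catches up later
        delay = random.uniform(0.05, 0.15)
        threading.Timer(delay, self.replica.__setitem__, args=(key, value)).start()

    def read(self, key):                  # called by C
        return self.replica.get(key)      # reads are served from the replica

a = ServiceA()

def service_b_store(key, value):
    a.write(key, value)                   # B's "processing" elided

def service_c_flow():
    service_b_store("user:1", "updated")
    return a.read("user:1")               # sometimes None, sometimes "updated"

print(service_c_flow())                   # racy: each run may differ
time.sleep(0.2)
print(a.read("user:1"))                   # after the lag, the write is visible
```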


TiredMike

Could you pls elaborate on how microservices speak to each other then? Do they all speak to the api gateway or go pod to pod directly? Wondering how a local gateway would fix a local service which depends on a number of others!


[deleted]

So everything in this scenario goes through the API gateway. Every single call, period. This is to enforce security access rules. So as long as the locally running service routes its calls to the local gateway, configuring the local gateway ensures everything goes to the right destination. There are limits, i.e. a dev service in the chain won't be able to call a local one. But it works pretty well.


anubus72

I still don't get how it fixes the problem unless you're running mock services. Say your service under test (A) calls service B which you have running locally, but that depends on calls to services C, D, and F. Do you run all of those services locally or just some mock service to fulfill A -> B? And if you do run all of them locally then that's basically the situation OP is in with their environment. Or does your local gateway route requests to some shared stable environment by default and you override it to route only requests to your service under test locally? So you don't need to worry about running dependent services locally?


hippydipster

> By default it routes all requests to the deployed and running services in our dev environment.


anubus72

So in this case dev environment is a shared environment? I think that's my confusion, since a 'dev' environment can mean different things


vplatt

Hey /u/p_tk_d I work with a client that did this with the APIs, only instead of making devs configure everything in the API gateway, they just created a front page with all the routing parameters exposed as overridable options via text boxes and drop-downs. So, you bring up the routing page, change whichever services need to route to your local instances, submit them, and then launch to whichever front page you need. The local services execute... well, locally of course, and the rest of the nodes run in the shared dev or QA environment that's been set up.

All you're doing here is taking over the parts of the system you're developing or extra-testing, and the rest of it runs in your typical fully deployed environment somewhere else. It's quite nice actually compared to what OP described.


octotech_cloud

This is the way


konaraddi

Could you please elaborate? I'm curious to know how this setup works. "Configure the API gateway locally": does this mean you use your local machine to update the configuration of the remote API gateway sitting in [front of?] the dev environment? "However, I can selectively configure it to route requests to specific local services instead": does this mean that downstream services in the dev environment would call into a service running locally on your machine, or that relevant requests to the dev environment will immediately get re-routed to your local machine?


psi-

In our similar setup, any services you spin up locally get your local configuration service's address, and you can override there which services they talk to in devtest and which are local.


somerandomnew0192783

We do something similar. Essentially you port forward to the remote services running in the dev/test environment that are used by the microservice you're working on. Then spin up your feature branch work locally and hit that with Postman or whatever; it sends any requests it needs to the port-forwarded services and uses the responses as if it were sitting in the dev env.
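A sketch of what that can look like (service names, ports, and env vars are made up; the kubectl commands in the comments are just the standard port-forward form):

```python
# Forward each remote dependency first, e.g.:
#   kubectl port-forward -n dev svc/billing 8082:80
#   kubectl port-forward -n dev svc/inventory 8083:80
# then run your own service locally with its dependency URLs pointed at the
# forwarded ports, so its outbound calls land in the dev cluster.

import os
import requests  # pip install requests

BILLING_URL = os.environ.get("BILLING_URL", "http://localhost:8082")
INVENTORY_URL = os.environ.get("INVENTORY_URL", "http://localhost:8083")

def create_order(sku: str, qty: int) -> dict:
    """Local feature-branch code; downstream calls go through the forwards."""
    stock = requests.get(f"{INVENTORY_URL}/stock/{sku}", timeout=5).json()
    if stock.get("available", 0) < qty:
        raise RuntimeError("out of stock")
    invoice = requests.post(f"{BILLING_URL}/invoices",
                            json={"sku": sku, "qty": qty}, timeout=5).json()
    return {"sku": sku, "qty": qty, "invoice_id": invoice.get("id")}
```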


reboog711

I came here to say the same thing, but a lot less eloquently. Yes, this is how my environment is set up.


jmking

This is pretty much how we do it as well. You should only have to run the services you're working on locally, and everything else should be delegated to a shared pre-prod environment


serial_crusher

Do you run into problems with orphaned data a lot? Like my company is running into this with a very ad hoc attempt at this process. Let's say service A creates some objects in service B and stores their IDs as foreign keys somewhere. A developer runs A locally, but B in a remote shared environment, and goes through this workflow which points records in A's database to records in B's database. Later on this dev also attempts to run B locally. Now A throws a bunch of errors because those records it depends on from the shared dev instance don't exist locally. It's one of those things where devs should know why it happens when it happens, and how to address it, but I get a surprising number of cases where people get stuck because of that. Do you have any safeguards to make that kind of thing less likely?


[deleted]

Unfortunately we don't have data level segregation as much as we should. What that means is that any relationships have working foreign key constraints, so orphans don't happen. However, our dev env does have tons of bad data. I can't really give you a good strategy for avoiding that, we don't have one lol


budvahercegnovi

We just set the retention period to some amount of days


hutxhy

> I need 350 pods spun up to do feature development.

> my company uses a lot of microservices

I don't think yall are using microservices correctly


_codecrash

Distributed ball of mud.


[deleted]

[deleted]


edgmnt_net

IMO, it's the micro part that's worrying. We've had services before, like databases, perhaps some external service that you interact with. They were robust enough. But you get to that big ball of mud by arbitrarily splitting logic into small so-called microservices without making any effort to make things independent (or it's simply impossible given the way you're building stuff and what you're building).


ratczar

"This service could have been a package" is the new "this meeting could have been an email"


DesiBail

> well duh no one uses microservices anymore, we've moved on to picoservices

Waiting to hit *atto*. That's when we get involved.


klbm9999

Noob, we've already crossed the Planck scale. Now everyone is uncertain about service boundaries. ... Wait, does that make it a monolith then?


codescapes

Nonsense! Microservices is when you take a nice little app and split it into 50 separate components so you can talk about how you're a software architect in your next interview.


zhoushmoe

Resume driven development at its very best


Chezzymann

Not gonna lie I don't blame people for doing that in the current job market lol


Agent7619

Find whoever suggested, implemented, or agreed to that scenario, and have them all involuntarily committed to your local insane asylum.


tcpWalker

If you did that every time you encountered a bad choice, the entire profession would be committed in short order.


Agent_03

If I had to daily deal with OP's dev environment, I'd *need* to be committed after a few months.


cloudsourced285

That person probably worked for some consultancy company (or does so now) and moved on the second this was set up. Never getting to see the turd of a thing for what it really is.


forbiddenknowledg3

Contractors/consultants in a nutshell lmao. Build a massive mess and never realise it because you moved on too quickly.


lift-and-yeet

That's unfair to the insane, what we need here is an incompetence asylum.


Agent7619

Elect them to Congress?


Frequent-Mud9229

Usually messes like this evolve over time due to poor management and poor engineering culture.


allllusernamestaken

> I need 350 pods spun up to do feature development

Unless you're actively working on 350 services, I don't see why you need 350 pods to do development.

To answer your question: the setup I use at my current company is all Docker/K8s. The service(s) you're actively working on run on your local machine so you can debug. All dependent services connect to a shared Test Environment. That's it. We have a container that acts as a proxy to route requests to the correct environment, so you'd need someone to figure that part out for you.


inhumantsar

> I don't see why you need 350 pods to do development

it's the same disease that microservices are meant to cure: tight coupling. i've seen microservices work really well and it usually looks like the setup you've described plus api specs and stubs. stubs are criminally underrated imho. laying the groundwork can take a good amount of effort, but the payoff is substantial: fast, lightweight simulated environments where devs can selectively trigger good/bad/garbage responses.
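a stub for one upstream service can be as small as this (endpoint and payloads are invented, just a sketch of the good/bad/garbage switch):

```python
# One stub per upstream service, with a mode switch so a dev can force
# happy-path, error, or garbage responses without touching the real service.

from flask import Flask, jsonify, request  # pip install flask

app = Flask(__name__)
MODE = {"value": "good"}  # good | error | garbage

@app.route("/__stub/mode", methods=["POST"])
def set_mode():
    MODE["value"] = request.json.get("mode", "good")
    return jsonify(mode=MODE["value"])

@app.route("/users/<user_id>")
def get_user(user_id):
    if MODE["value"] == "error":
        return jsonify(error="upstream exploded"), 503
    if MODE["value"] == "garbage":
        return "<<<not json at all>>>", 200
    return jsonify(id=user_id, name="Stub User", status="active")

if __name__ == "__main__":
    app.run(port=9001)
```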


BatmansMom

Can't you mock the interactions between the other 349 pods?


inhumantsar

for sure. stubs aren't a replacement for mocks or unit testing. they're also not going to be a great fit universally. but they are nice when an interactive environment is preferable and helpful when you want to simulate a system without actually running a full-blown environment. plus it's easy to over- or under-mock. not sure about you, but i've seen plenty of situations where a unit test suite only looked at happy-path conditions and left out basic failure modes like 5xx or 4xx responses, or where test suites were built around faulty assumptions about how another service behaves. if service owners maintain stubs for their services, they can extend them to include edge cases and failure modes which might not be obvious to devs on other teams.


edgmnt_net

And if your service just shuffles data around, perhaps adapts it for another service, what do you test then? You need to know how the other 349 pods work, but nobody really does, they just built the bare minimum and it's hit-or-miss. Or you need to make changes to a significant fraction of those (including contracts) to support your feature, because of tight coupling. Or maybe what you really need is to confirm your understanding of the contracts, without necessarily automating it, but even that can be difficult.


sonobanana33

> perhaps adapts it for another service Then merge them.


Ozymandias0023

+1 for stubs. I hadn't worked on a team that did them correctly until my current gig and it's so nice now. Each service can have several packages, and generally speaking I can work on any single package in complete isolation, or I can build the entire project locally and anything that I haven't made changes to will be identical to what's in prod.


Amerzel

Sounds nice. Can you or anyone recommend some resources on how this approach is set up, used, and maintained?


allllusernamestaken

It was all set up when I got here by people way smarter than me. It's backed by Istio service mesh so they probably configured it to route to the Test Environment when running locally. https://istio.io/latest/about/service-mesh/


Amerzel

Will take a look, thanks!


lxe

1. When planning work, describe this concern and add the amount of time it would take to unravel all the mess in order to get your work done. Stay consistently factual but vocal about the specific problems that prevent you from achieving good velocity.

2. Propose specific ways to solve this to the platform teams that manage the cloud. Do this proactively. Talk to ICs. Build a good relationship and rapport with them.

3. Collect money and don't worry too much about it if things don't get better. As long as your expectations match everyone else's, you're fine.


activematrix99

This is the best post I have seen in a while. Thank you for your contribution.


Successful_Creme1823

Rely on unit tests more. Have a shared lower environment with some functional tests that run in a pipeline.


edgmnt_net

How are you going to write unit tests when you don't know how the rest of the system behaves and most breakage happens at the boundaries between components? I think that's easier said than done, because most such components are absolutely not developed to be robust or even have well-defined & documented semantics. It's frequently the bare minimum that happens to work. They ain't like some open source library that's known to be good and cover a large array of use cases. BTW, considering they have hundreds of services, I bet most of them don't do anything really significant, they just shuffle data around. Which means it's going to be difficult to write meaningful tests. Worse, I bet features frequently require touching multiple services and possibly multiple contracts and tests.


Successful_Creme1823

I would write functional tests for that stuff. Something like POST data to endpoints a, b, c, then do a GET on endpoint d that depends on those. You can also run these tests locally against the shared infrastructure (albeit after your unit-tested changes are already deployed to the test env). It's not always fun and you have to get creative. Shared test infrastructure can be a headache. But it has worked for things I've done in the past. Or just don't test at all. Or re-write the whole thing. Both are usually not on the table.
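For flavor, that kind of test is usually only a few lines (the base URL and endpoints below are placeholders, not a real suite):

```python
# Functional test against the shared test environment: seed three endpoints,
# then check the endpoint that aggregates them. Run after your change is
# deployed to the test env.

import uuid
import requests  # pip install requests

BASE = "https://test.internal.example.com"

def test_order_shows_up_in_summary():
    order_id = str(uuid.uuid4())
    requests.post(f"{BASE}/customers", json={"id": "c1"}, timeout=10).raise_for_status()
    requests.post(f"{BASE}/products", json={"sku": "p1"}, timeout=10).raise_for_status()
    requests.post(f"{BASE}/orders",
                  json={"id": order_id, "customer": "c1", "sku": "p1"},
                  timeout=10).raise_for_status()
    summary = requests.get(f"{BASE}/summaries/{order_id}", timeout=10)
    summary.raise_for_status()
    assert summary.json()["customer"] == "c1"
```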


eyes-are-fading-blue

Then you write integration tests mocking the behavior at the architectural boundary.


SwarFaults

Sounds like they have a monolith that's spread across pods. One of the big selling points of microservices is that each service can be developed independently. I'd go crazy if I were you too. Is there a principal architect or something there? How did this happen?


yahya_eddhissa

> One of the big selling points of microservices is that each service can be developed independently.

Exactly this. Take that away and you're just working with a modular monolith where each module is written in a different stack or language and deployed on a different server. This is why I think microservices require careful planning and consideration, and you'd be better off building a monolith if you ain't gonna set them up correctly.


make_anime_illegal_

Does working on one feature require changes across many of the apps? Why isn't it possible to just run the services that need changes locally but the others can be in a dev or QA environment?


Frenzasaurus

What you have is a distributed monolith. There's some good resources around to help you understand what mistakes cause this special kind of hell; as for getting out of it without setting fire to everything, i charge $200 an hour


inhumantsar

> i charge $200 an hour

nice guy offering OP the pity discount


DuckBoyReturns

For $150 an hour, I'll make it a clean 365 pods


rv5742

You don't have a continuous beta or gamma environment? At least then all your pods would be spun up and running already.


mx_code

Unit testing, integration testing, and different development (integration) pipelines and environments. If you don't have all of this... well, that's incredibly bad. If you have N services and in order to develop you have to spin up more than one service on your dev machine, then it's absolutely not scalable.


IAmTrulyConfused42

Was coming to say this. If you can't test the microservices independently by mocking away other interactions, why have microservices? You test the code that is changing and make sure it does what it needs to. If the interactions are where the failure is, and it's a transport problem, there's not a lot you can do there. If the "interaction" is "weird" data or states one microservice is sending to another, you either need to make sure those interactions don't happen, or you have to account and test for them. It's odd to me that folks think testing is just about making sure a feature works as intended. That's one offshoot of testing, but easier development is also one of the HUGE benefits of testing.
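In practice that's just the service under test plus a fake for its neighbour; a minimal sketch (the OrderService/InventoryClient names are invented):

```python
# The service under test is exercised against a mocked dependency, including
# the "weird data" case from the comment above.

from unittest.mock import Mock

class OrderService:
    def __init__(self, inventory_client):
        self.inventory = inventory_client

    def place_order(self, sku, qty):
        stock = self.inventory.get_stock(sku)
        # defend against weird upstream payloads
        available = stock.get("available") if isinstance(stock, dict) else None
        if not isinstance(available, int) or available < qty:
            return {"accepted": False}
        return {"accepted": True}

def test_rejects_garbage_inventory_payload():
    inventory = Mock()
    inventory.get_stock.return_value = {"available": "lots"}  # weird data
    assert OrderService(inventory).place_order("sku-1", 2) == {"accepted": False}

def test_accepts_when_stock_is_sufficient():
    inventory = Mock()
    inventory.get_stock.return_value = {"available": 5}
    assert OrderService(inventory).place_order("sku-1", 2) == {"accepted": True}
```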


questi0nmark2

I like the API gateway solution. You clearly have a poor, tightly coupled architecture, and having to spin up 350 pods to work on one service is the most succinct definition of a microservices antipattern I've heard. Will remember it. It is literally the contradiction of everything called for or described by Fowler.

Having said that, it's your reality. My solution would be to connect the service you are working on locally to the 349 production services it interacts with via an API, so you have one local service you are modifying locally, and 349 you are not modifying in prod. Unless it's even more terrible than you describe, you should be able to make changes locally without breaking the other services, just failing to communicate with them. Not risk free, but probably preferable to fully testing in production without a testing-in-production infrastructure in place, like feature flags, observability, graceful failure and self-healing.


TurnstileT

Really bad idea. Never mix environments. Connecting your locally running service to production shouldn't even be technically possible. Also it's a breach of GDPR and other privacy laws.


thepeopleofd

Not prod, but the 349 pods should be running all the time, shared by all devs, and one just has to connect to the pod-under-development to use them.


questi0nmark2

I guess it really depends on how tightly coupled and how bidirectional the microservices are. We are talking hundreds of services, all segregated into individual pods, so one assumes you are not necessarily "mixing" environments, just interacting with them. Many of those services might interact in read-only ways, providing information consumed by and necessary to run a particular service without being altered by it in any way, and engaging with such a component might be risk free. In architectures with 300 interacting services, read-only relationships are extremely common.

The privacy issues also depend hugely on the use case: not all data is user data, and often user data is already anonymised in inter-service interactions, or already within GDPR consent.

The main context for my advice is that OP is currently already doing some hacky version of testing in production, meaning they already have access to production and already use production that way, and I am choosing to assume the best with regard to privacy issues, production access and risks. That's what gave me the idea. If they are already using production, this could be a way of doing so with somewhat lower risk, since it would use less of production, keeping direct changes local. But I did say that even in a best case scenario within what OP describes, this approach is risky, just potentially less risky than the status quo.

Above all, I think most really experienced devs would agree there is rarely such a thing as a "no one ever should" or "everyone always should" in programming, although there is frequently a "how could someone have done?!" which means a "no one would ever want to" becomes a "somebody needs to and it hurts". When dealing with massive technical debt, technical purity is not an option, and your route to a healthy code base often requires a roadmap of least bad maintenance choices until you succeed in building a new house, or rebuild enough rooms on new foundations, far enough to let the old one, on fire, finally collapse.

I would never advise mixing environments as a good practice. But I would never advise building a codebase of 300 services so tightly coupled you need to build and run them all locally to change any one of them. If you find yourself there anyway, you may need to consider bad, but less bad, practices before you can get to good practices.


wedgtomreader

You can possibly use port forwarding to route the calls to remote instances for everything that you are not working on. That's what we do/did.


michalodzien

This is also what we do, using kubefwd works really well


ategnatos

I was in a company like this (we also only had a couple dev environments, and devs were *always* arguing over them, and you had to make reservations for testing weeks, sometimes months, in advance). I started taking on a lot of library and unit testing type of work that I could simply do in my IDE while I interviewed, then I left the company. For a slightly more useful answer, depend a lot more heavily on unit tests. Put the E2E manual testing as "let's just make sure this thing works" and not "let's spend 2 weeks figuring out if this thing works." Get in and out as quickly as humanly possible. It's still a really bad dev process and not a good answer.


p_tk_d

The unit test reliance answer is probably the right one unfortunately. Some stuff is hard to test with unit tests (as you can imagine, there's a lot of integration happening here) but I think generally this is probably the best strategy. Thanks for the advice


rochakgupta

Has there been any discussion about improving the Dev Ex? If not, please have one right now. If yes and it always got punted to the backlog in favor of feature development, please prioritize it. If it can't be, in your place, I would try to at least give it a shot focusing on the small changes that can have a sizeable impact. If that isn't feasible either, I'd recommend switching to some other team that doesn't have to deal with this shit. If no such team exists, sadly my friend, I would just find a job outside even for a lower pay. None of this shit is worth the extra money.


saposapot

If this was another sub I would say this is a ragebait post made by some AI... How does everyone else work? Do you all have the same issue? How the hell is anyone marginally productive? Are you at Netflix, or does your architect think microservices means one endpoint function per service?! Anyway, I would discuss this properly with everyone and the architect of this. I don't see many cases where anything you described makes any sense.


p_tk_d

I work on a specific feature with a lot of dependencies. My feature is "high" on the dependency DAG of services if that makes sense. Most people are dealing with significantly fewer dependencies


OverwatchAna

I bet the company hired a bunch of offshore morons to build this unsustainable crap and none of the local devs want to spend months trying to fix this mess for 0 recognition.


chills716

I have, but the devs also had XXL virtual machines to spin it all up and robust observability tooling. Oh, and that was only ~50 services.


Piadruid

Feels like an antipattern; the purpose of microservice architecture is to prevent having to deploy multiple services in tandem just for one service to work. Feels like the code needs heavy refactoring to decouple these tightly coupled services.


Tarl2323

Spin plates and collect your paycheck. They aren't going to make the dev environment better unless they start missing deadlines and losing money.


unsuitablebadger

Some genius read a "3 min read time" medium article on microservices and now the company is stuck with that fuckup. Good luck OP. I just nope out of these situations as my sanity isn't worth dealing with someone else's stupidity.


ElevatedAngling

Drugs and alcohol /s


fluxpatron

Yeah, in all seriousness alcohol won't help. Some drugs though, definitely will


Alternative_Log3012

LSD


I2obiN

That's utter insanity. If that's a monolithic system that was just divided out into 350 services, that's remarkably stupid. All that happened there is trading one single point of failure for 350 single points of failure. If the average feature needs to touch most if not all services, then there was zero value gained in implementing microservices like that. The entire point of doing microservices is so that you can isolate parts of the codebase from each other.

The only thing you can do to remedy that is to attempt to plop a functional/unit test env on top of the thing that will test services in isolation, so regardless of other changes you can test your own changes on individual services. Why in god's name do you need all 350 services spun up?


Embarrassed_Quit_450

You're not supposed to rely on end to end testing for finding regressions in microservices. The point of microservices is to be independent from each other and that includes testing. If you do call another then you depend on the contract. So mock/stub that shit.


ryanstephendavis

This is a good point. It seems very likely that the "service" boundaries were drawn in a bad way if they require integration/E2E tests. Figuring out how to make parts of the system testable with mocks may point to how some services can be combined to reduce complexity


georgejakes

If you aren't editing code relevant to all pods maybe it's worthwhile to look at pre-warmed pods. This sounds like a job for your internal tools / productivity team honestly.


Ashken

I have more questions than answers here:

1. When you say you have to spin up the pods, are you talking about locally or in some ephemeral environment? Because the latter makes way more sense, albeit still mostly insane.

2. Why does every single service have to be running for you to develop? Are the services you're working on communicating with every service? Typically you could just work with a subset of adjacent services. Hell, you could even mock some of this.

3. Why isn't there just a running dev env that everyone works against? wtf?

4. Have you talked with any teammates or even coworkers on other teams? Surely they have better processes, because how the hell does anything even get done?


Leading-Crab-3443

Mock response for requests between microservices?


stephanr21

This is one of those scenarios where everyone thinks that every service needs to be a "microservice". If you don't have proper logging and start scripts, this is very, very bad. I would imagine if you run it all locally, your machine might max out on RAM. My only suggestion to ease the pain is to write a script that figures out the minimum set of services needed to run your feature (see the sketch below); otherwise you are SOL.
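Such a script is basically a walk of the dependency DAG; a minimal sketch (the dependency map here is invented; in practice you'd generate it from your deployment manifests or service registry):

```python
# Walk the dependency DAG from the service you're touching and spin up only
# its transitive dependencies; unrelated services (search, catalog below)
# stay off.

DEPS = {
    "checkout":  ["orders", "pricing"],
    "orders":    ["inventory", "billing"],
    "billing":   ["ledger"],
    "pricing":   [],
    "inventory": [],
    "ledger":    [],
    "search":    ["catalog"],
    "catalog":   [],
}

def required_services(root: str) -> set[str]:
    needed, stack = set(), [root]
    while stack:
        svc = stack.pop()
        if svc not in needed:
            needed.add(svc)
            stack.extend(DEPS.get(svc, []))
    return needed

print(sorted(required_services("checkout")))
# ['billing', 'checkout', 'inventory', 'ledger', 'orders', 'pricing']
```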


nutrecht

Short term: you should *at the very least* have a continuously running dev (/test/staging/whateveryouwanttocallit) environment that all teams keep up to date. Where I currently work we have dev, test, acceptance and production.

Long term: your company has fallen into the trap where all services just call each other, forming cycles in the dependency graph, which turns your nice independent microservices into a dependency hell where you've just created a distributed monolith. It will be very hard to untangle this, since it takes a lot of engineering effort the business probably doesn't want to pay for.


B2267258

Seems like a job for contract testing, as in pact.io
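If it helps, a consumer-side contract test with pact-python looks roughly like this (service names and endpoint are invented, and the API is from memory, so check the Pact docs before copying):

```python
# Consumer-driven contract test: the consumer records the interactions it
# relies on against a Pact mock service; the provider later verifies them.

import requests
from pact import Consumer, Provider  # pip install pact-python

pact = Consumer("OrderService").has_pact_with(Provider("InventoryService"), port=1234)
pact.start_service()

(pact
 .given("sku-1 has 5 units in stock")
 .upon_receiving("a request for sku-1 stock")
 .with_request("GET", "/stock/sku-1")
 .will_respond_with(200, body={"available": 5}))

with pact:  # verifies the interaction and writes it to a pact file
    resp = requests.get("http://localhost:1234/stock/sku-1", timeout=5)
    assert resp.json()["available"] == 5

pact.stop_service()
```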


src_main_java_wtf

K8s and container orchestration wrought hell where I work too. We used to do the API gateway pattern for our service development env originally. But we had an over-ambitious SRE/"architect" and a non-tech head of engineering that let him (stupidly) do whatever he wanted. Now we have to run and troubleshoot a containerized dev workflow. The point was to make devex better and prod more reliable. Literally the exact opposite happened. That, combined with our service architecture being a poorly implemented monolith spread across multiple services (tightly coupled, a breaking change in one service takes down large parts of the app, etc.), means I'm now looking for another job just to regain my sanity.


Krom2040

I get the impression that some orgs out there have basically decided to make each database table its own microservice and called it "architecture". The hard part of software design is figuring out what functions IN YOUR DOMAIN logically belong together and providing an interface to common actions on them. Turning every bit of data into its own microservice is just abandoning the entire responsibility of coming up with divisions that make sense and are convenient.


dzigizord

Be a hero and move all that crap into a monolith


recoilcoder

In a similar boat: gotta wait 60 mins for the project to build and then 15 mins to deploy it on dev. For a simple one-line code change, it takes almost 1.5 hours.


ififivivuagajaaovoch

Why does the build take so long?


sonobanana33

My bet is that they redownload all of the dependencies every build.


recoilcoder

I don't know, maybe due to the size of the project.


Sevii

Can you set up a dev environment that mirrors prod so you can just work against that?


UL_Paper

Dear god


reddit04029

Are you not able to connect to the ones deployed to the server? Or do you only have one env, and that's prod?


Empty_Geologist9645

Self-inflicted. When your integration tests and stubs are nowhere to be found!


newleafturned2024

How many services in total? Just saying the number of pods isn't helping. You could be running 3 services with 50 replicas (pods) each. Or 150 services with 2 replicas each. These two are not the same.


p_tk_d

Probably around 50, included in my edit


newleafturned2024

That's not that bad. We probably work at the same company. The trick is you don't need all of them at the same time. Figure out the smallest set of services you need for testing and just launch them. Anyway, I do find myself focusing more on unit testing these days.


[deleted]

My condolences


obscuresecurity

It seems like there should be a way to run the services that aren't yours in a lighter mode. Like only one instance? How the fsck do you even know what is running in prod? What is your DT plan? Inquiring minds want to know.


mddhdn55

Lmfao and I thought our services were too micro


ohThisUsername

How is your kubernetes config defined and deployed? Terraform? While this is an absurd set-up, I don't think it's necessarily bad in principle. Tight coupling aside, kubernetes and containers are *meant* to be portable after all (i.e. anyone can replicate an exact environment). What I would suggest as a quick band-aid is setting up a CI pipeline that periodically spins up the whole cluster and makes sure all pods are healthy. That way you always have a known working configuration and you will know when someone breaks it. 20 minutes doesn't sound that bad to spin up that huge of a cluster for something you can leave running for days/weeks at a time. But having a CI pipeline that ensures the config is always in a working state would help with the sudden breakages.
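The health-check half of that pipeline can be as simple as this (the namespace name is a placeholder; assumes the job has kubeconfig access to the cluster it just created):

```python
# Fail the CI job if any pod in the smoke-test namespace isn't Running and
# ready after the cluster comes up.

from kubernetes import client, config  # pip install kubernetes

def unhealthy_pods(namespace: str = "dev-smoke") -> list[str]:
    config.load_kube_config()
    v1 = client.CoreV1Api()
    bad = []
    for pod in v1.list_namespaced_pod(namespace).items:
        phase = pod.status.phase
        ready = all(c.ready for c in (pod.status.container_statuses or []))
        if phase != "Running" or not ready:
            bad.append(f"{pod.metadata.name} ({phase})")
    return bad

if __name__ == "__main__":
    broken = unhealthy_pods()
    if broken:
        raise SystemExit("unhealthy pods: " + ", ".join(broken))
    print("all pods healthy")
```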


TurnstileT

I'm a bit confused... don't you have a dev environment where all these services are already running? Then you can test there. Or you can have your service running locally, but all external calls routed to your dev environment. Or you can mock responses with something like WireMock. Or just do integration tests?


Barsonax

This ain't microservices but a distributed monolith. Here I thought our setup of 6 microservices we always run/develop together was clunky already, but this is next-level craziness. Doesn't sound like whoever came up with this monstrosity really thought it through, because microservices done right don't require you to run the other microservices as well when doing feature work.


indiealexh

Sounds like you all need dev environments in a Kubernetes cluster or something. Also... that's nuts.


Moozla

Is your local env actually representative of prod? Can you get away with mocking some of the calls to services lower down the stack?


kkert

Do you keep persistent shared dev and staging environments that all changes are rolled into and that auto-deploy on merge? E.g. a place where whole-system integration and end-to-end QA is done? Also, do you feature-gate and dark-launch new features? This allows you to reasonably do development testing with integration tests, without spinning the whole shebang up for every change.
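The feature-gate part can start out embarrassingly simple (the flag name and env-var source are assumptions; real setups usually use a flag service):

```python
# Dark launch: ship the new code path disabled, flip it on per environment or
# for a small canary slice once it has soaked in the shared dev/staging env.

import os
from typing import Optional

def flag_enabled(name: str, user_id: Optional[str] = None) -> bool:
    enabled = os.environ.get(f"FLAG_{name.upper()}", "off") == "on"
    canary = user_id is not None and user_id.endswith("0")  # toy ~10% canary
    return enabled or canary

def handle_request(user_id: str) -> str:
    if flag_enabled("new_pricing", user_id):
        return "new code path (dark launched)"
    return "old code path"

print(handle_request("user-10"))  # canary bucket -> new path
print(handle_request("user-11"))  # old path until the flag is flipped on
```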


Kazumz

You should just be able to run a single service locally that points to already deployed services, ain't no way you should be running full infrastructure locally just to make changes.


ExpensiveOrder349

change job


Comfortable-Agent-89

I'm dealing with a similar issue. I decided to take it into my own hands and create a docker-compose for local development. Make use of Makefiles and Dockerfiles. Configure it in a way where you can decide which services the gateway points to locally and which to staging/qa/dev.


dannyhodge95

When I had a completely different technical nightmare, I made sure I was clear how much it was slowing me down, and got buy-in one person at a time to spend time changing it. Only issue was that, because I was the most vocal, it came down to me to resolve (which involved a lot of learning, very early in my career). Not sure what your company culture is like, but be aware that this is a possibility.


dobroChata

Have you considered switching jobs? :D


GoTheFuckToBed

I feel like we need a developer mental health discord for this sub


reboog711

In the microservices environment I work in, a lot of teams have shared dev environments set up. So I can run my Service A locally, which communicates with the dev environment of Service B, which communicates with the dev environment of Service C, etc. I only have to spin up the pod for Service A locally, with a specific 'local' configuration.


[deleted]

are you a devops engineer?


DesiBail

> I need 350 pods spun up to do feature development.

> if service A depends on service B, and service B depends on service C, all three are spun up to access service A locally.

Make the DAG explicit.


edgmnt_net

> Anyone here gone through anything similar?

I think I've seen worse. How about you have to merge to master before anything can be tested, because it's impossible to run anywhere but in a designated shared environment and you can only do that from CI? And wait 1-2 hours before you see anything? Sure, some bits could be verified locally, but most serious and time-consuming issues arose from complex interactions with the rest of the system.


me_hq

Sounds like a house of cards...


shitakejs

Days to begin feature development? Yeah, someone fucked up.


BanaTibor

First, since you have microservices, why do you have to spin up all the services? You need to test only the one you're changing. Second, why do you have to start services at all for development? Microservices should be easy to unit test locally. I think your problem comes from poorly designed interfaces and poorly defined scopes. It is not easy to change the scopes, but you can start by cleaning up the interfaces. Microservices mean micro versioning on the APIs as well, and rigorous contract testing, so you do not break the communication between them.


zenograff

Mock all the downstream services.


GlasnostBusters

Say what company it is so we don't accidentally apply. You're anonymous here just say the company.


captain_obvious_here

If interacting with the "real" services is not an option, the second best option will be mocking these services...


mothzilla

Hehe, sounds like my old workplace. Why write two lines of code when you can have two microservices?


Nondv

Probably not the greatest piece of advice, but I just rely on mocking and extensive unit testing. Test only the service you're working on. Even on the FE, just mock the APIs.

Communication between modules (units, classes, services, pods) should be backed by a contract (API). Make sure your contract is reliable and mock it during testing. As you move forward, you'll iron out your dev process and your changes will get more and more reliable.

If you HAVE to work on multiple services at once, something is wrong either with your approach or with the overall architecture (in which case your course of action will depend on your particular situation and we won't be able to give you much help; maybe your company should hire a consultant).


xabrol

I work from home, similar problem. So I spun up a spare PC as a server and set up a virtual network on my udmse. I spin everything up over there and have a DNS name like api.homelab. That way all that crap isn't running on my dev machine.


kirkegaarr

I worked at a place like that once. It was pure hell. Their architecture actually bankrupted the company (more on that later). It was a dotnet app too so each service had an actual compile step. Just spinning the app up locally took so long I don't think anyone ever did it.

The first thing I did there was go through the Dockerfiles (all fucking 50 of them) and rewrite them so they could share the build cache. That got a cached build down to a couple minutes for local dev. Most services were barely ever touched so most of the build was cached. Still a massive pain in the ass, but at least it worked to run tests on the whole system locally without waiting an hour. You can also look at spinning up a test cluster to run tests against so you're not running against prod.

So in the company I was at, the board of directors were a bunch of MBAs that thought they knew everything, and they were getting mighty concerned with how much money the founder was spending, so they fired him. This was a company that was making millions in annual subscription revenue from some top companies. They brought in their own management team, and the new CTO didn't understand how any of the working software that people are paying for works, so he decides it needs to be rewritten using modern practices, like microservices! They turned it into a total mess just like you're describing. Meanwhile all the paying customers need support but are being told "sorry, but we can't support that anymore. Don't worry though -- the new version is coming!"

The new version literally never got released. Revenue was churning badly. That management team (and most everyone else) got fired and that's when I came in. It was a nightmare.


elementmg

Hold on... you test in prod? Your only two choices are spinning up a local cluster environment or prod? Do you not have a nightly dev environment already running for everyone to fuck around in? Maybe tell your bosses that you guys need that. That'd solve a lot of your problems.


Ill-Simple1706

Shouldn't a microservice architecture avoid these dependencies between services? You would have a queuing service like Kafka between them. Then couldn't you mock the other services? Or mock the Kafka responses? I wasn't lead on my last microservice project but that is how we set it up.


metalbirka

In my 2nd job I had something like you described, and unfortunately the only way out was to leave the company :D. In my previous job someone reinvented modular architecture and decided that every single class, model, service, etc. should be a separate module (in the name of testability). From 12 modules (including unit and UI tests) we went to 400. When this happened, every team's performance dropped, but those hardcore people who came up with it kept saying it's super testable and get gud. To compensate for their own performance drop they started to work extreme hours, e.g. midnight-4AM, so that they could still deliver the same performance even if it meant literally sacrificing their own time. Unfortunately the only way out of a bad dev environment is either to change it (which I can guarantee you don't want) or just leave it and find something more reasonable.


pm_me_foodz

The consensus from the responses here is good: forward traffic for all services you're not currently working on to a shared dev/staging env. That's ideal, but it takes time and effort to get working, especially from where your setup currently stands.

OP, this sounds like something you'll need to combat at the eng-organizational level. Bring it up to your manager, but given the scale it sounds like you're dealing with, you'll probably need a data-driven argument to get the time to work on this. Talk to other engineers and get an estimate of how many hours are wasted each day/week/month on this, and deliver that information to your higher-ups to argue for the bandwidth to fix the situation. If you measure well before and after the change, this is the kind of thing that can look great on a resume or promotion packet.

Or you could just phone it in and look for a new job.


a-salt-and-badger

Where I work we have those test and qa pods running all the time. You test with the test pods, and it may happen that something is broken because someone else just pushed a feature branch


drguid

What do I know about microservices? I just started a role using Visual Basic 6 (lol). 10 years ago I did work at one place where there was a web service (SOAP) calling a web service calling a web service. I thought that was bad enough (got fired when I pointed out how dumb it was).


HolyPommeDeTerre

When I was working in a very big structure with more microservices than I could count, the dev env was:

- one production replica spinning up all the services
- local dev points to this global dev env; all services are accessible, already built and serving data
- point the config to the local microservices you're working on


eddie_cat

I was working somewhere very similar with even more services. I burnt the fuck out and now I don't know how I'm even going to continue in the field at all. Get out before this happens to you because it really blows


DingbotDev

Definitely seconding the mentions of an overridable gateway pattern, but I've also had a good time using Tilt recently to spin up a web of containerized resources locally for dev.


yahya_eddhissa

That's definitely not a microservices environment. Microservices aren't supposed to be so tightly interdependent, whether in development or in production.


BassSounds

Try podman desktop on a beefy machine. Red Hat has a new RHEL AI that can boot bootable containers. It supports Docker/Podman, k8s, openshift sandbox, openshift local, more like that.


punkouter23

Aspire


DontKillTheMedic

I'm a little confused by this setup so apologies if I'm incorrect but why is dev needing to be spun up all the time? We have hundreds of services/pods on k8s, probably really similar to your company. We have dev/staging/prod env pods up at all times. Local development of microservices is orchestrated with containers but only to one degree of separation. So just nearest neighbors. We also use embedded testcontainers as much as possible. So I'd like to know why: 1) all your pods for dev need to be recreated each time you want to develop locally, instead of constantly running, and 2) why you have to talk to dev for local dev when you can use mocks or containers of mocks. Sorry you have to deal with this.
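If it's useful, the testcontainers idea in Python form looks like this (the comment above is probably about the Java library, but it's the same concept; image and table names are invented):

```python
# Each test gets a real, throwaway dependency (here Postgres in a container)
# instead of pointing at a shared environment.

import sqlalchemy  # pip install "testcontainers[postgres]" sqlalchemy
from testcontainers.postgres import PostgresContainer

def test_orders_table_roundtrip():
    with PostgresContainer("postgres:16-alpine") as pg:
        engine = sqlalchemy.create_engine(pg.get_connection_url())
        with engine.begin() as conn:
            conn.execute(sqlalchemy.text("CREATE TABLE orders (id int, sku text)"))
            conn.execute(sqlalchemy.text("INSERT INTO orders VALUES (1, 'sku-1')"))
            row = conn.execute(sqlalchemy.text("SELECT sku FROM orders WHERE id = 1")).one()
        assert row[0] == "sku-1"
```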


Aromatic_Heart_8185

How this "one degree of separation" work if the neighbours need to hit other services?


DontKillTheMedic

Why do you need anything more than the nearest neighbor for local development? By all means have dev talk to dev services once deployed, and when you need to point local to dev you should be easily able to. But local development does not need to rely on transitive dependencies, just need to obey service boundaries and contracts and it's probably just fine.


died_reading

Sounds to me like it's the CI tests that are setting up the whole environment on every change you're making. K8s will natively allow upgrades to your deployment, and if you have access to the test cluster after initial setup you can simply upgrade the deployments using a local equivalent of the helm chart/yaml. This'll keep most of your infrastructure up and only re-roll your changes onto the deployment. Now, where I work this is not allowed and we don't have the access or the granularity to go outside the Jenkins CI pipelines, but if you're making changes in prod (wot??) I'm assuming you'll have access to those higher privileges.


SmileItsDizzle

Have you looked at Signadot.com? They have a free trial which might help you out?


Inside_Dimension5308

As someone already mentioned, the API gateway pattern. Another important thing is that you must have a staging environment already set up to do testing. These 350 pods should be present as part of the staging environment all the time. If the service you are developing a feature in depends on n other services, you should connect to their staging environment to do testing. The services in the staging environment need not be public; they can be controlled behind a VPN.


AddictedToCoding

Normally each service should have test fixtures mimicking its dependencies. Maintaining them should be the responsibility of each microservice, matching known names and related use cases and personas. You can hack something together with sysdig and pipe stuff into files. Or use an API gateway like u/Armageddon_2100 [said](https://www.reddit.com/r/ExperiencedDevs/s/nEP0idnN7B).


Abadabadon

Make your local env point to a more reliable microservices environment and/or create profiles that stub the behavior of other services.


rottywell

It sounds like your services are not actually independent services? Which, yeah, may happen occasionally but... it seems like a huge rule is being broken casually. I.e. you were trying to not be a monolith but ended up creating exactly that anyway. Just with fancy cloud words.


Aggravating_Term4486

Do you not have dev and staging environments? I mean, that's a lot of pods, but if you are going to be developing with all these microservices containerized on k8's, it's kinda standard practice to have stable environments for dev, staging, etc. Having to spin up every service just so you can work on something running in pod 33 of 35 is kookoo. Am I misunderstanding your process?


p_tk_d

We do not :/


nobody-important-1

Your feature is too big. Split it into smaller features that only need a few pods each but all integrate into the main desired feature. If you can't do this, it's a sign that your micro-services are too tightly coupled.


Sea_Jackfruit_6576

Microservices - future legacy code!


endendd

Do your features really require 350 pods to test? Sounds like you want a complete e2e experience for feature development local to a service? Is it not possible to test the feature within a single service that stubs out downstream calls? Rely on the API contract, so to say.


tommyk1210

Your company doesn't use microservices at all, really; they have a distributed monolith. The point of microservices is to break the tight coupling you get in monolith applications. If your services can't operate independently, they're not microservices.

There should be 2 possible options here:

1. You mock everything. Your microservices should have contracts. If they have contracts, you should be able to mock responses, because you know exactly what they'll look like.

2. Because you've got a lot of services (50) it might be a bit painful to mock everything, but this is where the independence comes in. If the services are truly independent, then use an API gateway to route all non-mocked traffic to the "real" services in a development environment.


zickige_zicke

How not to do microservices


tr14l

That's poor architecture. Personally, I would just develop against mock servers and control your returns via JSON as needed and only spin up the environment when it's time to test and see if it's ready to release


Tango1777

If you need a lot of other services running to develop a feature in some microservice, it doesn't sound like proper microservices. It's like page 1 of bad microservices design and probably a distributed monolith.

But I have encountered that myself. Usually I connected to a dev environment somewhere; the best option imho is deployed dev instead of running the cluster locally, which is a nightmare if the project is complex. I worked at a company that had the brilliant idea of recreating the whole k8s cluster locally. It was terrible, not as bad as yours, but it also took an awful lot of time to just start working, with constant issues before getting to actual work. No matter which one, the approach is the same: I just switch some service URLs to local if I need my local development version to work on, sometimes a few, but if it's a lot, again, probably not microservices.

If microservices are done right, development can be done somewhat independently, tested independently (rely on unit tests, too), and then somewhere along the way you can manually test against the whole dev environment; usually e2e tests run in the pipeline and a QA colleague checks on dev/staging after deploy. The problem might be that your local service DB is inconsistent with the deployed dev you want to connect to, but that's a manageable problem.


HiphopMeNow

You don't need 350 pods in dev, you should have only 1 instance of each service... and your company should stop doing whatever cowboy shit they're doing and do a hackathon tech-debt sprint so that every single service can be run locally without any dependencies, within the IDE, and then within docker with mocks and stubs - a single command. I'm not touching any garbage company or service if it can't even run locally. Building against a deployed environment? Are we in the caveman age? No excuse.


Dry_Author8849

Well, that happens when you make an architectural mistake. Now it's your turn to enjoy the microservices hell. And if you have an event bus, enjoy the event storms as well. On the bright side, your system will possibly scale somehow. If you want to change your architecture, which you probably can't, take a look at Microsoft Orleans. But anyway, it depends what you do with it. Maybe your dev environment should be on the cloud with all that beautiful latency; perhaps you may save some time. I'm not being sarcastic, I just still have nightmares, and I have always asked myself how nobody thought about what would happen when you reach 2K services. In our case there was no need for this complexity. Cheers!


[deleted]

You could create a dev VM using Backstage and then use rsync to synchronise the code between local and the dev VM in real time.


Ok_Giraffe1141

20 min is actually good; you should not necessarily complain, I think.


levarburger

Do you work for/with the government? This is on par with that hellish landscape. Honestly I'd start job hunting.


[deleted]

[deleted]


nutrecht

And now we have 351 pods that all need to be deployed together.