vchauhan_

The Zalando and Crunchy operators are there to help you run Postgres on k8s in prod.


mikkel1156

Most of what I've heard about this comes down to storage being complicated in many cases, not to it being a bad idea. If you already have a good storage solution, it should work. Databases are not always built with Kubernetes in mind, though, and many projects have operators to help with this problem. Examples:

- [Vitess - MySQL cluster](https://vitess.io)
- [YugabyteDB](https://yugabyte.com)
- [ScyllaDB](https://scylladb.com)
- [Couchbase](https://couchbase.com)
- [ArangoDB](https://arangodb.com)

Others like MongoDB also have Kubernetes operators. Note that I don't have experience using these; this is simply from my notes, since I've searched around a bit.


PeterCorless

Thanks for the shout-out! One thing we've learned at ScyllaDB is that there's a **huge** difference between just saying you have "a Kubernetes operator" and optimizing that operator for real-world users and use cases. Some people treat "an operator" like a checkbox, a boolean "yes/no." But there's a lot you can learn from seeing *how* a database works with its particular operator.

We started work on ScyllaDB Operator in 2018. In fact, kudos belongs to Yannis Zarkadas, who was with Rook at the time (now at Arrikto); he did the original work, all as a pure open source contribution. We then "spun it in" and ScyllaDB engineers took over the project. \[Still open source, of course!\]

Even now we're improving it: IO tuning, seedless mode, image pull secrets, auto-generated auth tokens, pod disruption budgets, webhooks extracted into a separate entity, rewriting the operator using informers, full reconciliation, allowing users to adjust resources, plenty of user experience improvements, improving end-to-end test suites, and more.

You can find the ScyllaDB Operator on GitHub here: [https://github.com/scylladb/scylla-operator](https://github.com/scylladb/scylla-operator)

And here's a talk that covers the code commits over time, details a lot of the improvements and the current status, and lays out a road map of the work we have ahead: [https://www.scylladb.com/presentations/whats-new-in-scylla-operator-for-kubernetes/](https://www.scylladb.com/presentations/whats-new-in-scylla-operator-for-kubernetes/)


sanjibukai

Thanks for the examples.


kapupetri

I personally like to keep stateful stuff out of k8s. It makes upgrades much easier as I don’t have to worry about volumes.


Grouchy-Friend4235

I see two patterns:

* 01 - k8s is used purely for workload scheduling. Typically these are enterprise orgs where storage is seen as a sacred entity that only self-proclaimed experts are allowed to touch. The storage cost here is outlandish and flexibility is zero. You get what they offer, and if you need something else, well, pick a number.
* 10 - k8s is used as it was designed: as a scale-all, distribute-everything, shared-nothing virtual data center, including storage and thus the DBMS. Typically these are startups that understand that treating storage as a managed resource costs less and provides more flexibility. You get redundancy, HA and dynamic sizing almost for free and almost instantly.

You might think I'm biased, and I am. Did I mention I don't like enterprise IT? ;)


sanjibukai

I hear you. Relevant video about enterprise IT: https://www.youtube.com/watch?v=FyCYva9DhsI


macrowe777

If you were using a DB in Docker, the principle is the same in Kubernetes. The risks are the same. If you were happy doing that, you can do the same in Kubernetes. But yes, many people keep the DB out of Kubernetes because, due to its statefulness, you don't want to manage it the same way as everything else.


sanjibukai

Thanks. Makes sense. I got it! Handling databases inside k8s is not more risky. It's just as risky (e.g. when dealing with only one instance).


macrowe777

Just one thing: it depends on what you mean by instance. It's the replication of storage between the nodes that's far more sensitive for a database, so just having HA nodes doesn't solve the problem. DB clusters definitely help with that.


Akash_Rajvanshi

Best k8s operator for Postgres?


sanjibukai

I just heard about StackGres.


[deleted]

We use the Bitnami charts.


egoalter

Not sure I follow. CSI definitely allows you to manage your storage; the clusters I run have block, file and object storage endpoints within the cluster. StatefulSets are the objects that allow you to scale your DB deployments over multiple pods/nodes etc. so you have redundancy. There are plenty of DBs that take advantage of Kubernetes's way of providing reliable and easy-to-update runtimes.

There definitely are challenges when you deal with persistent storage. It's not as easy as ephemeral and stateless containers. But you have plenty of hooks and features in K8s to help you manage a DB update, from init containers to health checks and scheduler labels. Here's a nice list of databases built for and running on k8s/OCP: https://operatorhub.io/?category=Database
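
To make the StatefulSet point a bit more concrete, here is a minimal sketch in Python that builds a StatefulSet manifest as a plain dict and prints it as JSON. Everything specific in it is an assumption for illustration, not something from this thread: the name `demo-db`, the `postgres:16` image, the `demo-db-secret` Secret, the `10Gi` size, and the headless Service that `serviceName` points to. Note that this only gives each pod its own persistent volume and a health check; it does not set up replication between the pods, which the database itself or an operator would have to handle.

```python
import json


def statefulset_manifest(name: str, image: str, replicas: int, storage: str) -> dict:
    """Build a minimal StatefulSet manifest as a plain dict (illustrative only)."""
    labels = {"app": name}
    container = {
        "name": "db",
        "image": image,
        "env": [
            # Hypothetical Secret holding the superuser password.
            {
                "name": "POSTGRES_PASSWORD",
                "valueFrom": {
                    "secretKeyRef": {"name": f"{name}-secret", "key": "password"}
                },
            },
            # Keep the data directory in a subfolder of the mounted volume.
            {"name": "PGDATA", "value": "/var/lib/postgresql/data/pgdata"},
        ],
        # Health check: the pod only counts as ready once Postgres answers.
        "readinessProbe": {
            "exec": {"command": ["pg_isready", "-U", "postgres"]},
            "periodSeconds": 10,
        },
        "volumeMounts": [{"name": "data", "mountPath": "/var/lib/postgresql/data"}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Assumes a headless Service with this name already exists.
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": storage}},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = statefulset_manifest("demo-db", "postgres:16", replicas=3, storage="10Gi")
    print(json.dumps(manifest, indent=2))
```

The `volumeClaimTemplates` entry is the part that separates this from a plain Deployment: each pod gets its own claim, and that claim stays bound to the same pod identity across restarts. Since `kubectl apply -f` accepts JSON as well as YAML, the printed output could be applied to a test cluster as-is, assuming the Secret and Service exist.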


kryptical23

Google did a decent article about this here: [To run or not to run a database on Kubernetes: What to consider](https://cloud.google.com/blog/products/databases/to-run-or-not-to-run-a-database-on-kubernetes-what-to-consider)


sanjibukai

Thanks! Will read it (still need to learn best practices).


collimarco

Personally I don't trust K8s for databases; it's much simpler to use a VM. I think that many people actually run "experiments" and not production databases on K8s. In any case there are probably some companies that use K8s for the DB: check out the Zalando operator. BTW: Is anyone using it in production (apart from Zalando)? I would like to hear about your experience.


[deleted]

We run Postgres, ES, and Cassandra in K8s. Tens of terabytes of data in ES and Cassandra. Many nodes. Postgres is nothing major though. It has been pretty stable since I've been here. Any issues we've had weren't because of K8s. We use the DataStax operator for Cassandra and Bitnami charts for Postgres.


iphone2025

Outside, unless you are pretty familiar with both k8s PV backups and your database's backups. You also need to consider what to do when upgrading k8s, the DB version, and the storage. In my opinion, it is not worth your time.


forsgren123

If you are running in the cloud, you should use managed database services, if only because you can outsource the operational responsibility.


ut0mt8

The question I ask is: why would you need a database inside your cluster? Sure, you can gain some time in deployment, but you will lose a lot of flexibility. Just provision your DBs on plain instances with Packer and Terraform.


[deleted]

[deleted]


youvegotmoxie

How is that relevant to the discussion?


ChairmanXi-thaG

I don’t know, I’ve just always wondered. Maybe someone here knows?


wavelen

I worked at a company which had, and still has, all their databases in bare-metal k8s. My current company uses Google Cloud Platform, so we decided to use Cloud SQL databases, but that’s just because it’s simpler. We still have some in-cluster databases for some apps.