Introducing kuku: a Kubernetes templating tool

At Gorgias we’re using k8s (Kubernetes) on GKE to run all our production services. We run our REST API apps, RabbitMQ, Celery background workers, PostgreSQL and other smaller services on k8s. We also have staging and development k8s clusters where we experiment with different infrastructure setups and deployments.

If you have multiple k8s clusters to manage, chances are you also need a templating tool to customize your k8s manifests. By far the most popular one these days is Helm. There are also ksonnet and, more recently, Pulumi. All of these tools are powerful and solve real problems, but none of them is quite right for us.

I can’t say much about ksonnet and Pulumi because I’ve only briefly looked at their APIs and how-to guides, so take that with a grain of salt. However, I can speak as a user about Helm, which is what we’ve been using at Gorgias for all our services.

Why not helm?

Well, there are a few things I find problematic with helm:

  • Poor templating language: it requires constant referral to the docs, whitespace handling is fiddly, and producing well-formatted YAML is hard.
  • Server-side dependency: if you upgrade the server component (Tiller), every user needs to update their client, a waste of valuable time.
  • Lack of local validation: helm lint does not actually ensure the validity of the manifest (e.g. required keys for a k8s object).
  • Chart names, releases, and other Helm-specific features do not fit our current workflow.

At Gorgias, we had been using Helm to manage all our k8s resources, and it was great until we had to deal with more complex charts containing lots of control flow and ranges. If you have ever dealt with ranges and template in Helm, you know they are not easy to manage across different contexts; the difference between template "name" . and template "name" $ comes to mind.

So why not Ksonnet then?

Ksonnet improves the situation a bit with a stricter approach based on the jsonnet language. By strict, I mean that it doesn’t blindly render a text file into YAML the way Helm does, but uses a real programming language to produce the final YAML.

My main issue with it is the language itself: jsonnet. It is yet another templating language to learn, with its own set of gotchas. A separate issue is that it introduces a whole set of new concepts such as Part, Prototype, Parameter, etc. I found that a bit too much when all I want is to render a bunch of YAML files with some variables.

Pulumi?

Pulumi comes closest to what I would consider the ideal tool for us. It takes a programmatic approach: it connects directly to your cluster and creates the resources declared in code (TypeScript, Python, etc.). You write TypeScript code and provision your infrastructure, progress bar included. There is a lot to like about this approach. There are, however, a few things I don’t like about Pulumi either: the primary language seems to be TypeScript at the moment, which I don’t want to use for infrastructure code. Python templates were in development when I wrote this post, but I didn’t try them.

Pulumi also does infrastructure provisioning (multi-provider), à la Terraform. I think this is overkill for what we need at Gorgias. We don’t have to use those features, of course, but it feels like it is trying to solve two different and complex problems at the same time. To put it plainly: it’s too much of a Swiss Army knife for us.

kuku: a simple templating tool

Finally, after searching for the right tool, I decided to write my own. kuku is very similar to Helm, but uses Python files as templates instead of YAML, and it has no server-side component.

Here are some of its goals:

Write Python code to generate k8s manifests.

Python is a popular language with a vast ecosystem of dev-ops packages. Most importantly, it’s easier to debug than some of the templating languages used today to generate k8s manifests.

No k8s server-side dependencies (i.e. Tiller).

k8s already keeps track of its current state (in etcd), so the client can talk to the cluster directly to perform its operations instead of relying on an extra server-side dependency.

Local validation of manifests.

Where possible, do the validation locally using the official k8s Python client.

Use standard tools.

Where possible, use kubectl to apply changes to the k8s cluster instead of implementing a custom protocol. Again, this allows for easier maintenance and debugging for the end user.

More on Helm comparison

Compared to Helm, there is no concept of charts, releases or dependencies. We found that we rarely used any of those concepts, and they just added extra complexity to our charts without much benefit.

Instead, there are just two concepts, both similar to Helm’s: values and templates.

Values come from the CLI or from value files (same as Helm). Templates are just Python files that define a template function.

Using kuku

Suppose you want to create a k8s Service using a template in which you define the service name, internalPort and externalPort.

To install: pip3 install kuku

Given the following service.py template:

from kubernetes import client

def template(context):
    # `context` holds the values passed on the CLI (-s) or from value files (-f)
    return client.V1Service(
        api_version="v1",
        kind="Service",
        metadata=client.V1ObjectMeta(name=context["name"]),
        spec=client.V1ServiceSpec(
            type="NodePort",
            ports=[
                # a typed V1ServicePort is validated locally, unlike a plain dict
                client.V1ServicePort(
                    port=context["externalPort"],
                    target_port=context["internalPort"],
                )
            ],
            selector={"app": context["name"]},
        ),
    )

You can now generate YAML output from the above template by running:

$ ls .
service.py 
$ kuku render -s name=kuku-web,internalPort=80,externalPort=80 .

The above produces:

# Source: service.py
apiVersion: v1
kind: Service
metadata:
  name: kuku-web
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: kuku-web
  type: NodePort

You can also combine the above with kubectl apply -f - to create your service on k8s:

kuku render -s name=kuku-web,internalPort=80,externalPort=80 . | kubectl apply -f -

Same as above, but let’s make it shorter:

kuku apply -s name=kuku-web,internalPort=80,externalPort=80 .

Finally, to delete it:

kuku delete -s name=kuku-web,internalPort=80,externalPort=80 .
# same as above
kuku render -s name=kuku-web,internalPort=80,externalPort=80 . | kubectl delete -f -

kuku templates

Let’s return to templates for a bit, because a few things are happening there. Templates are Python files that define a function called template, which accepts a dict argument (the context) and returns a k8s object or a list of k8s objects. The simplest example:

from kubernetes import client

def template(context):
    # the object name goes in the metadata, not on the namespace itself
    return client.V1Namespace(
        metadata=client.V1ObjectMeta(name=context["namespace"])
    )  # example k8s object
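
A template can also return a list of objects, in which case each one is rendered as a separate manifest. Here is a minimal sketch (the ResourceQuota part is made up for illustration):

from kubernetes import client

def template(context):
    namespace = client.V1Namespace(
        api_version="v1",
        kind="Namespace",
        metadata=client.V1ObjectMeta(name=context["namespace"]),
    )
    # a made-up quota, just to show several objects coming out of one template
    quota = client.V1ResourceQuota(
        api_version="v1",
        kind="ResourceQuota",
        metadata=client.V1ObjectMeta(name="quota", namespace=context["namespace"]),
        spec=client.V1ResourceQuotaSpec(hard={"pods": "10"}),
    )
    return [namespace, quota]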

You can create multiple template files, each defining its own template function. kuku uses the k8s objects (aka models) from the official kubernetes Python client package. You can find them all here.

When writing kuku templates, I highly recommend using an editor that is aware of the kubernetes package above, so you get nice auto-completion of properties; it makes life so much easier.

kuku command line interface

Similar to Helm, kuku accepts context variables defined on the CLI:

kuku render -s namespace=kuku .

-s namespace=kuku is passed to the context argument of your template function. Run kuku -h to find out more.
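
Value files work the same way; assuming a values.yaml containing the same key, the following should be equivalent:

$ cat values.yaml
namespace: kuku
$ kuku render -f values.yaml .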

A more realistic example

Defining services and namespaces is nice, but let’s see how kuku behaves with a more complex Postgres StatefulSet. Consider the following directory:

.
├── templates
│   ├── configmap.py
│   ├── service.py
│   └── statefulset.py
├── values-production.yaml
└── values-staging.yaml

We have value files for two environments and three templates: a configmap, a service (like before) and a statefulset. This Postgres StatefulSet template is similar to what we currently run in production at Gorgias.

Let’s have a look at values-production.yaml:

name: pg # global name of our statefulset/service/configmap/etc..

image: postgres:latest

# optional
nodeSelector:
  cloud.google.com/gke-nodepool: pg

replicas: 1

resources:
  requests:
    memory: 10Gi
  limits:
    memory: 12Gi

pvc:
- name: data
  class: ssd
  size: 500Gi
  mountPath: /var/lib/postgresql/data/

configmap:
- name: postgresql.conf
  value: |
    max_connections = 500

Above, we declare that we want to run one instance of the postgres:latest Docker image on a specific k8s node pool, requesting some memory and a persistent volume. We also use a ConfigMap to define our postgresql.conf, which makes it easier to keep track of its changes.
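
The configmap.py template is not reproduced in this post, but given the values above it could look something like this sketch:

from kubernetes import client

def template(context):
    # one ConfigMap named after the release, with one key per entry
    # of the `configmap` list from the values file
    return client.V1ConfigMap(
        api_version="v1",
        kind="ConfigMap",
        metadata=client.V1ObjectMeta(name=context["name"]),
        data={item["name"]: item["value"] for item in context.get("configmap", [])},
    )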

Keep the above values in mind and let’s have a look at our statefulset.py template:

from kubernetes import client


def template(context):
    # volumes attached to our pod
    pod_spec_volumes = []

    # where those volumes are mounted in our container
    pod_spec_volume_mounts = []

    # persistent volume claims templates
    stateful_set_spec_volume_claim_templates = []

    # only create claims if the values define PVCs
    # (default to an empty list so the template also works without them)
    for pvc in context.get("pvc", []):
        stateful_set_spec_volume_claim_templates.append(
            client.V1PersistentVolumeClaim(
                metadata=client.V1ObjectMeta(
                    name=pvc["name"],
                    annotations={
                        "volume.beta.kubernetes.io/storage-class": pvc["class"]
                    },
                ),
                spec=client.V1PersistentVolumeClaimSpec(
                    access_modes=["ReadWriteOnce"],
                    resources=client.V1ResourceRequirements(
                        requests={"storage": pvc["size"]}
                    ),
                ),
            )
        )
        pod_spec_volume_mounts.append(
            client.V1VolumeMount(name=pvc["name"], mount_path=pvc["mountPath"])
        )

    # same for configmap
    if "configmap" in context:
        volume_name = "{}-config".format(context["name"])
        pod_spec_volumes.append(
            client.V1Volume(
                name=volume_name,
                # config_map must be a volume source object, not a plain string
                config_map=client.V1ConfigMapVolumeSource(name=context["name"]),
            )
        )
        pod_spec_volume_mounts.append(
            client.V1VolumeMount(name=volume_name, mount_path="/etc/postgresql/")
        )

    # command used by the probes below to check if postgres is live;
    # exec commands are argv lists, not shell strings
    pg_isready_exec = client.V1ExecAction(command=["gosu", "postgres", "pg_isready"])

    return client.V1StatefulSet(
        api_version="apps/v1beta1",
        kind="StatefulSet",
        metadata=client.V1ObjectMeta(name=context["name"]),
        spec=client.V1StatefulSetSpec(
            service_name=context["name"],
            replicas=context["replicas"],
            selector={"app": context["name"]},
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": context["name"]}),
                spec=client.V1PodSpec(
                    containers=[
                        client.V1Container(
                            name="postgres",
                            image=context["image"],
                            lifecycle=client.V1Lifecycle(
                                pre_stop=client.V1Handler(
                                    _exec=client.V1ExecAction(
                                        # run through a shell so $PGDATA is expanded
                                        command=[
                                            "/bin/sh",
                                            "-c",
                                            'gosu postgres pg_ctl -D "$PGDATA" -m fast -w stop',
                                        ]
                                    )
                                )
                            ),
                            liveness_probe=client.V1Probe(
                                _exec=pg_isready_exec,
                                initial_delay_seconds=120,
                                timeout_seconds=5,
                                failure_threshold=6,
                            ),
                            readiness_probe=client.V1Probe(
                                _exec=pg_isready_exec,
                                initial_delay_seconds=10,
                                timeout_seconds=5,
                                period_seconds=30,
                                failure_threshold=999,
                            ),
                            ports=[client.V1ContainerPort(container_port=5432)],
                            volume_mounts=pod_spec_volume_mounts,
                            resources=client.V1ResourceRequirements(
                                **context["resources"]
                            )
                            if "resources" in context
                            else None,
                        )
                    ],
                    volumes=pod_spec_volumes,
                    node_selector=context.get("nodeSelector"),
                ),
            ),
            volume_claim_templates=stateful_set_spec_volume_claim_templates,
        ),
    )

If you squint a bit, the final return value looks a lot like a YAML file, except it’s built from Python objects, with all the ifs, for-loops, standard library, etc. that come with the language.

What I find better than a regular Helm YAML template is that you can validate some of the input arguments of those Python objects (e.g. client.V1Container) before the template is ever sent to your k8s server, not to mention the autocompletion.
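
For example, required fields raise a ValueError as soon as the object is constructed; a minimal illustration:

from kubernetes import client

try:
    # "name" is a required field of V1Container, so this fails locally,
    # long before anything reaches the cluster
    client.V1Container(image="postgres:latest")
except ValueError as error:
    print(error)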

Finally, this is how it all comes together:

kuku render -f values-production.yaml templates/ | kubectl apply -f -

The above renders all your templates and generates the YAML manifests, which are then applied using kubectl apply.
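
Deploying staging is then just a matter of swapping the value file (assuming values-staging.yaml defines the same keys):

kuku render -f values-staging.yaml templates/ | kubectl apply -f -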

You can find the source here: https://github.com/xarg/kuku/ And a couple of examples: https://github.com/xarg/kuku/tree/master/examples

Conclusion

We started using kuku at Gorgias in September 2018 and have since migrated all our Helm charts to kuku templates. It has made adapting our k8s deployment code to our needs much easier than before, with minimal deployment surprises.

I hope you find kuku as useful as we do. Happy hacking!

My very subjective future of humanity and strong* AI

The fascination with AGI has been mainstream for a long time, but it has gained even more momentum in recent years. Even Hollywood has become less naive, with movies like Her and Ex Machina.

On the R&D side there is of course Deep Learning, a machine learning technique that uses neural networks with 1 hidden layer :P It has, I believe, changed forever the way people do research today. The hype is real because of the state-of-the-art results achieved with it and because the skills translate across different fields of ML: AlphaGo beat the best player in the world, translation and image/voice recognition keep getting better, artistic style stealing, attention models, etc. The best part is that it’s more or less the same RNN with different neuron architectures, backprop and gradient descent that works across a broad range of problems. Now people are looking for nails because they have a damn mighty hammer.

Of course, hooking up a bunch of NVIDIA Pascals is not going to give us AGI, and Moore’s law is not what it used to be. I could not agree more, but if we overcome the hardware issues (and I have high hopes that AR and VR are going to push this), then it’s reasonable to assume that we’ll have the hardware to achieve at least weak AI soonish…

What about software? That may be a bigger problem. But I’m also optimistic here: things like Torch and, more recently, TensorFlow are getting a ton of attention from some of the best minds in the AI world today. What’s really cool about these frameworks is that they are used every day in production, on real products, by startups and big corps alike. They are here to stay. It’s not enough, but I’m hopeful that things will improve.

Ok, so I want to say something that has been bugging me for a long time. Bear with me; I believe it’s important for the arguments that follow.

… is the intelligence of a (hypothetical) machine that could successfully perform any intellectual task that a human being can…

Now, I have a problem with this definition, because I would argue that in a cosmic sense we humans haven’t achieved what I would call general intelligence. We’re kind of good at surviving in the Earth’s atmosphere. We can do many amazing things that are not accessible to most animals, but we’re still bound to our environment. Our intelligence is, I would argue, still narrow, and we can only grasp a small fraction of what’s out there. True AGI does exist, in theory: AIXI. It will seek to maximize its future reward in any computable environment (survive and expand), but there is the tiny little problem of it requiring infinite memory and computing power to function. It’s useful in the same way the Turing machine is useful in the real world. For any intelligent agent to be practical, it needs a favourable environment and a narrow specialisation for that environment. This is why I think what we’re really after is strongish AI, which translates to being pretty cool in your neighbourhood.

Read more...

The 100% software company

If you know Stripe, Mailgun, or Zapier you might know what I’m talking about. They are all just a bunch of APIs, created to make running companies easier through automation. So we know that payments, billing and mail delivery can be automated. But where is the limit?

What if there was a 100% software company that did client prospecting on its own, responded to clients on its own, resolved legal problems on its own and (blasphemy!) created a product on its own?

You get the picture: everything on its own.

The people I talked to about this said I was crazy (and that I want to destroy humanity). Here’s what they say:

There is no way to get the accounting right (in France!!?! Crazy!!! Jail time!).

How would you even begin designing a product for users, have interviews with them, etc.? You would need a Hard AI! You totally, 100%, require a human for this.

They are right, of course. But given that there are so many amazing tools that allow us to automate so many parts of our business, what remains unorganised and unstructured?

What if you don’t need human-level intelligence, just better-structured information? At least to make a stupid simple product.

I’m now going to borrow something from my art friends and say that I’m proposing an Art project. Look, this is just an experiment, a joke, a way to show that building a business has nothing to do with having a human brain.

Of course, I’m not actually going to implement this Art project. What I’m really after is finding the remaining parts of a business that are difficult to automate and trying to make them automatic. Isn’t this what we are all looking for? Look at all those SaaS companies trying to remove the pains, scale, and automate stuff that wasn’t automated before. And they are so cheap, too! Where is all this going to lead?

[image: borat meme]

My prediction is that soon all we’re going to have is a bunch of cronjobs and message brokers loosely connecting different APIs, controlled by some reinforcement learning algorithm that seeks to increase that Stripe balance. Think Zapier, but without you creating all the rules.

While this swarm-like AI is probably not technically feasible at the moment, I personally use it as a framework for thinking about products:

What hole is this product filling in my 100% software company?

By the way, if you’re looking to improve your customer support through automation, come check us out at Gorgias.

Near-future is for human-computer hybrids

[image: customer support oracle]

Most tech startups today try to be scrappy: to serve many users and/or customers while keeping a small team. For some, this is the only way to survive and eventually become successful. It’s possible because today’s technology is cheap, powerful and able to automate a lot of the daily tasks that previously required many people.

The ideas behind this post are based on the premise that, at least for the foreseeable future, this trend is not going to change.

Automation and its limits

Software (SaaS or otherwise) companies are usually the first to embrace automation. Payments, customer communication (newsletters, drip e-mails, etc.), deployment, automated testing, statistical analysis, A/B testing and other techniques enable such companies to stay small yet create, and sometimes capture, a lot of value. Taking a product from the “production line” and putting it into your customers’ hands is a big part of the software company advantage, but a few areas are still not fully automated.

The human monopoly on creativity

The ‘creative’ jobs are still mostly human: even though a few bots rehash news articles, there is still a long way to go before bots can write software, write a good blog post or create a good website layout. It’s an interesting subject in itself, but I will focus here on another area that resists automation: customer support.

Customer support: how do startups do it?

[image: old dude typing]

Some companies, such as Google, provide only partial customer support or none at all, but most startups today try to have a close relationship with their customers: they send non-automated e-mails to potential clients, and the founders answer each customer individually. Doing unscalable things is not only normal but strongly encouraged, at least in the beginning.
This rightly gives customers the impression of being taken care of and appreciated, something that historically doesn’t happen at big corporations. But since the main focus of a startup should be growth, this type of customer support becomes a problem: it doesn’t scale, and the company has to keep hiring as its customer/user base grows.

The scaling problem

[image: super-charged customer support]

How can startups keep the customer support quality they offered in the beginning while continuing to scale up? The answer would be: automate more things. However, automatically answering e-mails is a difficult problem, one that I believe can be ascribed to the hard-AI class of problems.

A simple example

Let’s imagine a theoretical scenario: a customer called Anna sends an e-mail to support@gorgias.io:

Hi, After the last update the keyboard completion functionality stopped working on Gmail. Can you help me out? Thanks!

Let’s see some of the steps that are needed to solve her problem:

  1. I’m doing customer support, so I read the e-mail and then try to reproduce the problem.
  2. Let’s say I reproduce it.
  3. I physically go to the developer and show her the issue (or describe it in an issue tracker).
  4. Once I’ve done that, I reply to Anna saying that I managed to reproduce the problem, apologise for the inconvenience, and wait for a fix.
  5. Fortunately it’s quickly fixed: the developer publishes an update and notifies me that it’s done.
  6. I return to Anna and let her know that it should be fixed.
  7. She replies that indeed it seems to work well now.

[image: keyboard overload]

That is a lot of steps (and some are probably missing here), not counting the hard work involved in finding the code and fixing the bug. A lot of information between these steps is never recorded and is thus lost. Later on, it would be really hard to reconstruct the lifecycle of an issue just by looking at the messages exchanged between me and the customer and, internally, between me and the developer. If only we could build a really good AI, like in the movies, that could automate at least part of those steps.

The oracle

[image: customer support oracle]

We can imagine an agent that is aware of the internal workings of a company: an oracle that knows each customer’s situation at any given time, that could even answer some easy customer questions itself and ask its coworkers for clarification. Alas, this is the domain of sci-fi for now.

In the near future, however, I think it’s much more likely that we’ll have semi-intelligent agents (insect-level intelligence) helping us with customer support. They could display the relevant information at the right time and even write part of the answer for us, making the experience of doing customer support more like editing than writing.
It’s hard to say what the future will be like and what problems we might encounter, but I hope we’re not going to solve them by throwing ever more man-power at them.

A primitive step in the direction this post describes is an extension for Google Chrome that we built to write messages faster on the web. You can use it with Gmail, Outlook.com, Yahoo Mail and many other websites.

You can check it out (it’s free)!

How much energy is needed to write an e-mail?

TL;DR:

In 2013 we sent an estimated 182.9 billion e-mails, consuming 26 PJ (petajoules): the equivalent of the yearly electricity consumed in Mongolia, or about 0.3 kg of matter converted entirely into energy.
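
A quick back-of-the-envelope check, taking the TL;DR figures at face value (the per-e-mail and mass-equivalence numbers below are derived here, not taken from the original post):

\[
E_{\text{per e-mail}} = \frac{2.6\times10^{16}\,\mathrm{J}}{1.829\times10^{11}} \approx 1.4\times10^{5}\,\mathrm{J} \approx 140\,\mathrm{kJ},
\qquad
m = \frac{E}{c^{2}} = \frac{2.6\times10^{16}\,\mathrm{J}}{(3\times10^{8}\,\mathrm{m/s})^{2}} \approx 0.29\,\mathrm{kg}
\]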

Every time an e-mail is written, as with any sort of human-computer interaction, some amount of energy is expended. Given the estimated 182.9 billion e-mails written worldwide in 2013, an interesting question pops up:

How much energy is needed to write an e-mail?

By energy I mean how many joules, and by needed I mean needed by the human who writes the e-mail and by the computer used to write it. For our calculations we’ll take an e-mail length of 100 words. I don’t know the average e-mail size, but perhaps I should ask the NSA, FSB or MSS some time to get a more accurate idea.

Sending an e-mail involves e-mail servers, routers, switches and other network devices, all of which require energy to function; but given that the topology of the internet is rather hard to take into account, we won’t bother with these details. I suspect, however, that the infrastructure required to send a single e-mail consumes far more energy than opening up your computer and sending the e-mail.

At this moment I’m not sure it’s even possible to calculate this accurately, but let’s try anyway, and if you find any mistakes, please feel free to point them out.

Read more...