PostgreSQL backup with pghoard & Kubernetes

TLDR: https://github.com/xarg/pghoard-k8s

This is a short tutorial on how to do incremental backups of your PostgreSQL database using pghoard (I assume you’re running everything in Kubernetes). It’s intended to help people get started faster without wasting time hunting down the right dependencies, etc.

pghoard is a PostgreSQL backup daemon that incrementally backs up your files to object storage (S3, Google Cloud Storage, etc.).
In this tutorial, the goal is to back up our PostgreSQL database to S3.

First, let’s create our Docker image (we’re using the alpine:3.4 image because it’s small):

FROM alpine:3.4

ENV REPLICA_USER "replica"
ENV REPLICA_PASSWORD "replica"

RUN apk add --no-cache \
    bash \
    build-base \
    python3 \
    python3-dev \
    ca-certificates \
    postgresql \
    postgresql-dev \
    libffi-dev \
    snappy-dev
# --no-cache-dir keeps pip's download cache out of the image layer
RUN python3 -m ensurepip && \
    rm -r /usr/lib/python*/ensurepip && \
    pip3 install --no-cache-dir --upgrade pip setuptools && \
    pip3 install --no-cache-dir boto pghoard

COPY pghoard.json /pghoard.json.template
COPY pghoard.sh /
RUN chmod +x /pghoard.sh

# exec form: no extra "/bin/sh -c" wrapper between Docker and the script
CMD ["/pghoard.sh"]

The REPLICA_USER and REPLICA_PASSWORD env vars will be overridden later in your Kubernetes config by whatever values you use in production. I use the defaults above to test locally with docker-compose.

The pghoard.json config tells pghoard where to get your data from, where to upload it, and how:

{
    "backup_location": "/data",
    "backup_sites": {
        "default": {
            "active_backup_mode": "pg_receivexlog",
            "basebackup_count": 2,
            "basebackup_interval_hours": 24,
            "nodes": [
                {
                    "host": "YOUR-PG-HOST",
                    "port": 5432,
                    "user": "replica",
                    "password": "replica",
                    "application_name": "pghoard"
                }
            ],
            "object_storage": {
                "aws_access_key_id": "REPLACE",
                "aws_secret_access_key": "REPLACE",
                "bucket_name": "REPLACE",
                "region": "us-east-1",
                "storage_type": "s3"
            },
            "pg_bin_directory": "/usr/bin"
        }
    },
    "http_address": "127.0.0.1",
    "http_port": 16000,
    "log_level": "INFO",
    "syslog": false,
    "syslog_address": "/dev/log",
    "syslog_facility": "local2"
}

Replace the values above with your own, and read the pghoard docs for an explanation of the other options.
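
A forgotten REPLACE is easy to miss, so here is a small sketch in Python (already installed in the image) that walks the parsed config and lists every value still set to a placeholder. The placeholder set is an assumption based on the sample above; adjust it to match your own template.

```python
import json

# Placeholder markers used in the sample pghoard.json (assumption: add any
# other markers your own template uses).
PLACEHOLDERS = {"REPLACE", "YOUR-PG-HOST"}


def find_placeholders(node, path="$"):
    """Return the JSON paths whose string value is still a placeholder."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            found.extend(find_placeholders(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_placeholders(value, f"{path}[{index}]"))
    elif isinstance(node, str) and node in PLACEHOLDERS:
        found.append(path)
    return found


def check_config(text):
    """Parse a pghoard config and raise if any placeholder is left."""
    config = json.loads(text)
    leftovers = find_placeholders(config)
    if leftovers:
        raise ValueError("unfilled placeholders: " + ", ".join(leftovers))
    return config


sample = '{"backup_sites": {"default": {"nodes": [{"host": "YOUR-PG-HOST"}]}}}'
print(find_placeholders(json.loads(sample)))
# ['$.backup_sites.default.nodes[0].host']
```

You could run a check like this in CI before building the image, so a half-filled config never reaches the cluster.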

Note: Make sure you have enough space in /data; use a Google Persistent Volume if your DB is very big.

The launch script does two things:

  1. Replaces the default credentials in the template with the right username and password for replication (make sure the server allows enough replication connections for your replica user; these are capped by max_wal_senders).
  2. Launches the pghoard daemon.

#!/usr/bin/env bash

set -e

if [ -n "$TESTING" ]; then
    echo "Not running backup when testing"
    exit 0
fi

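# Substitute the replication credentials into the template.
# sed treats "/", "&" and "\" specially and '"' breaks the JSON,
# so keep those characters out of the password.
sed -e "s/\"password\": \"replica\"/\"password\": \"${REPLICA_PASSWORD}\"/" \
    -e "s/\"user\": \"replica\"/\"user\": \"${REPLICA_USER}\"/" \
    /pghoard.json.template > /pghoard.json

# exec replaces the shell, so pghoard receives signals directly
exec pghoard --config /pghoard.json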
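
The sed approach works for simple credentials, but it breaks as soon as the password contains characters that are special to sed or to JSON. Here is a sketch of a sturdier alternative, assuming the template layout above (credentials inside each object in backup_sites.*.nodes) and a hypothetical render_config.py that pghoard.sh would run instead of the sed line: parse the template as JSON, set the credentials, and let json.dumps handle the escaping.

```python
import json
import os
import sys


def render(template_text, user, password):
    """Return the pghoard config with replication credentials filled in."""
    config = json.loads(template_text)
    for site in config.get("backup_sites", {}).values():
        for node in site.get("nodes", []):
            # Only object-style nodes (as in the template above) carry
            # user/password keys; connection strings are left untouched.
            if isinstance(node, dict):
                node["user"] = user
                node["password"] = password
    # json.dumps escapes quotes and backslashes, so any password is safe here.
    return json.dumps(config, indent=4)


def main(template_path, output_path):
    with open(template_path) as src:
        rendered = render(src.read(),
                          os.environ["REPLICA_USER"],
                          os.environ["REPLICA_PASSWORD"])
    # The output holds a password, so create it readable by the owner only.
    fd = os.open(output_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as dst:
        dst.write(rendered + "\n")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: render_config.py TEMPLATE OUTPUT", file=sys.stderr)
```

In pghoard.sh the sed line would then become python3 /render_config.py /pghoard.json.template /pghoard.json, with the script copied into the image next to pghoard.sh.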

Once you have built your image and pushed it to gcr.io, you’ll need a replication controller to run the pghoard daemon pod:

apiVersion: v1
kind: ReplicationController
metadata:
  name: pghoard
spec:
  replicas: 1
  selector:
    app: pghoard
  template:
    metadata:
      labels:
        app: pghoard
    spec:
        containers:
        - name: pghoard
          env:
            - name: REPLICA_USER
              value: "replicant"
            - name: REPLICA_PASSWORD
              value: "The tortoise lays on its back, its belly baking in the hot sun, beating its legs trying to turn itself over. But it can't. Not with out your help. But you're not helping."
          image: gcr.io/your-project/pghoard:latest

I use a replication controller because I want the pod to come back if it fails. A bare pod is only restarted in place by the kubelet; if it is deleted or its node dies, nothing recreates it and you stop getting backups.

Future to do:

  • Monitoring (are your backups actually done? If not, do you receive a notification?)
  • Stats collection.
  • Encrypting backups locally before uploading them to the cloud (pghoard supports this).

Hope it helps, stay safe and sleep well at night.

Again, repo with the above: https://github.com/xarg/pghoard-k8s

My very subjective future of humanity and strong* AI

The fascination with AGI has been mainstream for a long time, but it has gained even more momentum in recent years. Even Hollywood has become less naive, with movies like Her and Ex Machina.

On the R&D side there is, of course, Deep Learning, a machine learning technique that uses neural networks with more than one hidden layer :P I believe it has forever changed the way people do research. The hype is real because of the state-of-the-art results achieved with it and the way the skills transfer across different fields of ML: AlphaGo beat one of the best players in the world, translation and image/voice recognition are getting better, artistic style stealing, attention models, etc. The best part is that it’s more or less the same recipe, neural networks with different architectures trained with backprop and gradient descent, that works on a broad range of problems. Now people are looking for nails because they have a damn mighty hammer.

Of course, you’ll say that hooking up a bunch of NVIDIA Pascals is not gonna give us AGI and that Moore’s law is not what it used to be. I could not agree more, but if we overcome the hardware issues (and I have high hopes that AR and VR are gonna push this), then it’s reasonable to assume that we’ll have the hardware to achieve at least weak AI soonish…

What about software? That may be a bigger problem. But… I’m also optimistic here, with frameworks like Torch and, more recently, TensorFlow getting a ton of attention from some of the best minds in the AI world today. What’s really cool about these frameworks is that they are used every day in production, on real products, by startups and big corps alike. They are here to stay. It’s not enough, but I’m hopeful that things will improve.

OK, I want to say something that has been bugging me for a long time. Bear with me; I believe it’s important for the arguments that follow. It’s about the usual definition of AGI:

… is the intelligence of a (hypothetical) machine that could successfully perform any intellectual task that a human being can…

Now, I have a problem with this definition, because I would argue that in a cosmic sense we humans haven’t achieved what I would call general intelligence. We’re kind of good at surviving in the Earth’s atmosphere. We can do many amazing things that are beyond most animals, but we’re still bound to our environment. We’re still, I would argue, narrow in our intelligence and can only grasp a small fraction of what’s out there.
A true AGI does exist, at least on paper: AIXI. It seeks to maximize its future reward in any computable environment (survive and expand), but there is one tiny little problem: it requires infinite memory and computing power to function. It’s useful in the same way the Turing machine is useful in the real world: as a theoretical model, not as something you can build.
For any intelligent agent to be practical, it requires a favourable environment and a narrow specialisation for that environment. This is why I think what we’re really after is strongish AI, which translates to being pretty cool in your neighbourhood.


Running Flask & Celery with Kubernetes

At Gorgias we recently moved our Flask & Celery apps from Google Cloud VMs provisioned with Fabric to Docker containers running on Kubernetes (k8s). This post is about our experience doing this.

Note: I’m assuming that you’re somewhat familiar with Docker.

Docker structure

The killer feature of Docker for us is that it lets us build layered binary images of our app. This means you can start with a minimal base image, build a Python image on top of that, then an app image on top of the Python one, etc.

Here’s the hierarchy of our docker images:

  • gorgias/pgbouncer
  • gorgias/rabbitmq
  • gorgias/nginx - extends gorgias/base and installs NGINX
  • gorgias/python3 - Installs pip, python3.5 - yes, using it in production.
    • gorgias/app - This installs all the system dependencies (libpq, libxml, etc.) and then runs pip install -r requirements.txt
      • gorgias/web - this sets up uWSGI and runs our flask app
      • gorgias/worker - Celery worker

Piece of advice: if you used to run your app under supervisord, resist the temptation to do the same with Docker. Just let your container crash and let Kubernetes/Swarm/Mesos handle it.

Now we can run the above images using docker-compose, Docker Swarm, k8s, Mesos, etc.


The 100% software company

If you know Stripe, Mailgun, or Zapier, you might know what I’m talking about. They are all just a bunch of APIs, created to make running companies easier through automation. So we know that payments, billing and mail delivery can be automated. But where is the limit?

What if there was a 100% software company that did client prospecting on its own, responded to clients on its own, resolved legal problems on its own and (blasphemy!) created a product on its own?

You get the picture… everything on its own.

The people I talked to about this said I was crazy (and that I want to destroy humanity).
Here’s what they said:

There is no way to get the accounting right (in France!!?! Crazy!!! Jail time!).

How would you even begin designing a product for users, interviewing them, etc.? You would need hard AI! You totally 100% require a human for this.

They are right, of course, but given that there are so many amazing tools that let us automate so many parts of our business, what remains unorganised and unstructured?

What if you don’t need human level intelligence if you just have better structured information? At least to make a stupid simple product.

I’m now going to borrow something from my art friends and say that I’m proposing an Art project. Look… this is just an experiment, a joke, a way to show that building a business has nothing to do with having a human brain.

Of course, I’m not actually going to implement this Art project. What I’m really after is finding the remaining parts of a business that are difficult to automate and trying to make them automatic. Isn’t this what we are all looking for? Look at all those SaaS companies trying to remove the pains, scale and automate stuff that wasn’t automated before. And they are so cheap too! Where is all this going to lead?


My prediction is that soon all we’re going to have is a bunch of cron jobs and message brokers loosely connecting the different APIs together, controlled by some reinforcement learning algorithm that tries to increase that Stripe balance. Think Zapier, but without you creating all the rules.

While this swarm-like AI is probably not technically feasible at the moment, I personally use it as a framework for thinking about products.

What hole is this product filling in my 100% software company?

Btw, if you’re looking to improve your customer support through automation, come check us out at Gorgias.

Near-future is for human-computer hybrids


Most tech startups today try to be scrappy, to have many users and/or customers while keeping a small team. For some, this is the only way to survive and eventually become successful. This is possible because today’s technology is cheap, powerful and enables us to automate a lot of daily tasks that previously required many people.

The ideas behind this post are based on the premise that at least for the foreseeable future this trend is not going to change.

Automation and its limits

Software companies (SaaS or otherwise) are usually the first to embrace automation. Payments, customer communication (newsletters, drip e-mails, etc.), deployment, automated testing, statistical analysis, A/B testing and other techniques enable such companies to stay small yet create, and sometimes capture, a lot of value.
Taking a product from the “production line” and putting it into your customers’ hands is a big part of the software company advantage, but there still exist a few areas that are not fully automated.

The human monopoly on creativity

The ‘creative’ jobs are mostly done by humans, and even though there are a few bots that rehash news articles, there is still a long way to go before bots can write software, write a good blog post or create a good website layout. Interesting as that subject is, I will focus on another area that is not automated yet: customer support.

Customer support: how do startups do it?


Some companies, such as Google, provide only partial or no user support, but most startups today try to have a close relationship with their customers: they send non-automated e-mails to potential clients and the founders answer each customer individually. Doing unscalable things is not only normal but strongly encouraged, at least in the beginning.
This rightly gives customers the impression of being taken care of and appreciated, which historically doesn’t happen at big corps. But since the main focus of a startup should be growth, the problem is that this type of customer support doesn’t scale: the company has to hire more people as its customer/user base grows.

The scaling problem


How can startups keep the quality of customer support they offered at the beginning and still keep scaling up? The answer would be: automate more things. However, automatically answering e-mails is a difficult problem, and I believe it belongs to the class of hard-AI problems.

A simple example

Let’s imagine a scenario where a customer called Anna sends an e-mail to support@gorgias.io:

Hi,
After the last update the keyboard completion functionality stopped
working on Gmail.
Can you help me out?
Thanks!

Let’s see some of the steps that are needed to solve her problem:

  1. I’m doing customer support so I read the e-mail and then try to reproduce the problem.
  2. Let’s say I reproduced it.
  3. I physically go to the developer and show her the issue (or describe it in an issue tracker).
  4. Once I do that, I reply to Anna saying that I managed to reproduce the problem, apologise for the inconvenience and then wait for a fix.
  5. Fortunately, it’s quickly fixed and the developer publishes an update and notifies me that it’s fixed.
  6. I return to Anna and notify her that it should be fixed.
  7. She replies that indeed it seems to work well now.


There are a lot of steps (and some might be missing here), not to mention the hard work involved in finding the code and fixing the bug. A lot of information is not recorded between these steps and is thus lost. Later on, it would be really hard to figure out what the lifecycle of an issue was just by looking at the messages exchanged between me and the customer and, internally, between me and the developer. If only we could build a really good AI, like in the movies, that could automate at least some of those steps.
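
To make the point about lost information concrete, here is a minimal sketch of recording each step above as a timestamped event, so the lifecycle of an issue can be read back later instead of being reconstructed from e-mails. The Issue and Event classes and the action names are hypothetical, not a Gorgias API, and the timestamps are made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Event:
    actor: str      # hypothetical roles: "customer", "support", "developer"
    action: str     # e.g. "reported", "reproduced", "fixed", "confirmed"
    at: datetime    # when the step happened (timezone-aware)


@dataclass
class Issue:
    title: str
    events: List[Event] = field(default_factory=list)

    def record(self, actor: str, action: str, at: Optional[datetime] = None) -> None:
        """Append one step; earlier steps are never overwritten."""
        self.events.append(Event(actor, action, at or datetime.now(timezone.utc)))

    def lifecycle(self) -> List[str]:
        """Actions in chronological order (ties keep insertion order)."""
        return [e.action for e in sorted(self.events, key=lambda e: e.at)]

    def duration(self) -> timedelta:
        """Time between the first and the last recorded step."""
        if not self.events:
            return timedelta(0)
        times = [e.at for e in self.events]
        return max(times) - min(times)


# Anna's report, replayed with made-up timestamps.
start = datetime(2016, 1, 4, 9, 0, tzinfo=timezone.utc)
issue = Issue("Keyboard completion stopped working on Gmail")
issue.record("customer", "reported", start)
issue.record("support", "reproduced", start + timedelta(minutes=30))
issue.record("support", "escalated", start + timedelta(minutes=40))
issue.record("support", "acknowledged", start + timedelta(minutes=45))
issue.record("developer", "fixed", start + timedelta(hours=3))
issue.record("support", "notified", start + timedelta(hours=3, minutes=10))
issue.record("customer", "confirmed", start + timedelta(hours=5))
print(issue.lifecycle())
print(issue.duration())  # 5:00:00
```

An append-only log is the simplest choice here: nothing is overwritten, so questions like "how long did the fix take?" can still be answered long after the e-mails are buried.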

The oracle


We can imagine an agent that is aware of the internal workings of a company, an oracle that knows each customer’s situation at any given time, that could even answer some easy questions from customers and ask its coworkers for clarification. Alas, this is something of a sci-fi domain for now.

In the near-future however I think it’s much more likely that we’re going to have semi-intelligent agents (insect-level intelligence) that would help us with the task of doing customer support. They could be displaying the relevant information at the right time and even writing some part of the answer for us. This would make the experience of doing customer support more like editing than writing.
It’s hard to say what the future will be like or which problems we might encounter, but I hope we’re not going to solve them by throwing more manpower at them.

A primitive step in that direction is a Google Chrome extension we built to write messages faster on the web. You can use it with Gmail, Outlook.com, Yahoo Mail and many other websites.

You can check it out (it’s free)!