Kickstart Clickhouse in Google Cloud Platform — GCP

devops terminal
5 min read · Dec 5, 2021

Clickhouse is a distributed DBMS for online analytical processing (OLAP). It is column-oriented by nature and hence works well for Big Data analytics. Clickhouse runs on Linux, FreeBSD and macOS (x86-64, AArch64 and PowerPC64LE CPU architectures); there is no native Windows build, so on a Windows machine you would need a Linux environment, for example by dual booting Ubuntu or Debian. In this article, we will go through how to set up Clickhouse on a Debian Linux VM in Google Cloud Platform.

VM choice

If you are just creating a demo or POC Clickhouse instance, the VM defaults are good enough. Do note that the boot disk defaults to 10 GB only; hence if you are planning for production, you would typically either:

  • set a larger boot disk at the very beginning, OR
  • add additional disks of 20 GB (or more) as the data partition (which means Clickhouse is installed on the boot disk whilst the data is stored on these additional disks)

One more thing is cost: there is actually a VM availability option that saves money. Under “NETWORKING, DISKS, SECURITY, MANAGEMENT, SOLE-TENANCY” >> Management >> Availability Policy >> Preemptibility, turn it “On”.

The downside of preemptibility is that the VM instance runs for at most 24 hours (and Compute Engine may stop it earlier if it needs the capacity back); it then stays shut down until you manually start it again. It is a good choice for a demo or POC but, of course, not recommended for production.

E2 instance -> 2 vCPU + 4 GB RAM (defaults)
OS is Debian Linux, boot disk size 10 GB
preemptibility option
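
The same VM can also be created from the command line instead of the console. The following is a minimal sketch using the gcloud CLI, assuming it is installed and authenticated. The instance name, zone, machine type (e2-medium, which matches the 2 vCPU + 4 GB default) and Debian image family are illustrative assumptions rather than values taken from this walkthrough, and the boot disk is bumped to 20 GB as an example of the production advice above.

```shell
# Hypothetical values: pick your own instance name and zone.
INSTANCE_NAME="clickhouse-demo"
ZONE="us-central1-a"

# Wraps the gcloud call so nothing is created until the function is invoked.
# e2-medium is assumed here because it provides 2 vCPU and 4 GB RAM.
create_clickhouse_vm() {
  gcloud compute instances create "$INSTANCE_NAME" \
    --zone="$ZONE" \
    --machine-type=e2-medium \
    --image-family=debian-12 \
    --image-project=debian-cloud \
    --boot-disk-size=20GB \
    --preemptible
}

# Uncomment to actually create the VM:
# create_clickhouse_vm
```

Dropping the --preemptible flag gives a regular instance, which is the safer choice for anything beyond a demo.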

Feel free to tune the VM settings, then click “Create” to finish the setup. Once creation is done, your VM instance should be booted up and it is time to log in through the “SSH” option.

a 24 hour available VM -> connect with “SSH” option

Installation

Once your SSH session is ready, several prerequisites need to be configured before the real installation.

First of all, install “dirmngr”, which is required for OpenPGP and X.509 certificate operations:

sudo apt install dirmngr

Next, import the repository signing key from the key server:

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4

Then, add the Clickhouse repository to the apt sources list:

echo "deb http://repo.yandex.ru/clickhouse/deb/stable/ main/" | sudo tee /etc/apt/sources.list.d/clickhouse.list

Run an update on the apt package manager:

sudo apt update

Finally~ Install Clickhouse server and client through apt:

sudo apt install clickhouse-server clickhouse-client

and… DONE~

PS. during the installation, you will be prompted for the password of the default user (a kind of root user). For demo or POC purposes, it is OK to leave it blank, but providing a password is strongly recommended for production deployments.

PS. in case you didn’t set a password for the default user, it is possible to set one later through the /etc/clickhouse-server/users.xml file. Check the users >> default section in the XML file.
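
If you would rather not keep the password in plain text, users.xml also accepts a SHA-256 hash in a password_sha256_hex element in place of password. Below is a minimal sketch for producing that hash; the function name and the sample password are made up for illustration.

```shell
# Print the SHA-256 hex digest of the string given as the first argument.
# printf '%s' is used so that no trailing newline gets hashed.
hash_password() {
  printf '%s' "$1" | sha256sum | awk '{print $1}'
}

# Example with a placeholder password; paste the output into
# <password_sha256_hex>...</password_sha256_hex> under users >> default.
hash_password 'change-me'
```

Restart the server after editing the file if the new password does not take effect right away.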

Kickstart

Now that everything is installed, let’s start the Clickhouse server. The server and client executables are by default available under /usr/bin; hence they can be run from anywhere like this:

clickhouse-server

Application: Ready for connection.

PS. if you want the server to be started as a background process instead, use “clickhouse-server &” and note down the process ID for future shutdowns.
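
The background start and the later shutdown can be sketched as below. The helper names are hypothetical; $! is the shell variable that holds the process ID of the most recent background job, so there is no need to copy it by hand.

```shell
# Start a command in the background and remember its PID in BG_PID.
run_in_background() {
  "$@" &
  BG_PID=$!
}

# Send SIGTERM to the remembered process and wait for it to exit.
stop_background() {
  kill "$BG_PID"
  wait "$BG_PID" 2>/dev/null || true
}

# Usage:
#   run_in_background clickhouse-server
#   echo "server PID: $BG_PID"
#   stop_background
```

This only works within the same shell session; for anything longer-lived, the service manager is the better option.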

A modern Linux OS has a service manager that handles starting, stopping and restarting applications. On Debian, another way to start the server is as follows:

sudo service clickhouse-server start

To check the status of the service:

sudo service clickhouse-server status

running clickhouse-server as a service

Once the server has started, let’s connect to it using the official client:

clickhouse-client

PS. if the default user has a password set, run “clickhouse-client --password” and key in the password interactively.

run the following -> show databases;

show databases

Several built-in databases, including “system”, should be displayed. Next, run a query on system.tables:

SELECT name, database FROM system.tables WHERE database = 'system' LIMIT 3

available tables under the system database
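
The same query can also be run without entering the interactive prompt by passing it through the client's --query option. This is a minimal sketch; the CLICKHOUSE_CLIENT variable and the run_query helper are conveniences introduced here, not part of Clickhouse.

```shell
# Client binary to use; override it if the client is installed elsewhere.
CLICKHOUSE_CLIENT="${CLICKHOUSE_CLIENT:-clickhouse-client}"

# Run a single SQL statement non-interactively and print the result.
run_query() {
  "$CLICKHOUSE_CLIENT" --query "$1"
}

# Usage (requires a running server):
#   run_query "SELECT name, database FROM system.tables WHERE database = 'system' LIMIT 3"
```

This form is handy in scripts, since the result goes straight to standard output.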

To stop the Clickhouse server, simply run:

sudo service clickhouse-server stop

To make sure the server is down, either run a “ps” command to check whether the Clickhouse process ID (obtainable through the service status command) is still there, OR just run the following:

sudo service clickhouse-server status

This time the status information displays the line “Active: inactive (dead)”.
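
For scripts, the state can be pulled out of that status output. This is a small sketch, assuming the “Active:” line format shown above; the function name is hypothetical.

```shell
# Read service status output on stdin and print the word that follows
# "Active:" (for example "active" or "inactive").
service_state() {
  awk '$1 == "Active:" {print $2; exit}'
}

# Usage:
#   sudo service clickhouse-server status | service_state
```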

Closings

Congratulations~ The following have been covered in this article:

  • Suggestions for GCP VM choices
  • Installation of Clickhouse server and client on Debian Linux (hosted on GCP)
  • How to start and stop Clickhouse server

PS. not everybody will choose GCP as the deployment environment; however, since Clickhouse runs mainly on Linux and Mac, the installation steps covered above can be reused on self-hosted Debian machines or similar. For Ubuntu, most of the steps are identical. Mac users might need to use Homebrew instead of apt.

devops terminal

a java / golang / flutter developer, a big data scientist, a father :)