Kickstart ClickHouse on Google Cloud Platform (GCP)
ClickHouse is a distributed DBMS for online analytical processing (OLAP). It is column-oriented by design, which makes it a good fit for Big Data analytics. ClickHouse runs on Linux and macOS, on x86_64, AArch64 and PowerPC64LE CPU architectures; there is no native Windows build (unless you run Ubuntu or Debian alongside Windows, for example by dual booting). In this article, we will go through how to set up ClickHouse on a Debian Linux VM on Google Cloud Platform.
VM choice
If you are just creating a demo or POC ClickHouse instance, the VM defaults are good enough. Do note that the boot disk defaults to 10 GB only; if you are planning for production, you would typically either:
- set a larger boot disk at the very beginning, OR
- add additional disks of 20 GB (or more) as the data partition (which means ClickHouse is installed on the boot disk whilst the data is stored on these additional disks)
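For the second option, ClickHouse has to be told where the data lives. The following is a minimal sketch, not a definitive setup: it assumes the additional disk is already formatted and mounted at /mnt/clickhouse-data (a hypothetical mount point) and generates a small override file that sets the server's path and tmp_path settings. The root tag is assumed to be <yandex>, as used by older releases; newer releases use <clickhouse>.

```shell
# Hypothetical mount point of the additional data disk.
DATA_DIR="${DATA_DIR:-/mnt/clickhouse-data}"

# Write the override into a scratch directory first so it can be reviewed.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
OVERRIDE_FILE="$OUT_DIR/data-path.xml"

# path and tmp_path are server settings; the trailing slash is required.
cat > "$OVERRIDE_FILE" <<EOF
<yandex>
    <path>${DATA_DIR}/</path>
    <tmp_path>${DATA_DIR}/tmp/</tmp_path>
</yandex>
EOF

cat "$OVERRIDE_FILE"

# After reviewing the file, copy it into the server's override directory:
#   sudo cp "$OVERRIDE_FILE" /etc/clickhouse-server/config.d/
```

The data directory also has to be writable by the user the server runs as, and the server needs a restart to pick up the change.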
One more thing is cost: there is actually an availability option on your VM that saves money. Under “NETWORKING, DISCS, SECURITY, MANAGEMENT, SOLE-TENANCY” >> Management >> Availability Policy >> Preemptibility, turn it “On”.
The downside of preemptibility is that the VM instance runs for at most 24 hours before it is shut down automatically (and Compute Engine may stop it earlier if it needs the capacity back); it then stays stopped until you manually start it again. It is a good choice for a demo or POC but, of course, not recommended for production.
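The same choices can also be made from the command line instead of the console. The sketch below assembles a gcloud command with a larger boot disk and preemptibility turned on; the instance name, zone, disk size and Debian image family are all hypothetical values picked for illustration. It only prints the command unless RUN_GCLOUD=1 is set, so it can be reviewed before anything is created.

```shell
# Hypothetical values: adjust the name, zone and size to your own project.
INSTANCE_NAME="clickhouse-demo"
ZONE="us-central1-a"
BOOT_DISK_SIZE="50GB"

# Assemble the command as positional parameters so it can be reviewed first.
set -- gcloud compute instances create "$INSTANCE_NAME" \
  --zone="$ZONE" \
  --image-family=debian-12 \
  --image-project=debian-cloud \
  --boot-disk-size="$BOOT_DISK_SIZE" \
  --preemptible

CREATE_CMD="$*"
echo "$CREATE_CMD"

# Only create the VM when explicitly asked to (RUN_GCLOUD=1).
if [ "${RUN_GCLOUD:-0}" = "1" ]; then
  "$@"
fi
```

Drop the --preemptible flag for anything that is meant to stay up.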
Feel free to tune the other VM settings and click “Create” to finish the setup. Once the creation is done, your VM instance should be booted up and it is time to log in through the “SSH” option.
Installation
Once your SSH session is ready, several prerequisites need to be configured before the real installation.
First of all, install “dirmngr”, which is required for handling OpenPGP and X.509 certificate operations:
sudo apt install dirmngr
Next, import the repository signing key from the key server:
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4
Then, add the ClickHouse repository to the apt sources:
echo "deb http://repo.yandex.ru/clickhouse/deb/stable/ main/" | sudo tee /etc/apt/sources.list.d/clickhouse.list
Run an update on the apt package manager:
sudo apt update
Finally~ Install the ClickHouse server and client through apt:
sudo apt install clickhouse-server clickhouse-client
and… DONE~
PS. during the installation, you will be prompted for the password of the default user (a kind of root user). For demo or POC purposes, it is OK to leave it blank, but providing a password is strongly recommended for production deployments.
PS. in case you didn’t set a password for the default user, it is possible to update the password through the /etc/clickhouse-server/users.xml file. Check the users >> default section in that XML file.
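When editing users.xml by hand, the password does not have to be stored in plain text: the file also accepts a SHA-256 hex digest in a password_sha256_hex element in place of the password element. As a small sketch, the helper below (hash_password is a hypothetical name, and "change-me" is a placeholder password) prints the digest to paste into the file; it assumes sha256sum from coreutils is available, as it is on Debian.

```shell
# Print the SHA-256 hex digest of the password given as the first argument.
# printf is used instead of echo so that no trailing newline gets hashed.
hash_password() {
  printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

# Placeholder password; replace it with your own.
hash_password "change-me"
```

The printed value then goes inside <password_sha256_hex>...</password_sha256_hex> in the users >> default section.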
Kickstart
Now that everything is installed, let’s start the ClickHouse server. The server and client executables are by default available under /usr/bin; hence they can be run from anywhere like this:
clickhouse-server
PS. if you want the server to be started as a background process instead, use “clickhouse-server &” and note down the process ID for shutting it down later.
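Noting down the process ID by hand is easy to forget, so it can be written to a file instead. The two helpers below are a sketch of that idea; start_background and stop_background are hypothetical names, and the PID and log file paths in the usage lines are placeholders.

```shell
# start_background PID_FILE LOG_FILE COMMAND [ARGS...]
# Runs COMMAND in the background, sends its output to LOG_FILE and
# stores the process ID in PID_FILE.
start_background() {
  pid_file="$1"
  log_file="$2"
  shift 2
  "$@" > "$log_file" 2>&1 &
  echo "$!" > "$pid_file"
}

# stop_background PID_FILE
# Sends SIGTERM to the recorded process and removes PID_FILE.
stop_background() {
  pid="$(cat "$1")"
  kill "$pid" 2>/dev/null || true
  # wait only succeeds when called from the shell that started the process.
  wait "$pid" 2>/dev/null || true
  rm -f "$1"
}

# Usage (assumes clickhouse-server is on the PATH):
#   start_background /tmp/clickhouse-server.pid /tmp/clickhouse-server.log clickhouse-server
#   stop_background /tmp/clickhouse-server.pid
```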
On a modern Linux OS, a service manager handles starting, stopping, restarting and upgrading all sorts of applications. On Debian, another way to start the server is as follows:
sudo service clickhouse-server start
To check the status of the service:
sudo service clickhouse-server status
Once the server has started, let’s connect to it using the official client:
clickhouse-client
PS. if the default user has a password set, run “clickhouse-client --password” and key in the password interactively.
Once connected, run the following:
show databases;
Several built-in databases including the “system” should be displayed. Run a query on system.tables:
SELECT name, database FROM system.tables WHERE database = 'system' LIMIT 3
To stop the ClickHouse server, simply run:
sudo service clickhouse-server stop
To make sure the server is down, either run a “ps” command to check whether the ClickHouse process ID (obtainable through the service status command) is still there, OR just run the following:
sudo service clickhouse-server status
This time the status information displays the line “Active: inactive (dead)”.
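If the check needs to run inside a script rather than be read by eye, the status output can be searched for that line. This is only a sketch: is_inactive is a hypothetical helper name, and it relies on the status text containing exactly the “Active: inactive (dead)” wording quoted above.

```shell
# Reads status output on stdin; succeeds only if it reports the
# "Active: inactive (dead)" line.
is_inactive() {
  grep -q 'Active: inactive (dead)'
}

# Usage:
#   sudo service clickhouse-server status | is_inactive && echo "server is down"
```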
Closings
Congratulations~ The following have been covered in this article:
- Suggestions on GCP VM choices
- Installation of ClickHouse server and client on Debian Linux (hosted on GCP)
- How to start and stop the ClickHouse server
PS. not everybody would choose GCP as the deployment environment; however, since ClickHouse is virtually only available on Linux and Mac, the installation steps covered above are reusable on self-hosted Debian machines or alternatives. For Ubuntu, most of the steps are identical. Mac users might need to use Homebrew instead of apt.