Cockroach Database – Introduction and Basics.

Founded in 2015 with the mission of creating a open source version of Spanner (Google’s Cloud SQL database), Cockroach db is a distributed SQL database. In other words it is a Relational Database system (RDBMS) with the power of distributed architecture that provides scalability, availability and resiliency. Cassandra is also a distributed database but it is NOSQL and it is not RDBMS.

How Cockroach database gets its name ? Cockroach database got its name from the Cockroach insect. Like the resilient Cockroach species which survived even the catastrophe of dinosaur extinction, data stored in Cockroach db are also very resilient.

Among the ACID transaction guarantee,Cockroach db provides Serializable Isolation which is the highest possible Isolation level that can be provided by databases.

What is ACID transaction guarantee ?

  • Atomicity – Its all or nothing. If a part of the transaction fails the whole transaction should be rolled back.
  • Consistency – It enforces rules like foreign key or any other constraints on data so that the data will be consistent from one state to another.
  • Isolation – Even if the database does things in parallel and once done, it should appear to have been done serially one after the other.
  • Durability – Once committed the data should be available even in case of a node failure.

Important Concepts of Cockroach Database:

  1. Keyspace: Cockroach db stores all the user data and almost all system data in a single, huge sorted map of key value pairs.
  2. Ranges: The keyspace is divided into small units of a specific size (64 MiB by default) called ranges. If the range grows in size they split into 2.
  3. Replicas: The ranges are replicated based on the replication factor and then stored in the nodes of the cockroach db. If there are 3 nodes with a replication factor of 3 (3 is default), each range will be copied into each node. In other words there will be 3 copies of the same range in the cluster. To keep all the replica in sync, CockroachDB uses the Raft consensus algorithm. When the range splits the replicas split as well.

Install Cockroach db in windows as shown below via PowerShell or use a free cloud cluster as shown in the Cockroach db university website link:

PS C:\WINDOWS\system32> $ErrorActionPreference = "Stop"; [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12;$ProgressPreference = 'SilentlyContinue'; $null = New-Item -Type Directory -Force $env:appdata/cockroach; Invoke-WebRequest -Uri https://binaries.cockroachdb.com/cockroach-v21.2.8.windows-6.2-amd64.zip -OutFile cockroach.zip; Expand-Archive -Force -Path cockroach.zip; Copy-Item -Force "cockroach/cockroach-v21.2.8.windows-6.2-amd64/cockroach.exe" -Destination $env:appdata/cockroach; $Env:PATH += ";$env:appdata/cockroach"

PS C:\WINDOWS\system32> cockroach version
Build Tag: v21.2.8
Build Time: 2022/04/05 01:53:29
Distribution: CCL
Platform: windows amd64 (x86_64-w64-mingw32)
Go Version: go1.16.6
C Compiler: gcc 6.5.0
Build Commit ID: cad00009ffee952065c51687da6b1f71e44a7032
Build Type: release

Command to start a single node cluster in local cockroach db. Note: Before running this command make sure you have added $env:appdata/cockroach(in my case it is “C:\Users\mark2\AppData\Roaming\cockroach” to the windows “Path” environment variable.

cockroach start-single-node --insecure --listen-addr=localhost:26257 --http-addr=localhost:8080

Connecting to Cockroach db via SQL Shell.

cockroach sql --insecure

Command to show databases.

show databases;

Command to show tables in a particular database.

show tables from system;

Limit number of records returned in a Select statement SQL in Cockroach db.

select * from db.table_name  limit 10; 

Create Database and set it to current.

create database mydb;
set database = mydb; (or) use mydb;

Create table with a single primary key. Note: Once a primary key is created you cannot change it later.

create table users(id UUID PRIMARY KEY DEFAULT gen_random_uuid(), fullName String);

Create table with a composite primary key. Note: Once a primary key is created you cannot change it later.

create table address(adr_id UUID DEFAULT gen_random_uuid(), line1 String, country String, zip String, PRIMARY KEY(adr_id, line1, country, zip));

Show table schema.

root@:26257/mydb> show create table address;
table_name | create_statement
-------------+--------------------------------------------------------------------------------------
address | CREATE TABLE public.address (
| adr_id UUID NOT NULL DEFAULT gen_random_uuid(),
| line1 STRING NOT NULL,
| country STRING NOT NULL,
| zip STRING NOT NULL,
| CONSTRAINT "primary" PRIMARY KEY (adr_id ASC, line1 ASC, country ASC, zip ASC),
| FAMILY "primary" (adr_id, line1, country, zip)
| )
(1 row)

Another way to see table schema is to use show columns from tablename;

root@:26257/mydb> show columns from address;
  column_name | data_type | is_nullable |  column_default   | generation_expression |  indices  | is_hidden
--------------+-----------+-------------+-------------------+-----------------------+-----------+------------
  adr_id      | UUID      |    false    | gen_random_uuid() |                       | {primary} |   false
  line1       | STRING    |    false    | NULL              |                       | {primary} |   false
  country     | STRING    |    false    | NULL              |                       | {primary} |   false
  zip         | STRING    |    false    | NULL              |                       | {primary} |   false

Alter table.

alter table address add column line2 String;

show indexes.

show indexes from users;

Create indexes:

create index my_index on users (last_name, first_name);

Explain query plan in cockroach db.

The syntax is: explain <query> Note: In the result spans: shows if database did a full scan or checked only required rows. In the example below it did a Full table scan as there were no indexes on first_name and last_name.

root@:26257/mydb> explain select * from users where first_name = 'William' and last_name = 'Cross';
                                       info
-----------------------------------------------------------------------------------
  distribution: full
  vectorized: true

  • filter
  │ estimated row count: 1
  │ filter: (first_name = 'William') AND (last_name = 'Cross')
  │
  └── • scan
        estimated row count: 1 (100% of the table; stats collected 3 minutes ago)
        table: users@primary
        spans: FULL SCAN

Result of explain after creating indexes on last_name and first_name.

root@:26257/mydb> explain select * from users where first_name = 'William' and last_name = 'Cross';
                                       info
-----------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • index join
  │ estimated row count: 2
  │ table: users@primary
  │
  └── • scan
        estimated row count: 2 (40% of the table; stats collected 27 minutes ago)
        table: users@my_index
        spans: [/'Cross'/'William' - /'Cross'/'William']

Transactions in Cockroach db. begin; starts a transaction.

begin;
> run sql statements of a transaction...
> commit; //to commit
> rollback; //to rollback.

To drop Cockroach database.

drop database mydb;

If cockroach db complains: ERROR: rejected (sql_safe_updates = true): DROP DATABASE on non-empty database without explicit CASCADE, then SET sql_safe_updates = false; and continue with drop database as show below.

SET sql_safe_updates = false;
drop database mydb;
SET sql_safe_updates = true;

To exit / come out of Cockroach db SQL shell use \q

root@:26257/defaultdb> \q
PS C:\Users\mark2>

To stop cockroach db nodes and release http ports used by the nodes use: TASKKILL /F /IM cockroach.exe for windows as shown below.

PS C:\WINDOWS\system32> TASKKILL /F /IM cockroach.exe
SUCCESS: The process "cockroach.exe" with PID 3204 has been terminated.

Start a multinode cockroach cluster (Windows). At a minimum 3 nodes are needed for a production deployment. Run the below commands in 3 different powershells.

cockroach start --insecure --listen-addr=localhost:26257 --join=localhost:26257,localhost:26258,localhost:26259 --http-addr=localhost:8080 --store=cockroach-data-1

cockroach start --insecure --listen-addr=localhost:26258 --join=localhost:26257,localhost:26258,localhost:26259 --http-addr=localhost:8081 --store=cockroach-data-2

cockroach start --insecure --listen-addr=localhost:26259 --join=localhost:26257,localhost:26258,localhost:26259 --http-addr=localhost:8082 --store=cockroach-data-3

To Initialize the cockroach db cluster.

PS C:\WINDOWS\system32> cockroach init --host localhost:26257 --insecure
Cluster successfully initialized

Note: If you get the error ERROR: cluster has already been initialized, delete the datastore directories cockroach-data-1 – 3 and create the nodes again.

To add more nodes to the cluster. Run in separate powershells. The below commands create 4th and 5th nodes.

cockroach start --insecure --listen-addr=localhost:26260 --join=localhost:26257,localhost:26258,localhost:26259,localhost:26260 --http-addr=localhost:8083 --store=cockroach-data-4

cockroach start --insecure --listen-addr=localhost:26261 --join=localhost:26257,localhost:26258,localhost:26259,localhost:26260,localhost:26261 --http-addr=localhost:8084 --store=cockroach-data-5

Now the admin UI (for any of the node) shows that the 5 nodes are running.

To load sample data movr the comes with cockroachdb binary.

cockroach workload init movr

To specify the locality of a node use --locality flag when creating the node. They are key value pairs and should be ordered from most inclusive to least inclusive (example: --locality=country=us,region=us-east-1,zone=a) Note: when using cockroach cloud locality is set automatically.

cockroach start --insecure --locality=country=us,region=us-central --listen-addr=localhost:26260 --join=localhost:26257,localhost:26258,localhost:26259,localhost:26260 --http-addr=localhost:8083 --store=cockroach-data-4

Cockroach db will replicate the ranges based on locality to make the range replicas as distributed as possible.

Multi-Region Capabilities:

Cockroach db cluster nodes can reside in multiple regions such as us-east-1 and us-west-1. We can make a CockroachDB database a multi-region database and then set the Table locality to be either one of the below.

  • Regional Table: Provides low latency reads and writes for an entire table from a single region.
  • Regional by Row tables: Provides low latency reads and writes for one or more rows of a table from a single region.
  • Global tables: Provides low latency reads from all regions.

To create to a temporary in-memory Cockroach cluster of 9 nodes.

cockroach demo movr --nodes 9

To make the database a multi region database.

SHOW REGIONS FROM CLUSTER;

ALTER DATABASE movr PRIMARY REGION "us-east1";
SHOW REGIONS FROM DATABASE movr;

ALTER DATABASE movr ADD REGION "us-west1";
ALTER DATABASE movr ADD REGION "europe-west1";
SHOW REGIONS FROM DATABASE movr;
SHOW TABLES;

Survival Goal specifies how many simultaneous failures a database can survive. The available survival goals are Zone failure and Region failure. To alter survival goals:

SHOW SURVIVAL GOAL FROM DATABASE movr;
ALTER DATABASE movr SURVIVE REGION FAILURE;
SHOW SURVIVAL GOAL FROM DATABASE movr;

To set table locality to GLOBAL.

ALTER TABLE promo_codes SET LOCALITY GLOBAL;
SHOW CREATE TABLE promo_codes; 

To set table locality to Regional by row.

ALTER TABLE rides LOCALITY REGIONAL BY ROW;
SHOW CREATE TABLE rides;

Reference:

%d bloggers like this: