Scenario
Assume that a Cassandra table stores the user information (such as email address and password) and user status updates. Generally, the user information remains almost unchanged when the user status frequently changes. If the user information is updated along with each status update, a large amount of storage space is used.
Cassandra provides the static column feature to solve this problem. The static column that is declared in the same partition key has only one value. This indicates that only one set of data in this column is stored.
Define a static column
Add STATIC to the end of a column name to define the column as a static column, as shown in the following example:
CREATE TABLE "iteblog_users_with_status_updates" (
"username" text,
"id" timeuuid,
"email" text STATIC,
"encrypted_password" blob STATIC,
"body" text,
PRIMARY KEY ("username", "id")
);
The preceding command sets both email and encrypted_password fields to STATIC. This indicates that a username has only one email value and one encrypted_password value.
Limits on static columns
You cannot add the STATIC keyword to a column in each table. Static columns have the following limits:
Clustering columns (also known as clustering keys) are not defined for the table, as shown in the following example:
cqlsh:iteblog_keyspace> CREATE TABLE "iteblog_users_with_status_updates_invalid" (
... "username" text,
... "id" timeuuid,
... "email" text STATIC,
... "encrypted_password" blob STATIC,
... "body" text,
... PRIMARY KEY ("username")
... );
InvalidRequest: Error from server: code=2200 [Invalid query] message="Static columns are only useful (and thus allowed) if the table has at least one clustering column"
You cannot create static columns in the iteblog_users_with_status_updates_invalid table because the table has only a primary key and clustering columns are not defined for the table. The more rows of data for the same partition key result in the better performance for the static column. If no clustering columns are defined, a primary key value identifies a unique row in a partition and no static columns are required.
The table is set to a COMPACT STORAGE table, as shown in the following example:
cqlsh:iteblog_keyspace> CREATE TABLE "iteblog_users_with_status_updates_invalid" (
... "username" text,
... "id" timeuuid,
... "email" text STATIC,
... "encrypted_password" blob STATIC,
... "body" text,
... PRIMARY KEY ("username", "id")
... )WITH COMPACT STORAGE;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Static columns are not supported in COMPACT STORAGE tables"
Columns used as part of partition keys or clustering columns cannot be set to STATIC, as shown in the following example:
cqlsh:iteblog_keyspace> CREATE TABLE "iteblog_users_with_status_updates_invalid" (
... "username" text,
... "id" timeuuid STATIC,
... "email" text STATIC,
... "encrypted_password" blob STATIC,
... "body" text,
... PRIMARY KEY ("username", "id")
... );
InvalidRequest: Error from server: code=2200 [Invalid query] message="Static column id cannot be part of the PRIMARY KEY"
cqlsh:iteblog_keyspace> CREATE TABLE "iteblog_users_with_status_updates_invalid" (
... "username" text,
... "id" timeuuid,
... "email" text STATIC,
... "encrypted_password" blob STATIC,
... "body" text,
... PRIMARY KEY (("username", "id"), email)
... );
InvalidRequest: Error from server: code=2200 [Invalid query] message="Static column email cannot be part of the PRIMARY KEY"
Insert data into a table that contains static columns
Data is inserted into a table that contains static columns in a similar way in which data is inserted into a standard table. For example, you can execute the following statements to insert data into the iteblog_users_with_status_updates table:
cqlsh:iteblog_keyspace> INSERT INTO "iteblog_users_with_status_updates"
... ("username", "id", "email", "encrypted_password", "body")
... VALUES (
... 'iteblog',
... NOW(),
... 'iteblog_hadoop@iteblog.com',
... 0x877E8C36EFA827DBD4CAFBC92DD90D76,
... 'Learning Cassandra!'
... );
cqlsh:iteblog_keyspace> select username, email, encrypted_password, body from iteblog_users_with_status_updates;
username | email | encrypted_password | body
----------+----------------------------+------------------------------------+---------------------
iteblog | iteblog_hadoop@iteblog.com | 0x877e8c36efa827dbd4cafbc92dd90d76 | Learning Cassandra!
(1 rows)
A row of data is inserted into the table. The preceding statements have performed the following two operations:
All the data in the email and encrypted_password columns for the iteblog username is set to iteblog_hadoop@iteblog.com and 0x877e8c36efa827dbd4cafbc92dd90d76.
A row for the Learning Cassandra! body is added to the iteblog partition. Execute the following statements to insert another row into the table:
cqlsh:iteblog_keyspace> INSERT INTO "iteblog_users_with_status_updates"
... ("username", "id", "body")
... VALUES ('iteblog', NOW(), 'I love Cassandra!') ;
cqlsh:iteblog_keyspace> select username, email, encrypted_password, body from iteblog_users_with_status_updates;
username | email | encrypted_password | body
----------+----------------------------+------------------------------------+---------------------
iteblog | iteblog_hadoop@iteblog.com | 0x877e8c36efa827dbd4cafbc92dd90d76 | Learning Cassandra!
iteblog | iteblog_hadoop@iteblog.com | 0x877e8c36efa827dbd4cafbc92dd90d76 | I love Cassandra!
(2 rows)
cqlsh:iteblog_keyspace>
The statements show that the email and encrypted_password columns are not specified when data is inserted into the table. However, the query result shows that the values of the two columns in the new row are the same as the values that are specified in the preceding statements.
The user modifies the email address due to some reasons, as shown in the following example:
cqlsh:iteblog_keyspace> UPDATE iteblog_users_with_status_updates SET email = 'iteblog@iteblog.com'
... WHERE username = 'iteblog';
cqlsh:iteblog_keyspace> select username, email, encrypted_password, body from iteblog_users_with_status_updates;
username | email | encrypted_password | body
----------+---------------------+------------------------------------+---------------------
iteblog | iteblog@iteblog.com | 0x877e8c36efa827dbd4cafbc92dd90d76 | Learning Cassandra!
iteblog | iteblog@iteblog.com | 0x877e8c36efa827dbd4cafbc92dd90d76 | I love Cassandra!
(2 rows)
The query result shows that all the emails of the iteblog username are modified as the new email because the email column is static.
The table stores user information such as the email address and password. If a user modifies the email address and password on the frontend page, the backend system must obtain the current email address and password by executing the following statement:
cqlsh:iteblog_keyspace> SELECT "username", "email", "encrypted_password"
... FROM "iteblog_users_with_status_updates"
... WHERE "username" = 'iteblog';
username | email | encrypted_password
----------+---------------------+------------------------------------
iteblog | iteblog@iteblog.com | 0x877e8c36efa827dbd4cafbc92dd90d76
iteblog | iteblog@iteblog.com | 0x877e8c36efa827dbd4cafbc92dd90d76
(2 rows)
All the rows of the email addresses and passwords for the iteblog username are returned. This result is unexpected. You can execute the SELECT DISTINCT statement to eliminate duplicate rows:
cqlsh:iteblog_keyspace> SELECT DISTINCT "username", "email", "encrypted_password"
... FROM "iteblog_users_with_status_updates"
... WHERE "username" = 'iteblog';
username | email | encrypted_password
----------+---------------------+------------------------------------
iteblog | iteblog@iteblog.com | 0x877e8c36efa827dbd4cafbc92dd90d76
(1 rows)
All the duplicate rows for the iteblog username are eliminated, and a unique row appears in the end.
Cassandra does not perform the DISTINCT operation for all the data of the iteblog username. This is because a set of the static column data is stored in the underlying system.