
PolarDB:Transparent data encryption

Last Updated:Jun 05, 2024

This topic describes transparent data encryption (TDE), which encrypts data at rest to protect sensitive database files on disk from potential attackers.

Prerequisites

Your PolarDB for PostgreSQL cluster runs one of the following engines:

  • PostgreSQL 14 (revision version 14.5.1.1 or later)

  • PostgreSQL 11 (revision version 1.1.1 or later)

Note

You can execute one of the following statements to view the revision version of your PolarDB for PostgreSQL cluster:

  • PostgreSQL 14

    select version();
  • PostgreSQL 11

    show polar_version;

Background information

In China, to safeguard information security on the Internet, relevant service providers are required to meet data security standards. Such standards include:

  • Cryptography Law of the People's Republic of China (effective January 1, 2020)

  • Information Security Technology — Baseline for Classified Protection of Cybersecurity (GB/T 22239-2019)

Industrial and regional organizations have also introduced data security standards such as:

  • Payment Card Industry Data Security Standard (PCI DSS)

  • Health Insurance Portability and Accountability Act (HIPAA)

  • General Data Protection Regulation (GDPR)

  • California Consumer Privacy Act (CCPA)

  • Sarbanes-Oxley Act (SOX)

To meet these data security requirements, PolarDB provides the TDE feature. TDE allows authenticated users to access encrypted data without modifying application code or configurations for decryption. It prevents unauthorized access at the operating system level to sensitive information in tablespaces, on disks, and in database backups.

Terms

  • Key Encryption Key (KEK): the key that is used to encrypt another key.

  • Memory Data Encryption Key (MDEK): the key from which the data encryption keys are derived. It is stored in memory and is randomly generated by using the pg_strong_random function.

  • Table Data Encryption Key (TDEK): the key that is used to encrypt table data. It is stored in memory and is generated from an MDEK by using the HKDF algorithm.

  • WAL Data Encryption Key (WDEK): the key that is used to encrypt WAL logs. It is generated from an MDEK by using the HKDF algorithm.

  • Hash-based Message Authentication Code of Key (HMACK): the key that is used to compute HMACs over encrypted keys. It is generated together with the KEK by applying SHA-512 to a passphrase.

  • Hash-based Message Authentication Code of Key Encryption Key (KEK_HMAC): the authentication information used during key decryption. It is generated from an ENCMDEK and an HMACK by using the HMAC algorithm.

  • Encrypted Memory Data Encryption Key (ENCMDEK): the encrypted form of the MDEK. It is generated by encrypting an MDEK with a KEK and is stored in the shared storage (in the global/kmgr file).

How TDE works

  • Key management module

    • Key structure

      TDE uses a two-layer key structure that consists of a KEK and a TDEK. The TDEK is used to encrypt database data, and the KEK is used to encrypt the TDEK. Detailed descriptions of the two-layer key structure are as follows:

      • KEK and HMACK: 64 bytes of data, of which the first 32 bytes are the KEK and the last 32 bytes are the HMACK. The data is generated by applying the SHA-512 algorithm to the output of the command specified by the polar_cluster_passphrase_command parameter.

      • TDEK and WDEK: keys for data and WAL log encryption, which are generated by using a cryptographically secure random number generator. The two keys are encrypted with the KEK, and the HMAC algorithm is applied to the resulting ciphertexts with the HMACK to generate RDEK_HMAC and WDEK_HMAC. RDEK_HMAC and WDEK_HMAC are used for KEK verification and are stored in the shared storage.
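As a minimal C sketch, splitting the 64-byte SHA-512 output into the KEK and the HMACK might look like the following. The KekPair type and the split_passphrase_digest function are hypothetical names used for illustration, and the digest is assumed to have been computed already (for example, with a SHA-512 implementation from OpenSSL).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SHA512_DIGEST_LEN 64
#define KEK_LEN 32   /* first 32 bytes of the digest */
#define HMACK_LEN 32 /* last 32 bytes of the digest */

/* The two halves must cover the whole digest exactly. */
static_assert(KEK_LEN + HMACK_LEN == SHA512_DIGEST_LEN, "KEK + HMACK must be 64 bytes");

/* Hypothetical container for the two keys derived from one passphrase. */
typedef struct KekPair
{
    uint8_t kek[KEK_LEN];     /* encrypts (wraps) the data encryption keys */
    uint8_t hmack[HMACK_LEN]; /* authenticates the wrapped keys */
} KekPair;

/*
 * Split SHA-512(output of polar_cluster_passphrase_command) into the KEK and
 * the HMACK. The digest is assumed to be computed by the caller.
 */
static void
split_passphrase_digest(const uint8_t digest[SHA512_DIGEST_LEN], KekPair *out)
{
    memcpy(out->kek, digest, KEK_LEN);
    memcpy(out->hmack, digest + KEK_LEN, HMACK_LEN);
}
```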

      The KEK and HMACK are always obtained from an external system, such as Key Management Service (KMS). During testing, you can use the echo passphrase command as the passphrase command to obtain the two keys. The ENCMDEK and KEK_HMAC must be stored in the shared storage so that the primary and read-only nodes can read the key file and obtain the data encryption key at the next startup. The data structure is as follows:

      typedef struct KmgrFileData
      {
          /* version for kmgr file */
          uint32      kmgr_version_no;
      
          /* Are data pages encrypted? Zero if encryption is disabled */
          uint32      data_encryption_cipher;
      
          /*
           * Wrapped Key information for data encryption.
           */
          WrappedEncKeyWithHmac tde_rdek;
          WrappedEncKeyWithHmac tde_wdek;
      
          /* CRC of all above ... MUST BE LAST! */
          pg_crc32c   crc;
      } KmgrFileData;

      This file is generated when the initdb command is executed. As a result, a read-only node that serves as a standby can obtain the file by using pg_basebackup.
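The comment on the crc field says that it must be last because the checksum covers every byte that precedes it. The following sketch illustrates that pattern with a bitwise CRC-32C, the Castagnoli CRC that PostgreSQL's pg_crc32c type represents. DemoWrappedKey, DemoKmgrFile, and the key and HMAC sizes are hypothetical stand-ins for WrappedEncKeyWithHmac and KmgrFileData, not the actual on-disk layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes; the real WrappedEncKeyWithHmac layout may differ. */
#define WRAPPED_KEY_LEN 48
#define KEY_HMAC_LEN 32

typedef struct DemoWrappedKey
{
    uint8_t key[WRAPPED_KEY_LEN]; /* data key encrypted with the KEK */
    uint8_t hmac[KEY_HMAC_LEN];   /* HMAC of the wrapped key */
} DemoWrappedKey;

typedef struct DemoKmgrFile
{
    uint32_t kmgr_version_no;
    uint32_t data_encryption_cipher;
    DemoWrappedKey tde_rdek;
    DemoWrappedKey tde_wdek;
    uint32_t crc; /* covers every byte above it */
} DemoKmgrFile;

/* The CRC can only cover "everything before it" if it really is last. */
static_assert(offsetof(DemoKmgrFile, crc) + sizeof(uint32_t) == sizeof(DemoKmgrFile),
              "crc must be the last field");

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t
crc32c(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Store the CRC of all bytes that precede the crc field. */
static void
kmgr_set_crc(DemoKmgrFile *f)
{
    f->crc = crc32c(f, offsetof(DemoKmgrFile, crc));
}

/* Return nonzero if the stored CRC matches the file contents. */
static int
kmgr_crc_ok(const DemoKmgrFile *f)
{
    return f->crc == crc32c(f, offsetof(DemoKmgrFile, crc));
}
```

In this pattern, a node that reads global/kmgr would call kmgr_crc_ok before using any field. A CRC mismatch indicates file corruption, whereas the KEK_HMAC check detects a wrong passphrase.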

      When a cluster is running, TDE-related control information is stored in process memory in the following structure:

      static keydata_t keyEncKey[TDE_KEK_SIZE];
      static keydata_t relEncKey[TDE_MAX_DEK_SIZE];
      static keydata_t walEncKey[TDE_MAX_DEK_SIZE];
      char *polar_cluster_passphrase_command = NULL;
      extern int data_encryption_cipher;
    • Key encryption

      Keys are generated during database initialization. The following figure shows the process.

      Figure: Key encryption

      1. Run the command specified by the polar_cluster_passphrase_command parameter to obtain a 32-byte KEK and a 32-byte HMACK.

      2. Use the random number generation algorithm of OpenSSL to generate an MDEK.

      3. Use the HKDF algorithm of OpenSSL to generate a TDEK from the MDEK.

      4. Use the HKDF algorithm of OpenSSL to generate a WDEK from the MDEK.

      5. Encrypt the MDEK with the KEK to generate an ENCMDEK.

      6. Use the HMAC algorithm to generate a KEK_HMAC from the ENCMDEK and HMACK. The KEK_HMAC is used for verification during key decryption.

      7. Write the ENCMDEK, KEK_HMAC, and other supplementary information in the KmgrFileData structure to the global/kmgr file.
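The following C sketch models steps 5 to 7 to show their order: the MDEK is encrypted first, and the HMAC is then computed over the ciphertext (encrypt-then-MAC). toy_cipher and toy_mac are hypothetical and deliberately insecure stand-ins for AES and HMAC, used only so that the example has no dependencies; a real implementation would use a vetted library such as OpenSSL. WrappedMdek is a simplified, hypothetical view of what is written to global/kmgr.

```c
#include <assert.h>
#include <stdint.h>

#define KEY_LEN 32 /* MDEK, KEK, and HMACK length in this sketch */
#define TAG_LEN 8  /* toy tag length; a real HMAC tag is longer */

/* toy_mac takes its tag bytes from one 64-bit hash value. */
static_assert(TAG_LEN <= 8, "toy tag cannot exceed 8 bytes");

/* INSECURE placeholder for AES: XOR with the key (XOR is its own inverse). */
static void
toy_cipher(const uint8_t key[KEY_LEN], const uint8_t in[KEY_LEN], uint8_t out[KEY_LEN])
{
    for (int i = 0; i < KEY_LEN; i++)
        out[i] = in[i] ^ key[i];
}

/* INSECURE placeholder for HMAC: 64-bit FNV-1a over key || message. */
static void
toy_mac(const uint8_t key[KEY_LEN], const uint8_t msg[KEY_LEN], uint8_t tag[TAG_LEN])
{
    uint64_t h = 14695981039346656037ULL;

    for (int i = 0; i < KEY_LEN; i++)
        h = (h ^ key[i]) * 1099511628211ULL;
    for (int i = 0; i < KEY_LEN; i++)
        h = (h ^ msg[i]) * 1099511628211ULL;
    for (int i = 0; i < TAG_LEN; i++)
        tag[i] = (uint8_t) (h >> (8 * i));
}

/* Hypothetical view of the key material that step 7 writes to global/kmgr. */
typedef struct WrappedMdek
{
    uint8_t encmdek[KEY_LEN];  /* ENCMDEK */
    uint8_t kek_hmac[TAG_LEN]; /* KEK_HMAC */
} WrappedMdek;

/* Step 5: ENCMDEK = Enc(KEK, MDEK). Step 6: KEK_HMAC = MAC(HMACK, ENCMDEK). */
static WrappedMdek
wrap_mdek(const uint8_t mdek[KEY_LEN], const uint8_t kek[KEY_LEN], const uint8_t hmack[KEY_LEN])
{
    WrappedMdek w;

    toy_cipher(kek, mdek, w.encmdek);
    toy_mac(hmack, w.encmdek, w.kek_hmac);
    return w;
}
```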

    • Key decryption

      When the database restarts or recovers from a crash, the keys must be decrypted from the limited ciphertext information stored in the global/kmgr file. The following figure shows the process.

      Figure: Key decryption

      1. Read the global/kmgr file to obtain the ENCMDEK and KEK_HMAC.

      2. Run the command specified by the polar_cluster_passphrase_command parameter to obtain a 32-byte KEK and a 32-byte HMACK.

      3. Use the HMAC algorithm to generate a KEK_HMAC' from the ENCMDEK and HMACK. Check whether the KEK_HMAC and KEK_HMAC' are the same. If so, proceed to the next step. If not, return an error.

      4. Decrypt the ENCMDEK by using the KEK to generate an MDEK.

      5. Use the HKDF algorithm of OpenSSL to generate a TDEK from the MDEK. The same TDEK can be generated because the information is unchanged.

      6. Use the HKDF algorithm of OpenSSL to generate a WDEK from the MDEK. The same WDEK can be generated because the information is unchanged.
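Step 3 compares the stored KEK_HMAC with the recomputed KEK_HMAC'. A minimal sketch of this verify-then-decrypt logic follows, assuming the same hypothetical, insecure toy_cipher and toy_mac placeholders as stand-ins for AES and HMAC; unwrap_mdek is also an illustrative name. The tag comparison avoids an early exit, which is the usual practice when comparing MACs so that timing does not reveal how many bytes matched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KEY_LEN 32
#define TAG_LEN 8

/* toy_mac takes its tag bytes from one 64-bit hash value. */
static_assert(TAG_LEN <= 8, "toy tag cannot exceed 8 bytes");

/* INSECURE placeholder for AES: XOR with the key (XOR is its own inverse). */
static void
toy_cipher(const uint8_t key[KEY_LEN], const uint8_t in[KEY_LEN], uint8_t out[KEY_LEN])
{
    for (int i = 0; i < KEY_LEN; i++)
        out[i] = in[i] ^ key[i];
}

/* INSECURE placeholder for HMAC: 64-bit FNV-1a over key || message. */
static void
toy_mac(const uint8_t key[KEY_LEN], const uint8_t msg[KEY_LEN], uint8_t tag[TAG_LEN])
{
    uint64_t h = 14695981039346656037ULL;

    for (int i = 0; i < KEY_LEN; i++)
        h = (h ^ key[i]) * 1099511628211ULL;
    for (int i = 0; i < KEY_LEN; i++)
        h = (h ^ msg[i]) * 1099511628211ULL;
    for (int i = 0; i < TAG_LEN; i++)
        tag[i] = (uint8_t) (h >> (8 * i));
}

/* Compare without an early exit so timing does not reveal where tags differ. */
static int
tags_equal(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0;

    for (size_t i = 0; i < len; i++)
        diff |= (uint8_t) (a[i] ^ b[i]);
    return diff == 0;
}

/*
 * Steps 3-4: recompute KEK_HMAC' from ENCMDEK and HMACK, compare it with the
 * stored KEK_HMAC, and decrypt only on a match. Returns 0 on success, or -1
 * if the passphrase is wrong or the file was modified.
 */
static int
unwrap_mdek(const uint8_t encmdek[KEY_LEN], const uint8_t kek_hmac[TAG_LEN],
            const uint8_t kek[KEY_LEN], const uint8_t hmack[KEY_LEN],
            uint8_t mdek_out[KEY_LEN])
{
    uint8_t expected[TAG_LEN];

    toy_mac(hmack, encmdek, expected);
    if (!tags_equal(expected, kek_hmac, TAG_LEN))
        return -1;
    toy_cipher(kek, encmdek, mdek_out);
    return 0;
}
```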

    • Key replacement

      Key replacement is the process of decrypting keys with an old KEK and generating a new global/kmgr file with a new KEK. The following figure shows the process.

      Figure: Key replacement

      1. Read the old global/kmgr file to obtain the ENCMDEK and KEK_HMAC.

      2. Run the command specified by the polar_cluster_passphrase_command parameter to obtain the old 32-byte KEK and the old 32-byte HMACK.

      3. Use the HMAC algorithm to generate a KEK_HMAC' from the ENCMDEK and HMACK. Check whether the KEK_HMAC and KEK_HMAC' are the same. If so, proceed to the next step. If not, an error is returned.

      4. Decrypt the ENCMDEK by using the old KEK to generate an MDEK.

      5. Run the command specified by the polar_cluster_passphrase_command parameter to obtain a new KEK and a new HMACK.

      6. Encrypt the MDEK by using the new KEK to generate a new ENCMDEK.

      7. Use the HMAC algorithm to generate a new KEK_HMAC from the new ENCMDEK and new HMACK. The new KEK_HMAC is used for verification during key decryption.

      8. Write the new ENCMDEK, the new KEK_HMAC, and other supplementary information in the KmgrFileData structure to the global/kmgr file.
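Key replacement only rewraps the MDEK, so the TDEK and WDEK derived from it stay the same and, in this model, existing data pages do not need to be re-encrypted. The sketch below assumes the same hypothetical, insecure toy_cipher and toy_mac placeholders for AES and HMAC; rotate_kek and KmgrDemo are illustrative names. It builds the new contents in a separate variable and replaces the old ones only after verification succeeds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_LEN 32
#define TAG_LEN 8

/* toy_mac takes its tag bytes from one 64-bit hash value. */
static_assert(TAG_LEN <= 8, "toy tag cannot exceed 8 bytes");

/* INSECURE placeholder for AES: XOR with the key (XOR is its own inverse). */
static void
toy_cipher(const uint8_t key[KEY_LEN], const uint8_t in[KEY_LEN], uint8_t out[KEY_LEN])
{
    for (int i = 0; i < KEY_LEN; i++)
        out[i] = in[i] ^ key[i];
}

/* INSECURE placeholder for HMAC: 64-bit FNV-1a over key || message. */
static void
toy_mac(const uint8_t key[KEY_LEN], const uint8_t msg[KEY_LEN], uint8_t tag[TAG_LEN])
{
    uint64_t h = 14695981039346656037ULL;

    for (int i = 0; i < KEY_LEN; i++)
        h = (h ^ key[i]) * 1099511628211ULL;
    for (int i = 0; i < KEY_LEN; i++)
        h = (h ^ msg[i]) * 1099511628211ULL;
    for (int i = 0; i < TAG_LEN; i++)
        tag[i] = (uint8_t) (h >> (8 * i));
}

/* Compare without an early exit so timing does not reveal where tags differ. */
static int
tags_equal(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0;

    for (size_t i = 0; i < len; i++)
        diff |= (uint8_t) (a[i] ^ b[i]);
    return diff == 0;
}

/* Hypothetical, simplified contents of global/kmgr. */
typedef struct KmgrDemo
{
    uint8_t encmdek[KEY_LEN];  /* ENCMDEK */
    uint8_t kek_hmac[TAG_LEN]; /* KEK_HMAC */
} KmgrDemo;

/* Returns 0 on success, or -1 if the old keys fail verification. */
static int
rotate_kek(KmgrDemo *f,
           const uint8_t old_kek[KEY_LEN], const uint8_t old_hmack[KEY_LEN],
           const uint8_t new_kek[KEY_LEN], const uint8_t new_hmack[KEY_LEN])
{
    uint8_t tag[TAG_LEN], mdek[KEY_LEN];
    KmgrDemo next;

    toy_mac(old_hmack, f->encmdek, tag);             /* step 3: verify with old HMACK */
    if (!tags_equal(tag, f->kek_hmac, TAG_LEN))
        return -1;
    toy_cipher(old_kek, f->encmdek, mdek);           /* step 4: recover the MDEK */
    toy_cipher(new_kek, mdek, next.encmdek);         /* step 6: rewrap the same MDEK */
    toy_mac(new_hmack, next.encmdek, next.kek_hmac); /* step 7: new KEK_HMAC */
    memset(mdek, 0, sizeof mdek);                    /* best-effort wipe of the plaintext copy */
    *f = next;                                       /* step 8: replace the file contents */
    return 0;
}
```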

  • Encryption module

    All user data is encrypted at the page level by using the AES-128 or AES-256 algorithm. AES-256 is used by default. The combination of the page LSN and the page number is used as the initialization vector (IV) for encrypting each data page. Because the IV differs between pages and changes with the page LSN, the same content produces different ciphertext.
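The exact IV layout is not specified beyond (page LSN, page number). The sketch below assumes a hypothetical 16-byte layout: the 8-byte LSN and the 4-byte block number in big-endian order, followed by 4 zero bytes; build_page_iv is an illustrative name. Because the LSN advances when a WAL-logged change is made to the page, and the block number differs between pages, identical plaintext is encrypted under different IVs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define AES_IV_LEN 16 /* one AES block */

static_assert(8 + 4 <= AES_IV_LEN, "LSN and block number must fit in one IV");

/*
 * Hypothetical IV layout: 8-byte page LSN, then 4-byte block number, both
 * big-endian, then 4 zero bytes. The actual PolarDB layout may differ.
 */
static void
build_page_iv(uint64_t page_lsn, uint32_t blkno, uint8_t iv[AES_IV_LEN])
{
    memset(iv, 0, AES_IV_LEN);
    for (int i = 0; i < 8; i++)
        iv[i] = (uint8_t) (page_lsn >> (56 - 8 * i));
    for (int i = 0; i < 4; i++)
        iv[8 + i] = (uint8_t) (blkno >> (24 - 8 * i));
}
```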

    The following code shows the header data structure of each page.

    typedef struct PageHeaderData
    {
        /* XXX LSN is member of *any* block, not only page-organized ones */
        PageXLogRecPtr pd_lsn;      /* LSN: next byte after last byte of xlog
                                     * record for last change to this page */
        uint16      pd_checksum;    /* checksum */
        uint16      pd_flags;       /* flag bits, see below */
        LocationIndex pd_lower;     /* offset to start of free space */
        LocationIndex pd_upper;     /* offset to end of free space */
        LocationIndex pd_special;   /* offset to start of special space */
        uint16      pd_pagesize_version;
        TransactionId pd_prune_xid; /* oldest prunable XID, or zero if none */
        ItemIdData  pd_linp[FLEXIBLE_ARRAY_MEMBER]; /* line pointer array */
    } PageHeaderData;
    Note
    • pd_lsn is not encrypted because it is part of the IV that is required to decrypt the page.

    • The 0x8000 bit of pd_flags indicates whether a page is encrypted, and pd_flags itself is not encrypted. This keeps encrypted pages compatible with plaintext pages and allows TDE to be enabled for clusters that already contain plaintext data.

    • pd_checksum is not encrypted, so the page checksum can be verified on the ciphertext.
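Because the 0x8000 bit lives in the unencrypted pd_flags field, a reader can tell an encrypted page from a plaintext page before decrypting anything. PD_PAGE_ENCRYPTED and DemoPageHeader are hypothetical names used for illustration; the struct keeps only the leading header fields and simplifies pd_lsn to a 64-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the 0x8000 bit described above. */
#define PD_PAGE_ENCRYPTED 0x8000

/* Upstream PostgreSQL uses bits 0x0001-0x0004 (PD_VALID_FLAG_BITS is 0x0007). */
static_assert((PD_PAGE_ENCRYPTED & 0x0007) == 0, "must not overlap existing pd_flags bits");

/* Leading fields of PageHeaderData only; pd_lsn simplified to one integer. */
typedef struct DemoPageHeader
{
    uint64_t pd_lsn;
    uint16_t pd_checksum;
    uint16_t pd_flags;
} DemoPageHeader;

/* pd_flags is never encrypted, so this works on any page read from disk. */
static int
page_is_encrypted(const DemoPageHeader *hdr)
{
    return (hdr->pd_flags & PD_PAGE_ENCRYPTED) != 0;
}

/* Set the flag without disturbing the other pd_flags bits. */
static void
page_mark_encrypted(DemoPageHeader *hdr)
{
    hdr->pd_flags |= PD_PAGE_ENCRYPTED;
}
```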

    • Encrypted files

      Files that contain user data are encrypted. For example, the files in the following subdirectories of the data directory are encrypted:

      • base/

      • global/

      • pg_tblspc/

      • pg_replslot/

      • pg_stat/

      • pg_stat_tmp/

    • When to encrypt

      Data that is organized into pages is encrypted page by page. Before a page is written to disk, PostgreSQL calls the PageSetChecksumCopy or PageSetChecksumInplace function to set its checksum. These functions are called on every page write, even when data checksums are disabled, so page data is encrypted at this point, before the checksum is calculated. This ensures that user data in storage is always encrypted.
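The following C sketch shows the write-path order under stated assumptions: the page is encrypted from offset 12 (just after pd_flags) to the end, and only then is the checksum calculated over the ciphertext. toy_crypt_body (an XOR keystream) and toy_checksum (FNV-1a) are hypothetical and insecure placeholders for AES and PostgreSQL's page checksum, and prepare_page_for_write is an illustrative name. The function works in place, so the caller is assumed to pass a private copy of the buffer, similar to the copy that PageSetChecksumCopy works on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192
#define LSN_OFF 0      /* pd_lsn (8 bytes): plaintext, feeds the IV */
#define CHECKSUM_OFF 8 /* pd_checksum (2 bytes): plaintext */
#define FLAGS_OFF 10   /* pd_flags (2 bytes): plaintext */
#define ENC_START 12   /* assumption: every byte after pd_flags is encrypted */
#define PD_PAGE_ENCRYPTED 0x8000

static_assert(ENC_START == FLAGS_OFF + 2, "encryption starts right after pd_flags");

/* INSECURE placeholder for AES: XOR with an xorshift64 keystream seeded by key, LSN, and block. */
static void
toy_crypt_body(uint8_t *page, uint64_t key, uint32_t blkno)
{
    uint64_t lsn, s;

    memcpy(&lsn, page + LSN_OFF, sizeof lsn);
    s = key ^ lsn ^ ((uint64_t) blkno << 32) ^ 0x9E3779B97F4A7C15ULL;
    if (s == 0)
        s = 1; /* xorshift must not start from zero */
    for (int i = ENC_START; i < BLCKSZ; i++)
    {
        s ^= s << 13;
        s ^= s >> 7;
        s ^= s << 17;
        page[i] ^= (uint8_t) s;
    }
}

/* Placeholder for PostgreSQL's page checksum: FNV-1a, skipping pd_checksum. */
static uint16_t
toy_checksum(const uint8_t *page, uint32_t blkno)
{
    uint32_t h = 2166136261u ^ blkno;

    for (int i = 0; i < BLCKSZ; i++)
    {
        if (i == CHECKSUM_OFF || i == CHECKSUM_OFF + 1)
            continue;
        h = (h ^ page[i]) * 16777619u;
    }
    return (uint16_t) ((h ^ (h >> 16)) | 1); /* fold to 16 bits, never 0 */
}

/* Write path: mark the page, encrypt the body, then checksum the ciphertext. */
static void
prepare_page_for_write(uint8_t *page, uint64_t key, uint32_t blkno)
{
    uint16_t flags, sum;

    memcpy(&flags, page + FLAGS_OFF, sizeof flags);
    flags |= PD_PAGE_ENCRYPTED;
    memcpy(page + FLAGS_OFF, &flags, sizeof flags);

    toy_crypt_body(page, key, blkno); /* 1. encrypt */
    sum = toy_checksum(page, blkno);  /* 2. checksum covers the ciphertext */
    memcpy(page + CHECKSUM_OFF, &sum, sizeof sum);
}
```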

  • Decryption module

    When page data is read from storage into memory, it is verified by the PageIsVerified function, which is called on every page read even when data checksums are disabled. Because the page is decrypted right after this checksum verification, all page data in memory is plaintext.
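The read path mirrors the write path. In this sketch, which reuses the same hypothetical, insecure placeholders (toy_crypt_body and toy_checksum) and the assumed layout, the checksum is verified on the ciphertext first, and the page body is decrypted only if the 0x8000 flag is set, so plaintext pages pass through unchanged. verify_and_decrypt_page is an illustrative name, and clearing the flag on the in-memory copy is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192
#define LSN_OFF 0      /* pd_lsn (8 bytes): plaintext, feeds the IV */
#define CHECKSUM_OFF 8 /* pd_checksum (2 bytes): plaintext */
#define FLAGS_OFF 10   /* pd_flags (2 bytes): plaintext */
#define ENC_START 12   /* assumption: every byte after pd_flags is encrypted */
#define PD_PAGE_ENCRYPTED 0x8000

static_assert(ENC_START == FLAGS_OFF + 2, "encryption starts right after pd_flags");

/* INSECURE placeholder for AES: XOR with an xorshift64 keystream seeded by key, LSN, and block. */
static void
toy_crypt_body(uint8_t *page, uint64_t key, uint32_t blkno)
{
    uint64_t lsn, s;

    memcpy(&lsn, page + LSN_OFF, sizeof lsn);
    s = key ^ lsn ^ ((uint64_t) blkno << 32) ^ 0x9E3779B97F4A7C15ULL;
    if (s == 0)
        s = 1; /* xorshift must not start from zero */
    for (int i = ENC_START; i < BLCKSZ; i++)
    {
        s ^= s << 13;
        s ^= s >> 7;
        s ^= s << 17;
        page[i] ^= (uint8_t) s;
    }
}

/* Placeholder for PostgreSQL's page checksum: FNV-1a, skipping pd_checksum. */
static uint16_t
toy_checksum(const uint8_t *page, uint32_t blkno)
{
    uint32_t h = 2166136261u ^ blkno;

    for (int i = 0; i < BLCKSZ; i++)
    {
        if (i == CHECKSUM_OFF || i == CHECKSUM_OFF + 1)
            continue;
        h = (h ^ page[i]) * 16777619u;
    }
    return (uint16_t) ((h ^ (h >> 16)) | 1); /* fold to 16 bits, never 0 */
}

/* Read path: verify the checksum first, then decrypt. Returns 0, or -1 on a checksum mismatch. */
static int
verify_and_decrypt_page(uint8_t *page, uint64_t key, uint32_t blkno)
{
    uint16_t stored, flags;

    memcpy(&stored, page + CHECKSUM_OFF, sizeof stored);
    if (stored != toy_checksum(page, blkno))
        return -1; /* 1. verification failed; page left untouched */

    memcpy(&flags, page + FLAGS_OFF, sizeof flags);
    if (flags & PD_PAGE_ENCRYPTED) /* plaintext pages pass through */
    {
        toy_crypt_body(page, key, blkno);       /* 2. XOR keystream also decrypts */
        flags &= (uint16_t) ~PD_PAGE_ENCRYPTED; /* assumption: clear in memory */
        memcpy(page + FLAGS_OFF, &flags, sizeof flags);
    }
    return 0;
}
```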