PolarDB:TDE

Last Updated: May 20, 2024

Transparent Data Encryption (TDE) performs data-at-rest encryption at the database layer. This prevents potential attackers from bypassing the database to read sensitive information from the storage layer.

Background information

In China, to ensure the security of Internet information, the state requires that relevant service developers meet certain data security standards.

Internationally, regulatory data security standards also exist for some related industries. Such standards include:

  • Payment Card Industry Data Security Standard (PCI DSS)

  • Health Insurance Portability and Accountability Act (HIPAA)

  • General Data Protection Regulation (GDPR)

  • California Consumer Privacy Act (CCPA)

  • Sarbanes-Oxley Act (SOX)

To meet the requirements for protecting user data security, PolarDB provides the TDE feature. TDE allows users who have passed database authentication to access encrypted data without the need to modify the application code or configurations. Operating system users who attempt to read sensitive information within tablespace files and unauthorized users who attempt to read backup data or on-disk data cannot obtain the plaintext information.

Terms

  • key encryption key (KEK): The key that is used to encrypt another key.

  • memory data encryption key (MDEK): The master data encryption key that is held in memory. An MDEK is randomly generated by using the pg_strong_random function. The TDEK and WDEK are derived from it.

  • table data encryption key (TDEK): The key that is used to encrypt table data. A TDEK is generated from an MDEK by using the hash-based message authentication code-based key derivation function (HKDF) algorithm. A TDEK is stored in memory as the key for actual data encryption.

  • write-ahead logging data encryption key (WDEK): The key that is used to encrypt log data. A WDEK is generated from an MDEK by using the HKDF algorithm. A WDEK is stored in memory as the key for actual data encryption.

  • hash-based message authentication code key (HMACK): The key that is used by the hash-based message authentication code (HMAC) algorithm. A KEK and an HMACK are derived by hashing a passphrase with Secure Hash Algorithm 512 (SHA-512).

  • hash-based message authentication code of key encryption key (KEK_HMAC): The digest that is used to verify the KEK. A KEK_HMAC is generated from an ENCMDEK and an HMACK by using the HMAC algorithm. A KEK_HMAC is used for verification during key decryption.

  • encoded memory data encryption key (ENCMDEK): The encrypted form of the MDEK. An ENCMDEK is generated by encrypting an MDEK with a KEK and is written to the global/kmgr file.

How it works

  • Key management module

    • Key structure

      TDE uses a two-layer key structure that consists of a KEK and a TDEK. The TDEK is used to actually encrypt database data. The KEK is used to further encrypt the TDEK. The following section describes the two-layer key structure in detail:

      • KEK and HMACK: the 64-byte data that is obtained by running the command specified by the polar_cluster_passphrase_command parameter and performing SHA-512 calculation. The first 32 bytes of the data are used as the KEK and the last 32 bytes of the data are used as the HMACK.

      • TDEK and WDEK: the keys that are generated by using a cryptographically secure random number generator, which are the real keys for data and WAL encryption. The encrypted forms of the two keys are then processed with the HMAC algorithm, keyed with the HMACK, to obtain the RDEK_HMAC and WDEK_HMAC, which are used to verify the KEK and are stored in the shared storage.
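The split of the 64-byte value into a KEK and an HMACK can be sketched as follows. This is a minimal illustration that assumes the SHA-512 digest of the passphrase command output has already been computed elsewhere. The type PassphraseKeys and the function split_passphrase_digest are hypothetical names and are not part of PolarDB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sizes taken from the text: a 64-byte SHA-512 digest, split in half. */
#define DIGEST_LEN 64
#define KEK_LEN    32
#define HMACK_LEN  32

/* Hypothetical container for illustration; not a PolarDB type. */
typedef struct {
    uint8_t kek[KEK_LEN];
    uint8_t hmack[HMACK_LEN];
} PassphraseKeys;

/*
 * Split an already computed SHA-512 digest of the passphrase into
 * the KEK (first 32 bytes) and the HMACK (last 32 bytes).
 */
static void split_passphrase_digest(const uint8_t digest[DIGEST_LEN],
                                    PassphraseKeys *out)
{
    assert(digest != NULL && out != NULL);
    memcpy(out->kek, digest, KEK_LEN);
    memcpy(out->hmack, digest + KEK_LEN, HMACK_LEN);
}
```

Because both keys come from the same digest, the same passphrase always yields the same KEK and HMACK.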

      The KEK and HMACK are always obtained from an external system, such as Key Management Service (KMS). During testing, you can run the echo passphrase command to obtain the two keys. The ENCMDEK and KEK_HMAC must be stored in the shared storage to ensure that primary and read-only nodes can read the key file and obtain the real encryption key at the next startup. The following code shows the data structure:

      typedef struct KmgrFileData
      {
          /* version for kmgr file */
          uint32      kmgr_version_no;
      
          /* Are data pages encrypted? Zero if encryption is disabled */
          uint32      data_encryption_cipher;
      
          /*
           * Wrapped Key information for data encryption.
           */
          WrappedEncKeyWithHmac tde_rdek;
          WrappedEncKeyWithHmac tde_wdek;
      
          /* CRC of all above ... MUST BE LAST! */
          pg_crc32c   crc;
      } KmgrFileData; 

      This file is generated during the execution of the initdb command. This ensures that standby nodes can obtain the file by using pg_basebackup.
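The comment "CRC of all above ... MUST BE LAST!" implies that the checksum covers every field that precedes it. The following sketch shows one way to compute and verify such a trailing CRC. It assumes that pg_crc32c denotes CRC-32C (Castagnoli). The struct layout, the field sizes, and all names prefixed with Ex or ex_ are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes and layout, for illustration only. */
#define EX_WRAPPED_KEY_LEN 40
#define EX_HMAC_LEN        64

typedef struct {
    uint8_t key[EX_WRAPPED_KEY_LEN];
    uint8_t hmac[EX_HMAC_LEN];
} ExWrappedKey;

typedef struct {
    uint32_t     version_no;
    uint32_t     data_encryption_cipher;
    ExWrappedKey rdek;
    ExWrappedKey wdek;
    uint32_t     crc;   /* must stay the last field */
} ExKmgrFile;

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t ex_crc32c(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Compute the CRC over every byte that precedes the crc field. */
static void ex_kmgr_seal(ExKmgrFile *f)
{
    assert(f != NULL);
    f->crc = ex_crc32c(f, offsetof(ExKmgrFile, crc));
}

/* Return 1 if the stored CRC matches the contents, 0 otherwise. */
static int ex_kmgr_is_intact(const ExKmgrFile *f)
{
    assert(f != NULL);
    return f->crc == ex_crc32c(f, offsetof(ExKmgrFile, crc));
}
```

Keeping the CRC as the last field lets the checksum range be expressed as a single offsetof call. A caller would zero the structure with memset before filling it so that the checksummed bytes are fully defined.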

      When a cluster is running, the TDE-related control information is stored in the memory of processes in the following structure:

      static keydata_t keyEncKey[TDE_KEK_SIZE];
      static keydata_t relEncKey[TDE_MAX_DEK_SIZE];
      static keydata_t walEncKey[TDE_MAX_DEK_SIZE];
      char *polar_cluster_passphrase_command = NULL;
      extern int data_encryption_cipher;
    • Key encryption

      Keys need to be generated during database initialization. The following figure shows the process.

      [Figure: Key encryption]

      1. Run the command specified by the polar_cluster_passphrase_command parameter to obtain a 32-byte KEK and a 32-byte HMACK.

      2. Use the random number generation algorithm of OpenSSL to generate an MDEK.

      3. Use the HKDF algorithm of OpenSSL to generate a TDEK from the MDEK.

      4. Use the HKDF algorithm of OpenSSL to generate a WDEK from the MDEK.

      5. Encrypt the MDEK by using the KEK to generate an ENCMDEK.

      6. Use the HMAC algorithm to generate a KEK_HMAC from the ENCMDEK and HMACK. The KEK_HMAC is used for verification during key decryption.

      7. Write the ENCMDEK, KEK_HMAC, and other supplementary information in the KmgrFileData structure into the global/kmgr file.

    • Key decryption

      If the database crashes or restarts, the keys must be recovered from the limited ciphertext information that is stored on disk. The following figure shows the process.

      [Figure: Key decryption]

      1. Read the global/kmgr file to obtain the ENCMDEK and KEK_HMAC.

      2. Run the command specified by the polar_cluster_passphrase_command parameter to obtain a 32-byte KEK and a 32-byte HMACK.

      3. Use the HMAC algorithm to generate the KEK_HMAC' from the ENCMDEK and HMACK. Check whether the KEK_HMAC and KEK_HMAC' are the same. If so, proceed to the next step. If not, return an error.

      4. Decrypt the ENCMDEK by using the KEK to generate an MDEK.

      5. Use the HKDF algorithm of OpenSSL to generate a TDEK from the MDEK. Because HKDF is deterministic and the derivation information is fixed, the same TDEK is generated.

      6. Use the HKDF algorithm of OpenSSL to generate a WDEK from the MDEK. Because HKDF is deterministic and the derivation information is fixed, the same WDEK is generated.
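Step 3 above compares the stored KEK_HMAC with the recomputed KEK_HMAC'. A common practice for comparing MACs is to avoid an early exit, so that the comparison time does not depend on where the first mismatch occurs. The following sketch shows such a comparison. Whether PolarDB compares the values this way is not stated in this document, and the function name ex_mac_equal is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compare two MACs without an early exit, so that the running time
 * does not reveal the position of the first differing byte.
 * Returns 1 when the buffers are equal, 0 otherwise.
 */
static int ex_mac_equal(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0;

    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++)
        diff |= (uint8_t)(a[i] ^ b[i]);   /* accumulate any difference */
    return diff == 0;
}
```

If the function returns 0, the KEK or the key file is wrong and startup would stop with an error, as described in step 3.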

    • Key replacement

      In the key replacement process, the keys are decrypted by using the old KEK, and a new kmgr file is generated by using a new KEK. The following figure shows the process.

      [Figure: Key replacement]

      1. Read the global/kmgr file to obtain the ENCMDEK and KEK_HMAC.

      2. Run the command specified by the polar_cluster_passphrase_command parameter to obtain a 32-byte KEK and a 32-byte HMACK.

      3. Use the HMAC algorithm to generate the KEK_HMAC' from the ENCMDEK and HMACK. Check whether the KEK_HMAC and KEK_HMAC' are the same. If so, proceed to the next step. If not, return an error.

      4. Decrypt the ENCMDEK by using the KEK to generate an MDEK.

      5. Run the command specified by the polar_cluster_passphrase_command parameter to obtain a new KEK and a new HMACK.

      6. Encrypt the MDEK by using the new KEK to generate a new ENCMDEK.

      7. Use the HMAC algorithm to generate a new KEK_HMAC from the new ENCMDEK and new HMACK. The new KEK_HMAC is used for verification during key decryption.

      8. Write the new ENCMDEK, the new KEK_HMAC, and other supplementary information in the KmgrFileData structure into the global/kmgr file.
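The control flow of the replacement steps can be sketched as follows. The sketch deliberately uses toy stand-ins (an XOR "wrap" and a simple rolling "MAC") instead of real cryptography, so it only illustrates the order of operations and must not be used to protect data. All names are hypothetical. It also shows that the MDEK itself does not change during replacement, which suggests that data encrypted with keys derived from the MDEK does not need to be re-encrypted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_KEY_LEN 32

/* Toy stand-in for key wrapping, NOT cryptography. XOR is its own inverse. */
static void toy_wrap(const uint8_t kek[EX_KEY_LEN],
                     const uint8_t in[EX_KEY_LEN],
                     uint8_t out[EX_KEY_LEN])
{
    for (size_t i = 0; i < EX_KEY_LEN; i++)
        out[i] = in[i] ^ kek[i];
}

/* Toy stand-in for HMAC, NOT cryptography. */
static uint32_t toy_mac(const uint8_t hmack[EX_KEY_LEN],
                        const uint8_t msg[EX_KEY_LEN])
{
    uint32_t acc = 0;

    for (size_t i = 0; i < EX_KEY_LEN; i++)
        acc = acc * 31u + (uint32_t)(msg[i] ^ hmack[i]);
    return acc;
}

/* Hypothetical, simplified view of the key file contents. */
typedef struct {
    uint8_t  encmdek[EX_KEY_LEN];
    uint32_t kek_hmac;
} ToyKmgr;

/* Returns 0 on success, -1 if the old keys fail verification. */
static int toy_rotate(ToyKmgr *file,
                      const uint8_t old_kek[EX_KEY_LEN],
                      const uint8_t old_hmack[EX_KEY_LEN],
                      const uint8_t new_kek[EX_KEY_LEN],
                      const uint8_t new_hmack[EX_KEY_LEN])
{
    uint8_t mdek[EX_KEY_LEN];

    assert(file != NULL);

    /* Steps 1-3: verify the stored digest with the old HMACK. */
    if (toy_mac(old_hmack, file->encmdek) != file->kek_hmac)
        return -1;

    /* Step 4: unwrap the MDEK with the old KEK. */
    toy_wrap(old_kek, file->encmdek, mdek);

    /* Steps 5-6: wrap the unchanged MDEK with the new KEK. */
    toy_wrap(new_kek, mdek, file->encmdek);

    /* Steps 7-8: recompute the digest with the new HMACK and store it. */
    file->kek_hmac = toy_mac(new_hmack, file->encmdek);
    return 0;
}
```

In a real implementation, the wrap and MAC functions would be replaced by the KEK-based encryption and the HMAC algorithm that the steps describe.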

  • Encryption module

    All user data is expected to be encrypted at page granularity by using the Advanced Encryption Standard (AES) algorithm with a 128-bit key (AES-128) or a 256-bit key (AES-256). AES-256 is used by default. The pair (page LSN, page number) is used as the initialization vector (IV) for the encryption of each data page. The IV ensures that different encryption results can be generated for the same content.
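The following sketch shows one way to pack the pair (page LSN, page number) into an IV. It assumes a 16-byte IV, which matches the AES block size. The byte layout (8 bytes of LSN, 4 bytes of page number, 4 zero bytes, all big-endian) and the function name ex_build_page_iv are assumptions for illustration. The layout that PolarDB actually uses is not described in this document.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EX_IV_LEN 16   /* AES block size in bytes */

/*
 * Pack (page LSN, page number) into a 16-byte IV.
 * Hypothetical layout: bytes 0-7 hold the LSN, bytes 8-11 hold the
 * page number (both big-endian), and bytes 12-15 are zero.
 */
static void ex_build_page_iv(uint64_t page_lsn, uint32_t page_no,
                             uint8_t iv[EX_IV_LEN])
{
    assert(iv != NULL);
    memset(iv, 0, EX_IV_LEN);
    for (int i = 0; i < 8; i++)
        iv[i] = (uint8_t)(page_lsn >> (56 - 8 * i));
    for (int i = 0; i < 4; i++)
        iv[8 + i] = (uint8_t)(page_no >> (24 - 8 * i));
}
```

Two pages with the same content receive different IVs as long as their page numbers or LSNs differ, and a page that is modified receives a new IV because its LSN advances.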

    The following code shows the header data structure of each page.

    typedef struct PageHeaderData
    {
        /* XXX LSN is member of *any* block, not only page-organized ones */
        PageXLogRecPtr pd_lsn;      /* LSN: next byte after last byte of xlog
                                     * record for last change to this page */
        uint16      pd_checksum;    /* checksum */
        uint16      pd_flags;       /* flag bits, see below */
        LocationIndex pd_lower;     /* offset to start of free space */
        LocationIndex pd_upper;     /* offset to end of free space */
        LocationIndex pd_special;   /* offset to start of special space */
        uint16      pd_pagesize_version;
        TransactionId pd_prune_xid; /* oldest prunable XID, or zero if none */
        ItemIdData  pd_linp[FLEXIBLE_ARRAY_MEMBER]; /* line pointer array */
    } PageHeaderData;
    Note

    Take note of the following points:

    • pd_lsn is not encrypted because the IV is required for decryption.

    • A flag bit, 0x8000, is added to pd_flags to indicate whether the page data is encrypted. pd_flags itself is not encrypted. This way, plaintext pages can still be read, and TDE can be enabled for incremental clusters.

    • pd_checksum is not encrypted. This way, the page checksum can be verified against the ciphertext.
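Because pd_flags stays in plaintext, a reader can test the flag bit before deciding whether to decrypt a page. The following sketch shows the bit operations involved. The value 0x8000 comes from the note above. The macro name EX_PD_ENCRYPTED and the function names are hypothetical.

```c
#include <stdint.h>

/* Flag value from the text; the macro name is hypothetical. */
#define EX_PD_ENCRYPTED 0x8000

/* Return 1 if the page header marks the page as encrypted, 0 otherwise. */
static int ex_page_is_encrypted(uint16_t pd_flags)
{
    return (pd_flags & EX_PD_ENCRYPTED) != 0;
}

/* Set the flag while leaving all other flag bits unchanged. */
static uint16_t ex_mark_encrypted(uint16_t pd_flags)
{
    return (uint16_t)(pd_flags | EX_PD_ENCRYPTED);
}

/* Clear the flag while leaving all other flag bits unchanged. */
static uint16_t ex_mark_plaintext(uint16_t pd_flags)
{
    return (uint16_t)(pd_flags & ~EX_PD_ENCRYPTED);
}
```

A page whose flag bit is clear can be passed through without decryption, which is what allows plaintext and encrypted pages to coexist in one cluster.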

    • Encrypted files

      Files that contain user data are encrypted. For example, the files in the following subdirectories of the data directory are encrypted:

      • base/

      • global/

      • pg_tblspc/

      • pg_replslot/

      • pg_stat/

      • pg_stat_tmp/

    • When to encrypt

      Data that is organized in data pages is encrypted page by page. The checksum is calculated before page data is written to disk. Even if the checksum-related parameters are disabled, the checksum-related function PageSetChecksumCopy or PageSetChecksumInplace is still called. Therefore, page data only needs to be encrypted before the checksum is calculated. This ensures that the user data is encrypted in storage.

  • Decryption module

    The page data in storage must be verified by using the checksum before the data is read into memory. Even if the related parameters are disabled, the verification function PageIsVerified is still called. Therefore, the data only needs to be decrypted after the checksum is verified. This ensures that the data is decrypted only in memory.
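The ordering on the write path (encrypt, then checksum) and on the read path (verify checksum, then decrypt) can be sketched as follows. The sketch uses a toy XOR cipher and a toy checksum in place of AES and the real page checksum, so it only illustrates the order of operations. The page size, the header size, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PAGE_SIZE   64   /* tiny page, for illustration only */
#define EX_HEADER_SIZE 8    /* header bytes that stay in plaintext */

/* Toy stand-in for AES: XOR the page body with one key byte. NOT secure. */
static void toy_crypt_body(uint8_t *page, uint8_t key)
{
    for (size_t i = EX_HEADER_SIZE; i < EX_PAGE_SIZE; i++)
        page[i] ^= key;
}

/* Toy stand-in for the page checksum, computed over the page body. */
static uint16_t toy_checksum(const uint8_t *page)
{
    uint32_t sum = 0;

    for (size_t i = EX_HEADER_SIZE; i < EX_PAGE_SIZE; i++)
        sum = sum * 31u + page[i];
    return (uint16_t)(sum ^ (sum >> 16));
}

/* Write path: encrypt first, then checksum the ciphertext. */
static uint16_t ex_prepare_for_disk(uint8_t *page, uint8_t key)
{
    assert(page != NULL);
    toy_crypt_body(page, key);
    return toy_checksum(page);
}

/*
 * Read path: verify the checksum of the ciphertext, then decrypt.
 * Returns 0 on success, -1 if the checksum does not match.
 */
static int ex_load_from_disk(uint8_t *page, uint16_t stored_checksum,
                             uint8_t key)
{
    assert(page != NULL);
    if (toy_checksum(page) != stored_checksum)
        return -1;
    toy_crypt_body(page, key);
    return 0;
}
```

Because the checksum is always taken over the ciphertext, a corrupted page is detected before any decryption is attempted, and plaintext exists only in memory.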