Envelope Encryption for SQLAlchemy Fields


Descriptors, Decorators, and Data – Oh My!

Let’s face it, data security is no laughing matter. But that doesn’t mean we can’t have a little fun while we learn how to protect it! Today, we’re diving into the world of Envelope Encryption and SQLAlchemy, a match made in secure-data heaven.

Introducing SQLAlchemy: The Database Wrangler

SQLAlchemy is your friendly neighborhood ORM (Object Relational Mapper) for Python. It lets you define database tables as Python classes, making interacting with your data a breeze. No more wrestling with raw SQL queries – SQLAlchemy translates your Python code into the appropriate database calls, keeping you far away from the database plumbing.

SQLAlchemy-Utils: Almost Perfect, But Not Quite

A companion module, SQLAlchemy-Utils, provides the StringEncryptedType column type, which handles the serialization and deserialization of encrypted data. The problem with this column type is that it relies on a single master key, defined at model declaration time. We would like to employ envelope encryption for added flexibility.

What is Envelope Encryption?

Envelope Encryption is a technique for securing data at rest. The data is encrypted with a key that is stored right alongside the data. The catch is that the key is itself encrypted… with another key!

Let’s get a lexicon established:

Data Encryption Key (DEK): Used to encrypt the user’s data.
Key Encryption Key (KEK): Used to encrypt the DEK.

Figure: Envelope Encryption system overview. Image via Google Cloud’s overview of Envelope Encryption.
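Before wiring anything into SQLAlchemy, it may help to see the DEK/KEK relationship on its own. The following is a minimal sketch using Fernet from the cryptography package (the same primitive the descriptor later in this post relies on). The helper names envelope_encrypt and envelope_decrypt are made up for illustration, and the KEK is generated in place here only to keep the example self-contained.

```python
from cryptography.fernet import Fernet

# KEK: the long-lived key. Generated here only for the demo; in practice
# it would be kept outside the database.
kek = Fernet.generate_key()


def envelope_encrypt(plaintext: str, kek: bytes) -> dict:
    """Hypothetical helper: encrypt with a fresh DEK, then wrap the DEK."""
    dek = Fernet.generate_key()  # a new DEK for every value
    return {
        "value": Fernet(dek).encrypt(plaintext.encode()).decode(),
        "dek": Fernet(kek).encrypt(dek).decode(),  # DEK encrypted with the KEK
    }


def envelope_decrypt(envelope: dict, kek: bytes) -> str:
    """Hypothetical helper: unwrap the DEK, then decrypt the value."""
    dek = Fernet(kek).decrypt(envelope["dek"].encode())
    return Fernet(dek).decrypt(envelope["value"].encode()).decode()


envelope = envelope_encrypt("Some text goes here.", kek)
print(envelope_decrypt(envelope, kek))  # Some text goes here.
```

Only the encrypted value and the encrypted DEK end up in the envelope, so both can be stored together while the KEK stays elsewhere.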

Since the DEK is created dynamically, you can see how the StringEncryptedType column from SQLAlchemy-Utils won’t work out for our scenario. Instead, we’ll need a way to declare a DEK that can live alongside our stored data.

To achieve this in a neatly packaged method, we’ll employ Python Descriptors and a Class Decorator to help.

Python Descriptor

Python Descriptors let objects customize attribute lookup, storage, and deletion. They are a handy mechanism for creating dynamic attributes and controlling what happens when those attributes are read or set. A descriptor defines __get__ and __set__ methods to handle the retrieval and storage of an attribute, respectively.

Example

This (admittedly useless) example prints a log line whenever the attribute is read or set.

class AccessLogger:
    def __set_name__(self, owner, name):
        # Called once at class creation with the attribute name ("kind" below)
        self.public_name = name
        self.private_name = '_' + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class itself rather than on an instance
            return self
        value = getattr(obj, self.private_name)
        print(f'Accessing {self.public_name} - {value}')
        return value

    def __set__(self, obj, value):
        print(f'Setting {self.public_name} - {value}')
        setattr(obj, self.private_name, value)

class Fruit:
    kind = AccessLogger()

If we ran Fruit().kind = “apple”, the AccessLogger would store “apple” in a dynamically created attribute: _kind. This may seem a touch overengineered, but when we get to adding a class decorator, you’ll see how the Descriptor becomes extremely valuable.
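A descriptor does not have to store the value unchanged, and that is the property we will lean on later. As a rough illustration, the sketch below uses a hypothetical Base64Field descriptor that keeps an encoded copy under a sibling attribute and decodes it on read. Base64 is only an encoding, not encryption; it stands in for the real thing to keep the example in the standard library.

```python
import base64


class Base64Field:
    """Hypothetical descriptor: keeps a base64 copy under '<name>_encoded'."""

    def __set_name__(self, owner, name):
        self.storage_name = f"{name}_encoded"

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        stored = getattr(obj, self.storage_name, None)
        # Decode on the way out so callers only ever see the plain value
        return None if stored is None else base64.b64decode(stored).decode()

    def __set__(self, obj, value):
        # Encode on the way in and store under the sibling attribute
        encoded = base64.b64encode(value.encode()).decode()
        setattr(obj, self.storage_name, encoded)


class Note:
    body = Base64Field()


note = Note()
note.body = "apple"
print(note.body_encoded)  # YXBwbGU=
print(note.body)          # apple
```

The caller reads and writes body, while the transformed data lives in body_encoded. Swap the encoding for encryption and you have the shape of the descriptor built later in this post.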

Our Use Case

We want to automatically encrypt and decrypt any SQLAlchemy field declared with a suffix of _encrypted, so that its data is protected at rest. We want to be blissfully ignorant of the encryption details and instead just assign to and read from a single class member. Let’s take a peek at an example.

class Review(Base):
    __tablename__ = "review"  # declarative models need a table name; this one is illustrative

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    performance_review_encrypted: Mapped[dict] = mapped_column(JSON)
    reviewee: Mapped[str] = mapped_column(String)

The data stored in performance_review_encrypted will be automatically encrypted/decrypted when accessed. Under the hood, it’s saving a lot more data than just the string we’re going to assign it.

review = Review()
review.performance_review = "Some text goes here."
session.add(review)
session.commit()

In the above example, the performance_review attribute will encrypt the data being passed in, and store it in performance_review_encrypted, along with the DEK.

Conversely, the performance_review attribute will also handle decrypting the data when it is fetched.

review = session.query(Review).first()
print(review.performance_review)

The question is – how does all that magic happen? Descriptor classes to the rescue!

Building the Encryption Force Field: EncryptedField Descriptor

This is where the real fun begins! We’ll create a custom descriptor called EncryptedField. This little guy handles the dirty work of encryption and decryption behind the scenes. It takes care of generating unique DEKs, encrypting data with the DEK, and then encrypting the DEK itself with the KEK for maximum security. When you access the encrypted field, the descriptor decrypts everything and returns the original data – all without you lifting a finger.

from cryptography.fernet import Fernet

class EncryptedField:
    """
    A descriptor that creates a new field on the database class which users can interface with, ignoring the details of encryption and decryption.

    Encryption Design:
    - Each field will be encrypted with a unique Data Encryption Key (DEK) that is generated each time the field is set.
    - The value is encrypted with the DEK, and the DEK is encrypted with the Key Encryption Key (KEK), which is stored in the environment.
    """
    def __init__(self, name: str, kek: str, transformer: BaseTransformer | None):
        self.name = name
        self.kek = kek
        # Optional transformer class; it is instantiated each time it is used
        self.dict_transformer = transformer
        self.encrypted_name = f"{name}_encrypted"

    def __get__(self, obj, objtype=None):
        """Return the decrypted value of the field."""
        if obj is None:
            return self
        encrypted_value = getattr(obj, self.encrypted_name)
        if encrypted_value:
            # Convert the stored value back into a dictionary if a transformer is in use
            if self.dict_transformer:
                values = self.dict_transformer().deserialize(encrypted_value)
            else:
                values = encrypted_value

            # Decrypt the DEK with the KEK, then decrypt the value with the DEK
            dek = Fernet(self.kek).decrypt(values['dek'].encode())
            return Fernet(dek).decrypt(values['value'].encode()).decode()

        return None

    def __set__(self, obj, value):
        """Encrypt the value and store it in the database."""
        # Generate a new Data Encryption Key (DEK)
        dek = Fernet.generate_key()

        # The data that will be set on the object
        encrypted_value = {
            'value': Fernet(dek).encrypt(value.encode()).decode(),
            'dek': Fernet(self.kek).encrypt(dek).decode()
        }

        # If there is a need to convert the dictionary into something else, call the serializer
        if self.dict_transformer:
            encrypted_value = self.dict_transformer().serialize(encrypted_value)

        # Store the new encrypted value as well as the encrypted DEK
        setattr(obj, self.encrypted_name, encrypted_value)

Ooof. That’s a lot of code. Let’s get funky and… break it down.

A normal descriptor would have a __set_name__ magic method defined. We’re skipping that in favor of a constructor because the descriptor will be dynamically applied to fields on the class by our class decorator – which we’ll be getting to shortly.

Since the descriptor is responsible for performing envelope encryption/decryption on a field, a Key Encryption Key (KEK) needs to be passed in. Remember, the KEK is what will be used to encrypt the DEK, prior to it being persisted to the database.
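The constructor also takes an optional transformer. BaseTransformer is never defined in this post, so the following is only a guess at its shape, inferred from how the descriptor calls it: it is instantiated with no arguments, serialize() receives the dictionary, and deserialize() must return one. The JsonStringTransformer below is a hypothetical example for a column that can only hold text.

```python
import json


class JsonStringTransformer:
    """Hypothetical transformer: converts the envelope dict to and from a JSON string."""

    def serialize(self, data: dict) -> str:
        # Called by the descriptor just before the value is stored
        return json.dumps(data)

    def deserialize(self, raw: str) -> dict:
        # Called by the descriptor right after the stored value is read
        return json.loads(raw)


# Placeholder strings stand in for real Fernet tokens
envelope = {"value": "<encrypted value>", "dek": "<encrypted DEK>"}

stored = JsonStringTransformer().serialize(envelope)
print(type(stored).__name__)  # str
print(JsonStringTransformer().deserialize(stored) == envelope)  # True
```

With no transformer at all, the descriptor stores the dictionary as-is, which is why the column has to accept a Python dictionary in that case.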

__set__()

This method is invoked when the instantiated descriptor has a value assigned to it. Our implementation does the following:

  • Generate a new DEK
  • Encrypt the value with the DEK
  • Encrypt the DEK with the KEK
  • Store the encrypted value and DEK, as a dictionary, on the original field (the one whose name ends in “_encrypted”)

__get__()

This method will read the dictionary stored on the original field and perform the following:

  • Decrypt the DEK with the KEK
  • Decrypt the encrypted value with the DEK
  • Return the decoded value
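One consequence of this layout is worth a quick sketch: because the KEK is only ever used to unwrap the DEK, the KEK can in principle be replaced without touching the encrypted value. The rewrap_dek helper below is hypothetical and is not part of the code in this post; it assumes the same {'value', 'dek'} dictionary that __set__ produces.

```python
from cryptography.fernet import Fernet


def rewrap_dek(envelope: dict, old_kek: bytes, new_kek: bytes) -> dict:
    """Hypothetical helper: re-encrypt only the DEK under a new KEK."""
    dek = Fernet(old_kek).decrypt(envelope["dek"].encode())
    return {
        "value": envelope["value"],  # the encrypted payload is left untouched
        "dek": Fernet(new_kek).encrypt(dek).decode(),
    }


old_kek = Fernet.generate_key()
new_kek = Fernet.generate_key()

# Build an envelope the same way __set__ does
dek = Fernet.generate_key()
envelope = {
    "value": Fernet(dek).encrypt(b"Some text goes here.").decode(),
    "dek": Fernet(old_kek).encrypt(dek).decode(),
}

rotated = rewrap_dek(envelope, old_kek, new_kek)

# Reading with the new KEK follows the same steps as __get__
unwrapped = Fernet(new_kek).decrypt(rotated["dek"].encode())
print(Fernet(unwrapped).decrypt(rotated["value"].encode()).decode())
```

Only the small DEK is re-encrypted, however large the stored value is.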

Now… we just need to figure out how to apply the EncryptedField descriptor to any field on the SQLAlchemy model that ends in _encrypted.

Class Decorator

We need a way to automatically apply the Descriptor class to fields whose names end in _encrypted. Class decorators to the rescue! For the purposes of this project, a class decorator capable of accepting some parameters and iterating over fields within the class will do the job.

from typing import Any

def _encrypt_fields(cls, kek: str, dict_transformer: Any | None = None):
    # Collect a descriptor for every attribute whose name ends in '_encrypted'
    fields = {}
    for name, attr in cls.__dict__.items():
        if name.endswith('_encrypted'):
            base_name = name[:-10]  # Remove '_encrypted' suffix
            fields[base_name] = EncryptedField(base_name, kek, dict_transformer)

    # NOTE: We're running this loop again to avoid modifying the dictionary while iterating over it
    for base_name, field in fields.items():
        setattr(cls, base_name, field)

    return cls

def encrypt_fields(kek: str, transformer: BaseTransformer | None = None):
    """
    A decorator that provides encryption for fields that end in "_encrypted".
    Usage:
    Apply @encrypt_fields(kek) to a class to enable encryption for fields that end in "_encrypted".
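    Each ***_encrypted field MUST be a JSONB field (or something capable of accepting a Python dictionary type).
    When using the class, instead of accessing the ***_encrypted field directly, read and write the attribute named without the suffix.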
    """
    def encrypted_fields_inner(cls):
        return _encrypt_fields(cls, kek=kek, dict_transformer=transformer)

    return encrypted_fields_inner

The encrypt_fields() function is the actual class decorator. It accepts a couple of parameters and returns an inner function, which hands the class being decorated to _encrypt_fields. The secret sauce of binding an instance of EncryptedField to each matching field is the setattr() call in _encrypt_fields.
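To isolate the decorator mechanics from the encryption, here is a stripped-down sketch of the same pattern on a plain Python class. Everything in it (UpperCased, attach_descriptors, the _raw suffix) is invented for illustration; only the structure mirrors encrypt_fields: an outer function takes the parameters, and an inner function receives the class and attaches a descriptor per matching attribute.

```python
class UpperCased:
    """Hypothetical descriptor: reads the backing attribute and upper-cases it."""

    def __init__(self, storage_name: str):
        self.storage_name = storage_name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.storage_name).upper()

    def __set__(self, obj, value):
        setattr(obj, self.storage_name, value)


def attach_descriptors(suffix: str):
    """Outer function: captures the parameters and returns the real decorator."""

    def decorator(cls):
        # Collect the names first; the class dict cannot change while we iterate it
        storage_names = [name for name in vars(cls) if name.endswith(suffix)]
        for storage_name in storage_names:
            public_name = storage_name[: -len(suffix)]
            setattr(cls, public_name, UpperCased(storage_name))
        return cls

    return decorator


@attach_descriptors(suffix="_raw")
class Greeting:
    text_raw = "hello"
    author = "sam"


greeting = Greeting()
greeting.text = "good morning"
print(greeting.text_raw)  # good morning
print(greeting.text)      # GOOD MORNING
```

Only text_raw gains a companion attribute; author is left alone because its name does not end in the suffix.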

Here’s an example of the decorator in action. The secret_encrypted field is the only one that will be automatically encrypted/decrypted when accessed using the secret field.

@encrypt_fields(kek=MY_SECRET_KEY)
class MyModel(Base):
    __tablename__ = "my_model"  # declarative models need a table name; this one is illustrative

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    secret_encrypted: Mapped[dict] = mapped_column(JSON)
    likes: Mapped[int] = mapped_column(Integer)
    login: Mapped[str] = mapped_column(Text)

Conclusion

By combining Envelope Encryption with SQLAlchemy and the power of Python descriptors and decorators, you can create a secure and user-friendly data storage solution. Now go forth and encrypt your data like a boss!

P.S. Don’t forget to check out the project repository to give this technique a try.

P.P.S. Have a question or some feedback about the technique used in this post? Download DevHuddle and chat with us about it – we’d love your feedback!
