Self-obfuscating value objects

Building a quick illustration of a concept can be easy, and it is how most MVP (minimum viable product) projects begin. Unfortunately, these projects often take shortcuts and sacrifice security or other design best practices for the sake of expediency. This can lead to exploitable vulnerabilities in production, leaked customer data, and potential litigation against your business or your team. Understanding best practices around software design will help your team build the most efficient and stable product possible. Understanding how to apply those same design principles to security will help make your product safe and secure as well.

Value objects

An easy illustration of this concept is the “value object” design pattern. This pattern establishes an object that effectively wraps an otherwise primitive value for better use with higher-level concepts like equality versus equivalency. We used a value object to wrap raw bytes for our initial Cryptopals work, but they can be much more powerful.

For example, consider email addresses encoded as strings in a PHP application. If your email address is developer@myawesomesite.com, you can encode it directly as a string literal in PHP. Further uses of this string, such as comparison, are easy with simple equality operators.

$email1 = 'developer@myawesomesite.com';
$email2 = 'developer@myawesomesite.com';

var_dump($email1 === $email2); // bool(true)
var_dump($email1 == $email2);  // bool(true)

These two strings are obviously the same. They’re identical. But the variables containing them are different even if the values contained are equivalent. This subtle nuance (value equivalency versus object equivalency) can cause problems, particularly if variables are passed by reference to any other functions in your codebase. Value objects are a way to illustrate the same level of equivalency while making the values contained within the objects immutable.
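
The pass-by-reference hazard mentioned above can be sketched as follows. The helper normalizeInPlace() is a hypothetical name invented for this illustration and is not part of the original example; it only shows how a raw string handed to such a function can change underneath the caller.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not from the article): because the parameter is
// taken by reference, the caller's variable is rewritten in place.
function normalizeInPlace(string &$address): void
{
    $address = strtolower(trim($address));
}

$email  = 'Developer@MyAwesomeSite.com';
$backup = $email;            // plain strings are copied on assignment

normalizeInPlace($email);    // silently changes $email for every later use

var_dump($email === $backup); // bool(false)
var_dump($email);             // string(27) "developer@myawesomesite.com"
```

An immutable value object avoids this class of surprise because no caller can rewrite the wrapped value after construction.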

A simple value object for representing email would look something like:

class Email
{
  protected ?string $value;

  public function __construct(?string $value)
  {
    $this->value = $value;
  }

  public function getValue(): ?string
  {
    return $this->value;
  }

  public function equals(Email $other): bool
  {
    return $this->value === $other->getValue();
  }
}

This class then wraps our simple string primitives and drives the point home that, while the values within the objects are the same, the objects themselves are distinct. Further, the objects are immutable so you can’t accidentally overwrite the value of the object.

$email1 = new Email('developer@myawesomesite.com');
$email2 = new Email('developer@myawesomesite.com');

var_dump($email1->equals($email2)); // bool(true): same value
var_dump($email1 == $email2);       // bool(true): same class, equal properties
var_dump($email1 === $email2);      // bool(false): two distinct instances

The value object pattern helps distinguish between value equivalency and object equivalency and leads to more stable applications when used properly. The Email value object above could be paired with something like a Phone value object to build a more complex Contact object as well. The objects’ equivalency checks then empower you to quickly verify whether one set of contact information is the same as another without diving into manual value comparisons.
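
A minimal sketch of that composition might look like the following. The Phone class and the ContactDetails composite are assumptions made for this illustration (the article does not define them at this point), and the Email class is a trimmed-down stand-in for the one shown earlier.

```php
<?php
declare(strict_types=1);

// Trimmed-down stand-in for the Email value object shown earlier.
final class Email
{
    private string $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function equals(Email $other): bool
    {
        return $this->value === $other->value;
    }
}

// Hypothetical Phone value object built the same way.
final class Phone
{
    private string $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function equals(Phone $other): bool
    {
        return $this->value === $other->value;
    }
}

// Hypothetical composite: equality is delegated to the wrapped value objects.
final class ContactDetails
{
    private Email $email;
    private Phone $phone;

    public function __construct(Email $email, Phone $phone)
    {
        $this->email = $email;
        $this->phone = $phone;
    }

    public function equals(ContactDetails $other): bool
    {
        return $this->email->equals($other->email)
            && $this->phone->equals($other->phone);
    }
}

$a = new ContactDetails(new Email('developer@myawesomesite.com'), new Phone('555-555-5555'));
$b = new ContactDetails(new Email('developer@myawesomesite.com'), new Phone('555-555-5555'));
$c = new ContactDetails(new Email('developer@myawesomesite.com'), new Phone('555-555-0000'));

var_dump($a->equals($b)); // bool(true)
var_dump($a->equals($c)); // bool(false)
```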

PII Wrapping

An added benefit of value objects, particularly in the world of PHP, is in protecting personally identifying information (PII) from being inadvertently exposed by your application. Since we’re working now with higher-level objects than raw strings we can embed additional functionality to prevent these objects from leaking their values in unexpected ways.

Regardless of how useful Xdebug and similar tools are for interactive debugging, many developers – myself included – resort to quick hacks like var_dump() or print_r() to diagnose an issue in their application. This kind of laziness isn’t necessarily bad, but it can lead to serious issues if those debugging lines are accidentally committed to the codebase.

I once worked with a client who had built a dynamic pipeline to synchronize user accounts from a WordPress-powered system to a third-party platform. In the interest of user simplicity, they leveraged the platform’s API to set user passwords whenever a user was created in WordPress so users could log in to either platform with the same password. This was a bad approach in general, made even worse by the fact that a developer had left a var_dump() line in the code that was printing all user information – including plaintext passwords – to the system log.

Consider our Email example above. There are various ways this value object could leak the potentially sensitive PII it contains:

$email = new Email('developer@myawesomesite.com');

echo serialize($email); // O:5:"Email":1:{s:8:"*value";s:27:"developer@myawesomesite.com";}

var_dump($email);
// object(Email)#3 (1) {
//   ["value":protected]=>
//   string(27) "developer@myawesomesite.com"
// }

print_r($email);
// Email Object
// (
//     [value:protected] => developer@myawesomesite.com
// )

We want to be sure our email address is never leaked by mistake, so we have to proactively sanitize any attempts at serialization or debugging. Luckily, PHP allows us to override both with various magic methods on our object and specific interfaces we can implement directly. Consider the following abstract class:

abstract class PII implements \Serializable, \JsonSerializable
{
    protected ?string $value;

    protected bool $valid = true;

    protected function redacted(): string
    {
        // The cast guards against a null value, e.g. after unserialize()
        return str_repeat('*', strlen((string) $this->value));
    }

    public function isValid(): bool
    {
        return $this->valid;
    }

    public function serialize(): string
    {
        return $this->redacted();
    }

    public function unserialize($serialized)
    {
        $this->value = null;
        $this->valid = false;
    }

    public function jsonSerialize(): string
    {
        return $this->redacted();
    }

    public function __toString(): string
    {
        return $this->redacted();
    }

    public function __debugInfo(): array
    {
        return [
            'valid' => $this->valid,
            'value' => $this->redacted()
        ];
    }
}

Because PII implements the Serializable interface, PHP calls its ::serialize() and ::unserialize() methods whenever an object is passed to the globally scoped serialize() and unserialize() functions, respectively. In this case, we redact the value when serializing to prevent exposure. Unserializing from the raw string can no longer recover the value because it was redacted, so we internally flag the object as “invalid” to keep track of that fact.
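
One caveat: as of PHP 8.1, implementing Serializable without also providing the __serialize() and __unserialize() magic methods triggers a deprecation notice. The sketch below shows the same redact-on-serialize idea using only those magic methods (available since PHP 7.4). The class name RedactedEmail is a hypothetical stand-in chosen for this illustration, not the article's class.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in class for this sketch.
final class RedactedEmail
{
    private ?string $value;
    private bool $valid = true;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function isValid(): bool
    {
        return $this->valid;
    }

    // Called by serialize(): only the masked value leaves the object.
    public function __serialize(): array
    {
        return ['value' => str_repeat('*', strlen((string) $this->value))];
    }

    // Called by unserialize(): the real value is gone, so flag the object.
    public function __unserialize(array $data): void
    {
        $this->value = null;
        $this->valid = false;
    }
}

$payload  = serialize(new RedactedEmail('developer@myawesomesite.com'));
$restored = unserialize($payload);

var_dump(strpos($payload, 'developer') === false); // bool(true)
var_dump($restored->isValid());                    // bool(false)
```

Note that the serialized string produced this way uses PHP's regular object format, so it will not match the output of the Serializable-based class byte for byte.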

Similarly, the JsonSerializable::jsonSerialize() method allows us to configure how this object should be serialized to JSON format, something our raw Email object was incapable of before. Again, to prevent leaking anything sensitive we’ll redact the value contained.

Finally, we want to explicitly tell PHP what to do if anyone attempts to use our object as a string (echo new Email('a@b.com')) or use it in a debugging line or stack trace by implementing both __toString() and __debugInfo().
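
To make the effect of these hooks concrete, here is a small self-contained sketch. MaskedValue is a hypothetical class written only for this demonstration; it mirrors the jsonSerialize(), __toString(), and __debugInfo() methods described above without the rest of the PII base class.

```php
<?php
declare(strict_types=1);

// Hypothetical demo class mirroring the hooks described above.
final class MaskedValue implements \JsonSerializable
{
    private string $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    private function redacted(): string
    {
        return str_repeat('*', strlen($this->value));
    }

    // Used by json_encode()
    public function jsonSerialize(): string
    {
        return $this->redacted();
    }

    // Used by echo, string casts, and string interpolation
    public function __toString(): string
    {
        return $this->redacted();
    }

    // Used by var_dump() and print_r()
    public function __debugInfo(): array
    {
        return ['value' => $this->redacted()];
    }
}

$secret = new MaskedValue('a@b.com');

echo json_encode(['email' => $secret]), PHP_EOL; // {"email":"*******"}
echo "Contact: {$secret}", PHP_EOL;              // Contact: *******
```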

Updating our Email class to protect its embedded PII is then as simple as leveraging the PII class as a base:

class Email extends PII {
  // ...
}

Ideally, we could use a trait instead of an abstract base class to provide this PII-protecting functionality to our Email object. Unfortunately, traits cannot (yet) implement interfaces. Using a trait would require both a use declaration to load the implementation and additional implements declarations on our composite class. An interface alone does not help either: it can extend other interfaces, but it cannot carry an implementation, which again complicates our object composition. There are definitely other options for constructing a PII-protecting Email class here, but extending an abstract class was the cleanest and simplest way to do so for the purposes of this walkthrough.
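
For comparison, the trait-based alternative described above could be sketched roughly like this. The names RedactsValue and TraitEmail are hypothetical and exist only for this illustration. Notice that the class still has to declare the interface itself, which is the extra ceremony the paragraph refers to.

```php
<?php
declare(strict_types=1);

// Hypothetical trait carrying the redaction behavior.
trait RedactsValue
{
    private string $value;

    private function redacted(): string
    {
        return str_repeat('*', strlen($this->value));
    }

    public function jsonSerialize(): string
    {
        return $this->redacted();
    }

    public function __toString(): string
    {
        return $this->redacted();
    }

    public function __debugInfo(): array
    {
        return ['value' => $this->redacted()];
    }
}

// The trait supplies the methods, but the implements clause must be repeated
// on every class that uses it.
final class TraitEmail implements \JsonSerializable
{
    use RedactsValue;

    public function __construct(string $value)
    {
        $this->value = $value;
    }
}

$email = new TraitEmail('developer@myawesomesite.com');

var_dump($email instanceof \JsonSerializable); // bool(true)
echo json_encode($email), PHP_EOL;             // "***************************"
```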

Once updated, we can use the Email class exactly as we did before, but the output is now fully obfuscated and we avoid accidentally disclosing any of this protected PII:

$email = new Email('developer@myawesomesite.com');

echo serialize($email); // C:5:"Email":27:{***************************}

var_dump($email);
// object(Email)#3 (2) {
//   ["valid"]=>
//   bool(true)
//   ["value"]=>
//   string(27) "***************************"
// }

print_r($email);
// Email Object
// (
//     [valid] => 1
//     [value] => ***************************
// )

We could optionally override the implementation of PII::redacted() to expose some of our information for use with deduplicating a dataset. For an email address, this might mean exposing the domain and the first character of the username while obfuscating the rest of the username to protect privacy:

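class Email extends PII
{
  // ...

  protected function redacted(): string
  {
    if ($this->isValid()) {
      $parts = explode('@', (string) $this->value);
      // Only mask the username when there is exactly one "@" and a non-empty username
      if (count($parts) === 2 && $parts[0] !== '') {
        return substr($parts[0], 0, 1) . str_repeat('*', strlen($parts[0]) - 1) . '@' . $parts[1];
      }
    }

    // Otherwise fall back to masking every character
    return str_repeat('*', strlen((string) $this->value));
  }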

  // ...
}
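
To see what this partial redaction produces, the same logic can be pulled into a standalone function. The function name maskLocalPart() is an assumption made for this sketch so that it runs without the surrounding classes.

```php
<?php
declare(strict_types=1);

// Hypothetical standalone version of the partial redaction logic above.
function maskLocalPart(string $email): string
{
    $parts = explode('@', $email);

    // Keep the first character of the username and the whole domain.
    if (count($parts) === 2 && $parts[0] !== '') {
        return substr($parts[0], 0, 1)
            . str_repeat('*', strlen($parts[0]) - 1)
            . '@' . $parts[1];
    }

    // Anything that does not look like user@domain is masked entirely.
    return str_repeat('*', strlen($email));
}

echo maskLocalPart('developer@myawesomesite.com'), PHP_EOL; // d********@myawesomesite.com
echo maskLocalPart('not-an-email'), PHP_EOL;                // ************
```

Keep in mind that exposing the domain and first character is a trade-off: it helps with deduplication but reveals more than full masking does.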

Expansion and Next Steps

The Email example above is a simple illustration, but it has broad uses for various programs. Consider a Contact object that marries user identification (name and ID) with contact information (email address and phone number) that needs PII protection. Such an object could be relatively simple, encoding “name”, “ID”, “email”, and “phone” as strings. Or we could bake in protection by encoding email and phone as value objects with obfuscation directly:

class Contact
{
  public string $id;
  public string $name;
  public Email $email;
  public Phone $phone;

  public function __construct(string $name, string $email, string $phone)
  {
    $this->id = (string) mt_rand(); // cast because $id is declared as a string
    $this->name = $name;
    $this->email = new Email($email);
    $this->phone = new Phone($phone);
  }
}

If both the Email and Phone classes extend our PII class above, then even this simple object affords our customer information a modicum of security:

$contact = new Contact('Eric Mann', 'eric@phparch.com', '555-555-5555');

var_dump($contact);
// object(Contact)#3 (4) {
//   ["id"]=>
//   string(9) "547093787"
//   ["name"]=>
//   string(9) "Eric Mann"
//   ["email"]=>
//   object(Email)#2 (2) {
//     ["valid"]=>
//     bool(true)
//     ["value"]=>
//     string(16) "****************"
//   }
//   ["phone"]=>
//   object(Phone)#4 (2) {
//     ["valid"]=>
//     bool(true)
//     ["value"]=>
//     string(12) "************"
//   }
// }

The approach of using self-sanitizing value objects is not a foolproof way to protect your customers’ PII, but it is a solid approach to avoiding accidental disclosure of otherwise sensitive data due to user or developer error. As always, ensure your team is practicing defense-in-depth and utilizing multiple tools and approaches to protect your users. Proper design patterns as applied to security are just another tool at your disposal to build stable and truly secure software products.