When I began writing code for WordPress projects, I had no idea how to properly internationalize[ref]Internationalization (sometimes shortened to I18N) is the practice of marking strings in one language as translatable to other languages and helping to enable the translation.[/ref] my content. There were a few blog posts at the time that accused developers like me of doing a disservice to our customers.
I took offense at the time, but they had a point.
Internationalization is extraordinarily important for open, distributed systems. It’s a valuable feature of WordPress that helps make the software accessible to millions of non-English-speaking writers around the world. Unfortunately, it’s also a mechanism that’s poorly understood and has been confused with some security features of the software.
Rather than call anyone out for doing something wrong, let me instead explain a common mistake that I have made in the past so you can avoid it in the future.
Security
Anyone working with large computer systems today will explain to you the importance of working with safe versus unsafe data. The data returned by an atomic, deterministic function under your control is safe. Data input by a user or retrieved from a remote data source is unsafe.
Unsafe data must be sanitized before it can be used.
When we’re building WordPress websites, we’re required to take a somewhat pessimistic view of our own data sources. While we make every effort to ensure no nefarious data is ever stored in the database, the fact remains that it’s still not entirely within our control. As a result, we must also escape this data upon output to the web browser.
In WordPress, we use a set of specialized functions to do this:
- esc_html() for arbitrary data that may contain HTML
- esc_attr() for HTML attributes
- esc_url() for URLs
- esc_textarea() for arbitrary data targeted for entry into a textarea
The downside is that some of these functions look deceptively similar to some other, internationalization-specific, ones.
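To make the escaping functions above concrete, here is a minimal sketch of one piece of markup that uses three of them, each in the context it was written for. It assumes the code runs inside WordPress (a theme or plugin), and the function name myplugin_render_profile_link() and its parameters are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper. Assumes WordPress is loaded, so esc_url(),
// esc_attr(), and esc_html() are available.
function myplugin_render_profile_link( $url, $tooltip, $display_name ) {
    return sprintf(
        '<a href="%s" title="%s">%s</a>',
        esc_url( $url ),           // value used as a URL
        esc_attr( $tooltip ),      // value placed inside an HTML attribute
        esc_html( $display_name )  // value printed as text between tags
    );
}
```

A template could then print the result with echo myplugin_render_profile_link( $url, $bio, $name );, where all three variables hold untrusted data.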
Internationalization
While there are far more than just two, the two most common translation functions you’ll see in a plugin, theme, or in WordPress itself are __() and _e(). The former returns a translated string, while the latter echoes the translated string to the browser.
The hitch is: the list of translated strings available is and should always be considered an untrusted data source. A third party could embed rogue HTML or JavaScript into the translation file, so any translated strings should also be escaped before being otherwise used in markup.
Luckily, WordPress exposes a set of functions to do just that:
- esc_html_e()
- esc_html__()
- esc_attr_e()
- esc_attr__()
Considering what we know about the escaping functions above and the new translation functions just mentioned, it should be fairly obvious what these are and how they’re meant to be used.
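As a rough sketch of how these combined functions are typically used: the variants ending in __ return the translated, escaped string, while the variants ending in _e echo it. The function names and the 'my-plugin' text domain below are hypothetical, and the code assumes WordPress is loaded.

```php
<?php
// Hypothetical helpers. Assumes WordPress is loaded and that 'my-plugin'
// is the text domain of the plugin these strings belong to.

function myplugin_render_save_button() {
    // esc_attr__() and esc_html__() translate a literal string,
    // escape the result for their context, and return it.
    return sprintf(
        '<button type="submit" title="%s">%s</button>',
        esc_attr__( 'Save your changes', 'my-plugin' ),
        esc_html__( 'Save', 'my-plugin' )
    );
}

function myplugin_print_notice() {
    echo '<p>';
    // esc_html_e() translates, escapes, and echoes in one call.
    esc_html_e( 'Settings saved.', 'my-plugin' );
    echo '</p>';
}
```

Note that every string passed in is a fixed literal that a translator can see in the translation file.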
The Problem
`esc_attr_e()` is not a shortcut for `echo esc_attr()` and we shouldn't use it as such.
— george stephanis toots on mastodon (@daljo628) March 29, 2016
Unfortunately, it’s fairly easy to assume that things like esc_html_e() are aliases for things like echo esc_html().
They’re not and should not ever be used as such.
In reality, esc_html_e() is an alias for echo esc_html( __() ). If used to escape and echo arbitrary (not translated) text, it will still pass the content through WordPress’ translation mechanisms. It isn’t quite apparent from the seemingly short function names for these internationalization functions, but translation mechanisms are among the most expensive operations that run in WordPress.
In other words: using a translation function on content not meant to be translated is a massive performance sink for your site.
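The distinction can be sketched side by side. This is an illustration under assumptions, not code from WordPress itself: the function names and the 'my-plugin' text domain are hypothetical, and $title stands for any arbitrary value that is not meant to be translated, such as data read from the database.

```php
<?php
// Hypothetical helpers. Assumes WordPress is loaded.

function myplugin_print_title( $title ) {
    // Arbitrary data: escape it and echo it, nothing more.
    // Avoid esc_html_e( $title ) here, because that would send $title
    // through the translation lookup before escaping it.
    echo esc_html( $title );
}

function myplugin_print_heading() {
    // A fixed interface string meant for translators:
    // translate, escape, and echo in one call.
    esc_html_e( 'Recent posts', 'my-plugin' );
}
```

A simple rule of thumb follows from this: if the argument is a variable holding data, use the plain escaping function; if it is a literal string a translator should see, use the combined one.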
If you’re using these functions incorrectly, don’t be discouraged. It means you were on the right track and are far ahead of many other developers; you just need to take one more small step to use the functions correctly.