galdecoa.com

Using Unicode property names


In the final section of the previous text, I described a very crude way to shorten CSS custom property names. Using grep, a bash script lists all the names it can find in the file. Then, it iterates over that list and replaces each entry with a shorter name. This shorter name is essentially the hexadecimal value of the position of the name within the list.
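As a reminder of the idea, here is a minimal sketch of that approach. It is not the original script: the inline stylesheet, the variable names and the use of sort -u are assumptions made for illustration.

```bash
#!/bin/bash
# Hypothetical input; a real script would read a stylesheet from a file.
CSS=':root{--text-color:#333;--bg-color:#eee}body{color:var(--text-color);background:var(--bg-color)}'

# List every distinct custom property name found in the stylesheet.
NAMES=$(printf '%s\n' "$CSS" | grep -o -- '--[a-zA-Z0-9_-]*' | sort -u)

# Replace each name with "--" plus the hexadecimal value of its position.
INDEX=0
for NAME in $NAMES; do
	SHORT=$(printf '%s%x' '--' "$INDEX")
	CSS=${CSS//"$NAME"/$SHORT}
	INDEX=$((INDEX+1))
done

printf '%s\n' "$CSS"
# :root{--1:#333;--0:#eee}body{color:var(--1);background:var(--0)}
```

The plain string replacement is as crude as the description suggests: a name that is a prefix of another one (say, --bg and --bg-color) would need extra care.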

That got me thinking about the possibility of making those variable names even smaller. After all, what I do is substitute a string that starts with “--” with another string that has the same prefix. Since the substitution is already based on a numeric value, it would be feasible to convert it into a value in the range of Unicode characters and produce something like:

:root {
	--☀: #eee;
	--☽: #333;
}

I would not recommend it.

Why not

The purpose is not getting shorter names, it is getting smaller file sizes after minification. Assuming UTF-8 encoding, any non-ASCII character requires at least two bytes, and symbols like the ones above require three. For example, running in a shell:

printf '☽' | hexdump

will output:

0000000 98e2 00bd                              
0000003

This means that, to produce that Unicode character from its hex values, we have to use three bytes:

printf '\xE2\x98\xBD'

Notice that E2 and 98 appear swapped in the hexdump output. By default, hexdump prints the input as 16-bit values, and here it used little-endian byte ordering for each of them.
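To see the bytes in the order they are actually stored, a byte-wise dump helps. A minimal sketch, assuming od is available (hexdump -C gives a similar byte-by-byte view):

```bash
#!/bin/bash
# Dump the character one byte at a time, without the offset column.
BYTES=$(printf '\xE2\x98\xBD' | od -An -tx1)

# The unquoted expansion is deliberate: it collapses od's padding spaces.
echo $BYTES
# e2 98 bd
```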

So, using these characters will produce pretty/cryptic variable names but bigger file sizes.
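The difference can be checked by counting bytes. In this sketch, byte_length is a hypothetical helper and --1f is just an example of a two-digit hexadecimal name:

```bash
#!/bin/bash
# Hypothetical helper: number of bytes in a string, whatever the locale.
byte_length() {
	printf '%s' "$1" | wc -c | tr -d ' '
}

# "--☽" built from its UTF-8 bytes, and an example hexadecimal name.
UNICODE_NAME=$(printf '%s\xE2\x98\xBD' '--')
HEX_NAME='--1f'

byte_length "$UNICODE_NAME" # 5
byte_length "$HEX_NAME"     # 4
```

A single symbol already costs more than two hexadecimal digits, and two digits are enough to name 256 properties.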

What to do instead

The original idea converts the decimal number assigned to a name to hexadecimal for two reasons: the representation is generally shorter, and it is easy to do using printf '%x' $NUMBER.
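For instance, with a few made-up positions in the name list (the numbers are only illustrative):

```bash
#!/bin/bash
# Build the short name for some hypothetical positions in the list.
OUTPUT=$(
	for NUMBER in 9 10 255 4095; do
		printf '%s%x\n' '--' "$NUMBER"
	done
)

echo "$OUTPUT"
# --9
# --a
# --ff
# --fff
```

The prefix is passed as an argument because bash's printf would otherwise try to read a format string starting with "--" as an option.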

The digits in base 16 are characters that require only one byte each. A bigger set of one-byte symbols can be used to represent any number in a higher base. Base conversion is rather simple, and perhaps there is a smarter way to do it, but I decided to code a bash function for that purpose:

#!/bin/bash

# Convert a decimal integer to another base.
#
# The first argument should be a positive integer, or zero.
# The second argument is optional. If provided, it will be treated as a string
# containing the symbols of the base to convert the decimal to. If not, it will
# assume base 16.
#
# Echoes a string representing the given number, using the specified digits.
decimal_to_base() {

	local DECIMAL=$1;

	local SYMBOLS="0123456789abcdef";
	if [ -n "$2" ]; then
		SYMBOLS="$2";
	fi
	local RADIX=${#SYMBOLS};

	local DIGITS=();

	local NUMBER=$DECIMAL;
	local QUOTIENT=$(($NUMBER/$RADIX));
	local REMAINDER=$(($NUMBER-$QUOTIENT*$RADIX));
	DIGITS+=(${SYMBOLS:$REMAINDER:1});

	while [ $QUOTIENT -gt 0 ]; do
		NUMBER=$QUOTIENT;
		QUOTIENT=$(($NUMBER/$RADIX));
		REMAINDER=$(($NUMBER-$QUOTIENT*$RADIX));
		DIGITS+=(${SYMBOLS:$REMAINDER:1});
	done;

	local RESULT="";
	local DIGIT;
	# Digits were collected least significant first, so prepend each one.
	for DIGIT in "${DIGITS[@]}"; do
		RESULT=$DIGIT$RESULT;
	done;

	# printf instead of echo: a result such as "-n" would be taken as an option.
	printf '%s\n' "$RESULT";

}
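A more compact variant is also possible. The following is only a sketch under my own assumptions: to_base is a hypothetical name, and it relies on the modulo operator and on prepending each digit as soon as it is computed, instead of collecting the digits in an array.

```bash
#!/bin/bash
# Hypothetical compact variant of decimal_to_base.
# Arguments: a non-negative integer and, optionally, the string of symbols.
to_base() {
	local NUMBER=$1
	local SYMBOLS=${2:-0123456789abcdef}
	local RADIX=${#SYMBOLS}
	local RESULT=""

	# Peel off the least significant digit until nothing is left.
	while :; do
		RESULT=${SYMBOLS:$((NUMBER % RADIX)):1}$RESULT
		NUMBER=$((NUMBER / RADIX))
		[ "$NUMBER" -gt 0 ] || break
	done

	# printf, so that a result such as "-n" is not read as an echo option.
	printf '%s\n' "$RESULT"
}

to_base 255 # ff
to_base 0   # 0
to_base 1234 "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-" # tL
```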

CSS custom property names are case-sensitive and allow certain non-alphanumeric characters, such as the hyphen. Taking that into account, the previous function can be used to obtain a generally shorter representation of a number:

NUMBER=1234;
SYMBOLS="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-";
REPRESENTATION=$(decimal_to_base "$NUMBER" "$SYMBOLS");
echo "$REPRESENTATION"; # Echoes "tL"
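To get a feeling for how much shorter, the number of names that fit in a given length can be compared for both bases. A quick sketch of that arithmetic, using the same symbol string:

```bash
#!/bin/bash
SYMBOLS="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-"
RADIX=${#SYMBOLS} # 63 symbols

# Names available with 1, 2 and 3 characters after the "--" prefix.
for LENGTH in 1 2 3; do
	printf '%d: %d names in base 16, %d in base %d\n' \
		"$LENGTH" "$((16**LENGTH))" "$((RADIX**LENGTH))" "$RADIX"
done
# 1: 16 names in base 16, 63 in base 63
# 2: 256 names in base 16, 3969 in base 63
# 3: 4096 names in base 16, 250047 in base 63
```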

Caveats

Some minifiers expect exclusively alphabetic strings, or make other assumptions regarding their format. For example, cleancss has trouble processing property names that do not start with letters. In these cases, or to err on the side of caution, use only lowercase and uppercase letters as the symbol string.