Lempel–Ziv–Welch
- Input is a set of 8-bit codewords, Output is a string of 7-bit ASCII characters
- Maintains a symbol table mapping input characters/strings (keys) to fixed size output codewords (values)
Symbol table
- Symbol table contains 256 entries
- Initialize with the 128 single character entries
- Reserve codeword 80 for End Of File (EOF)
- Using hexadecimal notation:
- Codewords 00-7F (0111 1111) represent the 128 single characters
- Codeword 80 (1000 0000) is EOF
- Codewords 81-FF (1000 0001 – 1111 1111) represent strings of characters
Algorithm
Add an entry associating the next unused codeword (value) with the key formed by appending the lookahead character to the key
- Find the Longest prefix matching s in the symbol table that is a prefix of the unscanned input
- Write the 8-bit codeword associated with s
- Scan one character past s in the input
- Associate the next codeword value with s+c (c appended to s) in the symbol table, where c is the next character in the input.
LZW decoding
Only requires initial dictionary without the whole symbol table
- Write the current string val
- Read the next codeword x from the input
- Set s to the value associated with x in the symbol table
- Associate the next unassigned codeword to the value val+c in the symbol table, where c is the first character of s
- Set the current string val to s
- Loop until EOF