Lempel–Ziv–Welch
- Input is a set of 8-bit codewords, Output is a string of 7-bit ASCII characters
- Maintains a symbol table mapping input characters/strings (keys) to fixed size output codewords (values)
Symbol table
- Symbol table contains 256 entries
- Initialize with the 128 single character entries
- Reserve codeword 80 for End Of File (EOF)
- Using hexadecimal notation:
- Codewords 00-7F (0111 1111) represent the 128 single characters
- Codeword 80 (1000 0000) is EOF
- Codewords 81-FF (1000 0001 – 1111 1111) represent strings of characters
Algorithm
Add an entry associating the next unused codeword (value) with the key formed by appending the lookahead character to the key
- Find the Longest prefix matching s in the symbol table that is a prefix of the unscanned input
- Write the 8-bit codeword associated with s
- Scan one character past s in the input
- Associate the next codeword value with s+c (c appended to s) in the symbol table, where c is the next character in the input.
LZW decoding
Only requires initial dictionary without the whole symbol table
- Write the current string val
- Read the next codeword x from the input
- Set s to the value associated with x in the symbol table
- Associate the next unassigned codeword to the value val+c in the symbol table, where c is the first character of s
- Set the current string val to s
- Loop until EOF
Lempel–Ziv–Welch
Lempel–Ziv–Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978. The algorithm is simple to implement and has the potential for very high throughput in hardware implementations.[1] It is the algorithm of the Unix file compression utility compress and is used in the GIF image format.
https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch

Seonglae Cho