admin管理员组

文章数量:1307444

I am running into a very interesting dilemma while learning Huffman compression. I am currently studying in a course that uses zyBooks and the examples are absolutely terrible in my opinion. In 2 examples of 2 separate steps of building the Huffman tree there is an issue regarding which order the characters should be in when the dictionary contains many characters which have a frequency of 1. In 1.8.10, using the string "APPLES AND BANANAS", the characters that share a frequency of 1 are ordered in "order of appearance", yet in 1.8.13, the characters are in reverse order of appearance. Can anyone help me understand which would be the correct method as, due to this obfuscation, I am getting questions wrong at the end of the section.

I have attempted the questions at the end of the section to build the tree on paper (notice the book is not written language-specific). The book uses very short strings so its "easy" enough to do, and I want to understand the logic before I let the computer do the work for me using the code.

I am running into a very interesting dilemma while learning Huffman compression. I am currently studying in a course that uses zyBooks and the examples are absolutely terrible in my opinion. In 2 examples of 2 separate steps of building the Huffman tree there is an issue regarding which order the characters should be in when the dictionary contains many characters which have a frequency of 1. In 1.8.10, using the string "APPLES AND BANANAS", the characters that share a frequency of 1 are ordered in "order of appearance", yet in 1.8.13, the characters are in reverse order of appearance. Can anyone help me understand which would be the correct method as, due to this obfuscation, I am getting questions wrong at the end of the section.

I have attempted the questions at the end of the section to build the tree on paper (notice the book is not written language-specific). The book uses very short strings so its "easy" enough to do, and I want to understand the logic before I let the computer do the work for me using the code.

Share Improve this question asked Feb 2 at 16:14 TAW_WarDriverTAW_WarDriver 111 silver badge2 bronze badges
Add a comment  | 

2 Answers 2

Reset to default 3

Either choice results in an optimal code. There is no "correct" choice when there are two or more tied frequencies, or when there is more than one second-lowest frequency choice (e.g. 1 2 2). Sometimes with different choices you can get codes with topologically different trees. Those are all still valid and equally optimal. This answer shows many different trees from the same set of frequencies. There is also no "correct" choice in assigning 0's and 1's to each left or right branch. Either way for each branch is fine.

As for the questions, it appears that in both cases the trees are provided. You are not being asked to recreate them. Though the second question implicitly assumes that 0 is assigned to the left branches and 1 to the right branches.

Huffman coding uses a greedy approach and takes the smallest frequency elements to merge at every step. Also, it's an application of OMP (Optimal Merge Patterns). You can check it out if you want. The order of merging doesn't really matter. What matters is you merge the smallest frequency items at each step. Here is the first example

Different merge orders of 1.8.9

So, the set of frequencies is {A:3, B:1, S:1, N:2}. Here are the steps to build the tree:

Step1: Choose the nodes with the smallest frequencies (B:1 and S:1) and merge them. Now, if there were more than 2 minimum frequencies, you can combine either of them (if 3 frequencies of '1', then 3 choices of merging). Ofcourse the codes would be different for different merge orders. You can include some restrictions to limit the merge orders.

Step2 Remove the merged nodes from the set and add the new node. Then, repeat these steps until all the characters are merged. In the example, there are 2 merge orders with different codes.

After building the tree, label the left edges and right edges as 0 and 1 respectively. The path from the root to the respective characters will tell you the huffman code. Again, the order of 0 and 1 can be different for left and right subtree edges.

In example 1, codes will be {A->0, B->110, S->111, N->10}. In example 2, codes will be {A->1, B->000, S->001, N->01}

Both codes are correct.

After labelling the left and right subtree edges as '0' and '1' respectively in the example 1.8.10, the code of A would be 11 and so on. Say, if 11 wasn't in the options, then we'd assume that the right subtree's edge is assigned 0 and left edge as 1. Then the answer would have been 00

Hope this helps...

本文标签: algorithmOrder of Huffman List when there are many characters with the same frequencyStack Overflow