Skip to main content
留学咨询

辅导案例-CSE 100 PA3

By May 15, 2020No Comments

CSE 100 PA3: Huffman Encoding and Compression Checkpoint Deadline: ​Wednesday, Nov. 6th, 11:59 pm Final Deadline:​ Wednesday, Nov. 13th, 11:59 pm Overview ● In this project, you will be asked to complete two major tasks. The first is to implement a Huffman Coding Tree and use it to perform “compression” without using bitwise i/o. The second part is to implement bitwise i/o to allow true compression with efficient header. ​You should ​read these instructions carefully ​before you start writing code or post questions on piazza. ● This is a Pair Programming project. You may ask Professors/TAs/Tutors for some guidance and help, but you can not copy code from anywhere including online sources such as Github. You may also discuss the assignment conceptually with your classmates, including bugs that you ran into and how you fixed them. However, do not look at or copy code, as this constitutes an Academic Integrity Violation. And you may not use other libraries’ implementations in your solutions. ● For this assignment and the remainder of the course, you are expected to use Git to keep track of your code and Github to store your code. Make a ​private repository on Github, commit and push your code regularly as you go through the assignment. ● There is a ​FAQ page for Docker and devcontainer setup issues please check this as well as existing Piazza posts before making your own post. Thank you! ● Start early, submit early and often! Pair Programming Partner Instructions Academic Integrity and Honest Implementations Retrieving Starter Code Submitting Your PA Style Requirement Part 1: Pseudo Compression Part 2: True Compression with Bitwise i/o and Header Design Part 3: Worksheet Testing Suggestion Extra Credit: Block Encoding Getting Help Pair Programming Partner Instructions If you wish to work with a partner, please be sure to read the guidelines for Pair Programming in the syllabus carefully. Make sure to have both of the partners’ names on the headers of all files. Most importantly, ​make sure to submit your homework on Gradescope using only ONE of the accounts and do not forget to add the other teammate’s account into the submission. https://sites.google.com/eng.ucsd.edu/100/programming-resources/pair-programming-guideli nes Academic Integrity and Honest Implementations We will hand inspect all submissions and will use automated tools to look for plagiarism or deception. ​Attempting to solve a problem by other than your own means will be treated as an Academic Integrity Violation. This includes all the issues discussed in the Academic Integrity Agreement, but in addition, it covers deceptive implementations. For example, if you use a library (create a library object and just reroute calls through that library object) rather than writing your own code, that’s seriously not okay and will be treated as dishonest work. Retrieving Starter Code To retrieve the starter code, go to this page: https://github.com/CaoAssignments/cse100-psa3-huffman-fa19-starter Copy the git url of your repo from here after clicking “Clone or download” (make sure the URL you copied starts with https): Then on your own machine or on ieng6 you can clone the repo like this: git clone https://github.com/CaoAssignments/cse100-psa3-huffman-fa19-starter.git Now go back to GitHub and create your own ​private​ ​repository. Then cd into the PA3 starter code repository you cloned earlier and type ​“git remote -v”​. Now we’re going to remove the remote called origin, so we can then add your own github repository as the new remote called origin: run ​“git remote remove origin” If you now run ​“git remote -v” you should no longer see the remote called origin like you saw before. Now run the commands listed on your newly created git repo. (If your terminal did not ask you to log in upon executing this git clone command, double check that the repo is set to private.) You should now see the PA3 starter code on your private GitHub repository. If you are working with a partner, add them as a collaborator on your private repository: First click the settings tab on your GitHub repository. Then click Collaborators on the left and search for your partner’s github account. Among the starter code files, the following files are relevant to implementation and testing. ⚠ NOTE: the starter code does not contain any meson.build files except for the ​cxxopts in subprojects​. You have to implement these files yourself for this PA. ├── COMMANDS_CHEATSHEET.md ├── README.md ├── create_submission_zip.sh ├── data │ ├── check1.txt │ ├── check2.txt │ ├── check3.txt │ └── warandpeace.txt ├── extra-credit-compress.executable ├── extra-credit-uncompress.executable ├── solution-compress.executable ├── solution-uncompress.executable ├── src │ ├── FileUtils.hpp │ ├── bitStream │ │ ├── input │ │ │ ├── BitInputStream.cpp │ │ │ └── BitInputStream.hpp │ │ └── output │ │ ├── BitOutputStream.cpp │ │ └── BitOutputStream.hpp │ ├── compress.cpp │ ├── encoder │ │ ├── HCNode.hpp │ │ ├── HCTree.cpp │ │ └── HCTree.hpp │ └── uncompress.cpp ├── subprojects │ ├── cxxopts │ │ └── cxxopts.hpp │ └── gtest.wrap └── test ├── test_BitOutputStream.cpp └── test_HCTree.cpp ● To complete the assignment, you will modify the files in the starter code to provide full definitions for methods marked with // TODO. To see all of the TODO tags, you can run the command ​grep -r TODO .​ in your PA3 directory. We personally recommend using grep like this as you can manipulate the arguments after “A” and “B” to get a good snapshot of what is required: grep -r -A 5 -B 5 TODO * (To search in VSCode, you can also use Ctrl+Shift+F or Cmd+Shift+F on mac) ● You can find all the relevant information about compiling your code and running tests in the ​README.md​ file of the starter code. Submitting Your PA You will be submitting your code using Gradescope through the Github submission option. You should have gotten enrolled in Gradescope by now (if not, please add your name to the appropriate piazza post ). Instructions to submit your PA on Gradescope: 1. Be sure to test your code on ieng6 or in the devcontainer. We will be grading your code on the same environment as ieng6 and the devcontainer. There may be issues with compilers/etc if you only tested your code on your personal machine outside of a devcontainer using self-installed dependencies. 2. Be sure to push the final version of your code to your ​private Github repository. (That will be the code you will submit if you are submitting using Github) 3. Go to gradescope and find PA2 checkpoint/final submission. You may use either Upload or Github to submit. Upload​: In the root directory of your PA3, run ​./create_submission_zip.sh​. ​The script will create a zip file named ​submission.zip​, then simply drag the zip file to gradescope to finish the submission Github​: You will be asked to authorize your github account if you haven’t done so before. After author
izing your account, choose the repo you pushed your final version of PA1 code to and the correct branch. 4. Check the feedback from the autograder and make sure no compile errors appeared. 5. If you are submitting with a partner, make sure that the student who submits adds the other student to the assignment. 6. You can submit as many times as you like before the deadline: only your last submission will be counted. Checkpoint Submission ​(Due Wednesday, Nov. 6th, 11:59 pm)​: When you have completed all of the requirements for the checkpoint submission (Part 1: Pseudo Compression). You should go to gradescope and find PA3 checkpoint submission. ​Only the files in your src and test folders will be graded. Make sure your code can compile (running ‘meson build’ and ‘ninja -C build’ in project directory gives no error) Final Submission ​(Due Wednesday, Nov. 13th, 11:59 pm) ​: When you have completed all of the requirements for the final submission (All the parts). You should go to gradescope and find PA3 final submission. Only the files in your src and test folders will be graded. Make sure your code can compile. (running ‘meson build’ and ‘ninja -C build’ in project directory gives no error) Style Requirement Please follow the style guidelines below on course website. https://sites.google.com/eng.ucsd.edu/100/style-guidelines/style-guidelines When grading the PAs, there won’t be points specifically allocated for style, but bad style will result in up to 10 points deduction from the points you get in the other parts. (Bad style includes magic numbers, missing method headers, bad indentation, and lacking inline comments etc.) You have an auto formatter available to you (check the Readme.md for details), use it! Part 1: Pseudo Compression (Checkpoint Submission) In the first part of this project, you will use Huffman Coding Tree to implement a pseudo compression without using bitwise i/o (your compressed file will contain chars ‘0’s and ‘1’s to represent the encoded bits). Your major tasks in this part includes: 1. Write your own ​meson.build​ files in this project. 2. Implement Huffman Coding Tree to use as the compression algorithm. 3. Implement a pseudo compress program that compresses the given file to a smaller file. 4. Implement a pseudo decompress program that decompresses the file generated by compress program to get the original file back. After finishing each step above, you should first test your program before moving to the next step. Note that in this assignment, the correctness of each part above heavily relies on the correctness of the other parts, so make sure you strictly follow test driven development in this assignment! 1. Write meson.build For this PA you need to write your own meson.build files. We have listed a summary of the most important concepts below. The full meson documentation can be found ​here​. You are also welcome to use the previous PAs as a resource. Be careful when copy pasting though, a copy paste error is easily made, but debugging it can be painful! project() The first line of your root level meson.build file should define your project. This defines the name, version, programming language, compiler options and more for your project. subdir() This defines there is a subdirectory part of this project which has its own meson.build file. subproject() This defines there is a third party project (e.g. GoogleTest) listed under the subprojects/ directory. run_target() This defines a custom run target that you can call using “ninja -C build your-custom-runtarget” include_directories() This lets you define a directory containing .hpp files. These .hpp files will be made discoverable by the ​build target that you provide the include_directories to. If you’re missing this the corresponding #include statements in your code cannot be resolved during compile time. declare_dependency() This lets you declare a set of files that you can reuse throughout the rest of your project as a named dependency. executable() This defines a set of source files that will be compiled into an executable. You can list dependencies. If you list the “install: true” option, the executable will be moved to a bin directory on your system when running “meson -C build install”. This makes the executable available in your terminal without specifying the full path to the executable. library() You need to use this in order to be able to declare a dependency that does not only include .hpp files, but also .cpp files. By default this will compile a so called “shared_library”. Which means it will compile the cpp files into a ​shard object library file (.so) which can be used as a shared resource by other libraries and executables. Your executable will be smaller than the provided reference solution executable because of this. We compiled the reference solution by telling meson to set the default library to “static_library”. This means the dependencies will be packaged into the executable. If we didn’t do that the reference solution executable would complain on your system that it cannot find the HCTree and bit stream libraries because it is not placed in the build dir. If it were placed in the build directory it would try to use your HCTree and bit stream libraries instead of ours. 2. Write Tests First Run the provided tests Make sure you finished setting up the meson build files in step 1 and that you can run the tests using “ninja -C build test” (it is expected that the tests fail at this point, but make sure you can build and run them). Write additional tests We only provided simple tests to help you get started writing the tests. The provided tests are by no means testing your code sufficiently. So your next step is to write more extensive unit tests. Read the function descriptions in part 3 below and write unit tests first before implementing any functions. Then based on your understanding of each function, you should first type out all the function headers in all files to reinforce the understanding of the requirements before you attempt to implement them. You should refer to the ​Testing Suggestion​ part for testing advice. Note that as the bit stream implementation is only for the final submission, it is expected that the provided tests for BitInputStream and BitOutputStream fail for now in checkpoint. You can either comment them out or remove those test files for your convenience. 3. Implementing HCTree Now implement the following methods from ​HCNode.hpp​, ​HCTree.hpp​ and​ ​HCTree.cpp Please first read and understand the purpose of each member variable at the top of the file. ​Do not modify any public method signature. ​You may modify the private helper methods or add any additional member variables if needed. Also, ​good C++ coding practice requires that you use hpp file as an interface and put all the implementations in the corresponding cpp file. If you code only in the hpp files, you may lose points during manual inspection! HCNode.hpp bool operator()(HCNode*& lhs, HCNode*& rhs) const Implement a comparator of HCNode pointer in struct HCNodePtrComp. The comparator should make the priority queue of HCNode* has the following behaviors: HCNode pointer points to a HCNode with lower count should have higher priority. If count is the same, then HC
Node pointer points to HCNode with larger ascii value symbol should have higher priority. HCTree.hpp, HCTree.cpp explicit HCTree() Initializes a new empty HCTree. void build(const vector& freqs) Build the HCTree from the given frequency vector. You can assume the vector must have size 256 and each value at index i represents the frequency of char with ASCII value i. To improve performance, only non-zero frequency symbols should be used to build the tree. The leaves vector must be updated so that it can be used in encode() to improve performance. void encode(byte symbol, ostream& out) const Write the encoding bits of given symbol to ostream. You should write each encoding bit as char of either ‘0’ or ‘1’ to the ostream. ​You must not perform a comprehensive search to find the encoding bits of the given symbol, and you should use the leaves vector instead to achieve efficient encoding. ​For this function to work, build() must be called before to create the HCTree. byte decode(istream& in) const Decode the sequence of bits (represented as char of either ‘0’ or ‘1’) from istream to return the coded symbol. For this function to work, build() must be called before to create the HCTree. ~HCTree() Make sure to avoid memory leaks! Note: The following 2 functions in HCTree are not required for checkpoint submission. You will implement them in the final submission: void encode(byte symbol, BitOutputStream& out) const byte decode(BitInputStream& in) const Also, we strongly encourage you to unit test your HCTree thoroughly before proceeding to the next part​: you will save a significant amount of time in the end if you can prevent having to debug your HCTree in the next step. You should refer to the ​Testing Suggestion​ part for testing advice. 4. Implementing Compression and Decompression Program Compression Now use your Huffman Coding Tree to implement a program that performs pseudo compression on a given file. The compress program should take two arguments: the name of the file to be compressed, and the name of the compressed file: ./build/src/compress.cpp.executable data/toCompress.txt compressed.txt First, your program should read the first file and build the Huffman Coding Tree based on the frequency of each char in that file. Then your program should open the compressed file (​you should overwrite it in case it already exists​). In the compressed file, write a header that can help the uncompress program to reconstruct the same tree. Your header in checkpoint should have 256 lines, with each line being the frequency of the ASCII symbol with value of the line number. Then, write the encoding bits (represented as chars of either ‘0’s or ‘1’s) of all the chars in the original file to the compressed file. You should implement this file from scratch so make sure you understand how to properly do i/o in C++. There are plenty of examples from previous PAs and ​online resources​ can be helpful as well. Any file checking should be done in main() and you may use the functions in FileUtils.hpp​. Also note that all your implementation except file checking should be in ​pseudoCompress() ​for checkpoint submission. Make sure your program follows good object-oriented design, which means in your compress program, all the compression parts should be done by just calling functions of HCTree. To test your compress program, start by running the program on a small input file, and see if the compressed file contains the expected content. You may draw out the expected Huffman Coding Tree and see if all the encoding bits (represented as chars) are correct. Note that because we are using ascii chars to represent bits, the compressed file is actually larger than the original, which is why we called it a pseudo compress program. Later in part 2, with the help of bitwise i/o, you may start implementing the true compression program. Decompression The decompress program should take two arguments: the name of the compressed file, and the name of the decompressed file: ./build/src/uncompress.cpp.executable compressed.txt decompressed.txt First, your program should read the first file and build the Huffman Coding Tree based on the header of the file. Then your program should open the decompressed file and write decoded chars of the compressed file to the output file, such that the output file contains the exact same content as the original file. Any file checking should be done in main() and you may use the functions in ​FileUtils.hpp​. Also note that all your implementation except file checking should be in ​pseudoDecompress() for checkpoint submission. You may run the diff command to see if your output file is the same as the original one: diff data/toCompress.txt decompressed.txt To compare with the reference solution run the reference solution like this: ./solution-compress.executable –ascii data/toCompress.txt compressed.txt Part 2: True Compression with Bitwise i/o and Header Design (Final Submission) Your major tasks in this part includes: 1. Implement bitwise i/o to allow true compression later 2. Implement encode and decode with bitwise i/o in HCTree 3. Adjust the compression and decompression program to perform true compression 4. Design header so that it takes as little space as possible in the compressed file 5. Adjust the compression and decompression program again to use the new header After finishing each step above, you should first test your program before moving to the next step. Note that in this assignment, the correctness of each part above heavily relies on the correctness of the other parts, so make sure you strictly follow test driven development! 1. Implementing BitInputStream and BitOutputStream The smallest i/o unit is byte in C++ by default, so to allow for reading and writing data bit by bit, we need to implement our own bitwise i/o in the following class. The general idea of achieving bitwise i/o is to first use a buffer to store the current byte being read or written from an istream or ostream, then perform the proper bitwise operations to read/write each bit in the buffer. BitInputStream.hpp, BitOutputStream.cpp BitInputStream(istream& is) Constructor of BitInputStream. Initializes member variable to be the proper values void fill() Fills the one byte buffer from input stream unsigned int readBit() Reads the next bit from the bit buffer. Fills the buffer with next byte from input stream if all the bits have already been read. It should return 0 if bit read is 0, and return 1 if bit read is 1. BitOutputStream.hpp, BitOutputStream.cpp BitOutputStream(ostream& os) Constructor of BitOutputStream. Initializes member variable to be the proper values void flush()e Sends the buffer to the output stream, and then clear the buffer to allow further use. void writeBit(int i) Writes the least significant bit of the given int to the bit buffer. Flushes the buffer first if it is full (which means all the bits in the buffer have already been written out). You may assume the given int is either 0 or 1. 2. Implementing Encode and Decode with Bitwise i/o Now that you have implemented bitwise i/o, proceed to implement the following functions in HCTree to allow encode and decode using bits: HCTree.hpp, HCTree.cpp void encode(byte symbol, BitOutputStream& out) const Write the encoding bits of given symbol to the given BitOutputStream. You should write each encoding bit to the BitOutputStream. ​You are not allowed to perform a comprehensive search to find the encoding bits of the given symbol, and you should use the leaves vector instead to achieve efficient encoding. ​For this function to work, build() must have been called before to create the HCTree. byte decode(BitInputStream& in) const Decode the sequence of bits from the given BitInputStream to return the coded symbol. For this function t
o work, build() must have been called before to create the HCTree. 3. Adjust Compression and Decompression Programs Now change the compress and uncompress program to allow true compression using bitwise i/o encode and decode that you just implemented. Before you proceed to implement an efficient header, it is important that you first verify the correctness of your compression programs using bitwise i/o. To prevent inefficient implementation, compressing and decompressing file smaller than 30MB should be done within a timeout of 180 seconds. 1. Adding a command line option In checkpoint submission, you have implemented pseudo compression using ASCII chars to represent bits. Now we should extend the functionalities without breaking the work we did for checkpoint. Therefore, the pseudo compression part in checkpoint should be hide behind a command line option “–ascii” (similar for uncompress): ./build/src/compress.cpp.executable –ascii data/toCompress.txt compressed.txt Now using the ascii option, your program should generate the compressed file using ASCII symbols ‘0’s and ‘1’s instead of actual bits. Whereas running without the ascii option will do the true compression using the bit stream you just implemented. We provided a library called ​cxxopts​ for you to do the command line option parsing. You can find it under ​subprojects/cxxopts​. To make it accessible to the rest of your project, make sure to add it to the root level meson.build file like so: # === src dependencies === cxxopts_proj​ =​ subproject​(​’cxxopts’​) cxxopts_dep​ = cxxopts_proj.​get_variable​(​’cxxopts_dep’​) # === end src dependencies === Now you should add “cxxopts_dep” as a dependency to your compress and uncompress executables in the corresponding meson.build file. Then add the following to the top of your main function of compress.cpp. You can modify the following to do option parsing in uncompress.cpp. cxxopts​::Options ​options​(​”./compress”​, ​”Compresses files using Huffman Encoding”​); options​.​positional_help​(​”./path_to_input_file ./path_to_output_file”​); bool​ isAsciiOutput = ​false​; string inFileName, outFileName; options​.​allow_unrecognised_options​().​add_options​()( ​”ascii”​, ​”Write output in ascii mode instead of bit stream”​, ​cxxopts​::​value​(isAsciiOutput))( ​”input”​, ​””​, ​cxxopts​::​value​(inFileName))( ​”output”​, ​””​, ​cxxopts​::​value​(outFileName))( ​”h,help”​, ​”Print help and exit”​); options​.​parse_positional​({​”input”​, ​”output”​}); auto​ userOptions = ​options​.​parse​(argc, argv); if​ (​userOptions​.​count​(​”help”​) || !​FileUtils​::​isValidFile​(inFileName) || ​outFileName​.​empty​()) { cout << ​options​.​help​({​""​}) << ​std​::endl; ​exit​(​0​); } Play around with it to understand what it does. For example, try running this: ./build/src/compress.cpp.executable --help You can find more information about cxxopts ​here​, examples are under src/example.cpp 2. Implement trueCompression and trueDecompress Now implement the trueCompression() in compress.cpp and trueDecompress() in uncompress.cpp by using bitwise i/o functions in your HCTree. Make sure you use the flag “isAsciiOutput” to determine whether to call pseudo or true compression/decompression. Note that your header should stay the same as in checkpoint submission for now, that means only the “encoding part” in the compressed file should be changed. Once you finished implement this part, follow the instructions in ​Testing Suggestion​ to test and run your compression and decompression programs. With the correct implementation, given the input file such as warandpeace.txt, your compression program should now be able to produce a compressed file that is smaller than the original file. 4. Efficient Header Design As you may notice, before you even start to encode any symbol, the header in the checkpoint solution had already taken at least 500 bytes of space (It could take more space when certain frequencies are large). And for file of size around 500 bytes, this is especially expensive because you might end up with a larger compressed file. Your last task in this assignment is to further compress the input file by implementing a space efficient header. In order to earn full credit for the header design, ​the compressed file produced by your program must be smaller than the one produced by the reference solution for most of the files​ (we will list the cases that you don’t need to beat the reference solution later). Partial credit will be given if your solution’s header is bigger than the reference solution’s header by no more than 10% (i.e. within 110%). The cutoff is 10%, which means no credit for header design will be given if the header size produced by your solution is more than 10% larger than the reference solution. The header size when compressing warandpeace.txt in data folder should be a relatively good indication. Few cases for which you don’t need to beat the reference solution include: 1. A file with only single kind of character repeated for some number of times, like check3.txt in data folder with all newline chars. 2. A file with small size and each symbol has low frequency. Although you are not required to beat the reference solution in this case, your header size should be really close to the reference solution with only a few bytes off in this case. When running the reference solution, it will print out the header size so you may use that information to compare it to your header size. ​To check the size of compressed file, you can simply type "​ls -l filename​". Note that because people have different ways to implement the header, the compressed files are just some mysterious bits for others, so we have to make a reasonable assumption when determining the size of your header: ​given the same input file, the length of the encoding bits of your solution and the reference solution should always be the same​, because Huffman Coding gives the optimal encoding length. Therefore, when grading your header size for the same input file, we simply use the following formula based on the assumption above: size of your header = size of your compressed file - size of the non-header part of reference compressed file Before you get started to implement the efficient header, remember to save your work from last part by making a git commit. Also make sure you have already tested your program comprehensively before beginning to implement the efficient header. Reference solution header that you need to beat Since you are required to beat the reference solution, it might be helpful to first know how does it implement the header in the first place. The idea of reference solution’s header is basically removing all redundant information. It does the following to produce the header of the compressed file: 1. Find ​the maximum frequency of all symbols​. Based on this number, determine the number of bits needed to store each frequency (For example, if the maximum frequency is 14, then you only need 4 bits to represent any frequency in the original file, since 4 bits can store number up to 2^4 - 1 = 15). Then let this number be ​numBits. 2. Find ​the number of non-zero frequency symbols​. 3. As one particular symbol can occur up to 2 billion times as stated in ​Testing Suggestion​, the maximum ​numBits​ is about 31, which requires 5 bits to store. Then we use fixed 5 bits to store numBits​ in the beginning of the header. 4. The maximum number of non-zero frequency symbols is 256, which requires 8 bits to store. Then we use fixed 8 bits to store it in the header. 5. In the remaining part of the header, ​for each non-zero frequency symbol, output 8 bits to represent the symbol, followed by ​numBits​ ​bits to represent its frequency​. For example, given a file with content “aaaaaabbbcccc”, the header should look like the following in bits with each char like ‘a’ representing 8 bits: 00011 00000011 a110 b011 c100 (After some calculations, it is easy to see that the header above takes 46 bits to store.) In decompress program, this would be interpreted as: ​numBits​ = 3, number of non-zero frequency char = 3. So we should read 3 blocks of char-freq, and in each block, first read 8 bits to get the symbol, then read ​numBits​ bits to get its frequency. Efficient Header Design Hint: All the headers we have seen are based on the assumption that we need to represent the frequency vector such that decompress program can use it to rebuild the tree. Using this idea, the reference solution is probably the best we can do. That means if you want to beat the reference solution, you should think about this problem from another perspective: instead of focusing on the frequency vector, if somehow we know the correct structure of the original HCTree and are able to reconstruct it, then decode and encode should also work even without knowing the frequency of each symbol, as the frequency of each symbol is only used in building HCTree. So that boils down to the following problem: Can we encode the structure of the HCTree in a sequence of bits? And based on this sequence of bits, can we reconstruct the original HCTree? Part 3: Worksheet  Please download and complete the worksheet in digital form (word or google doc) linked below. It will be submitted in pdf format as a separate assignment on gradescope. The worksheet is worth 5 points. ​Please submit the worksheet INDIVIDUALLY, the worksheet for PA3 is NOT a pair assignment. https://docs.google.com/document/d/1l1X0pIshjJaW_nGGfxee1hl-OtjRskQ4Ahs486PbkI4/edit? usp=sharing Testing Suggestion You should make good use of the provided tools​, such as debugger, GDB and cppcheck. Try out the different tools in Readme.md to see how they can help you to test the program. ​Remember to constantly back up your code to GitHub. 1. Unit tests and code coverage You will want to be sure to unit test your code at every step. You should write your own testers for every method, no matter how simple it is. Always start with simple cases in unit tests, as unit tests are extremely helpful for catching missed edge cases and any unintentional mistakes. Use larger test cases only after your program passed on small and simple test cases. You can model the setup for your test code off of the c++ test code in the BST project. In the provided tests, there is an example of how to test the output of ofstream, you might find it helpful when you are testing encode(). Also, to test HCTree, it might be a good idea to use TestFixture in your tests. You may refer to the provided tests in PA1 for an example. For checkpoint the line coverage for ​the files in the encoder directory needs to be ​at least 90% to get points for the coverage report. For final ​both encoder and bitStream need to have at least 90% ​coverage. 2. Cases of input file to check Note that our compression program along with decompress program should work regardless of the type or size of the given file. You can only assume that the given file to compress is no larger than 2GB. Other than the normal input files like warandpeace.txt, your program should also work in the following cases: 1. Empty File. 2. A file with only single kind of character repeated for some number of times, like check3.txt in data folder with all newline chars. 3. File containing symbols from extended ASCII table, with ASCII value larger than 127. 4. Large file up to 2GB such that a particular byte value may occur up to 2 billion times. 5. Non-text files such as binary files, images and videos. We have provided 4 text files for you in the data folder. The first two files check1.txt and check2.txt only contain several chars, which might be useful for small testing. The file check3.txt is used as an edge case as indicated above. The file warandpeace.txt is a normal large test case. After passing the provided tests, you should create your own test files to cover the remaining cases above. 3. Detect Memory Leaks To check memory leaks, run valgrind on the executables. For example: > valgrind ./build/test/test_HCTree.cpp.executable (there are lots of options you can use, but this is enough for now) ● Notice under LEAK SUMMARY: If it reports that some memory was “definitely lost” then it means you have a memory leak. Check your deleteAll method and see if all the memories allocated are destroyed/deallocated or not. A rule of thumb is to make sure every “new” has a corresponding “delete”. ● For details of what all the LEAK SUMMARY lines mean, see: http://valgrind.org/docs/manual/faq.html#faq.deflost Extra Credit: Block Encoding After turning in a normal compressor that beats the reference header size, you can earn some extra credit points by encoding two byte symbols instead of one byte symbols. See below for an example. Motivation Let’s see how compressing one byte symbols compares to compressing two byte symbols on an input like xxxxyxxy. Larger block sizes reduce the encoded size: Implementation We highly recommend that you first back up your work by making a git commit and pushing it to a ​private​ Github repository. Then create a branch called extra-credit and implement the extra credit on that branch. The same algorithm will work if some adjustments are made. Symbols can no longer be represented by the byte typedef. Instead you may represent them as a two byte type such as unsigned short or whatever is convenient for you. Decompression must result in a file identical to the original that was compressed. Your implementation must be able to handle edge cases, such as an empty file and file with an odd number of bytes. The output compressed files must be no more than 105% the size of those produced by the reference extra credit executables. This will be tested with files large enough that reasonable differences in header size will not matter. The reference executable is called ​extra-credit-compress.executable,​ you can find it in your project root. Your extra credit solution should be available when passing the flag “–block” to the executable. Note that after adding extra credit, the following commands lead to the following behavior: Compress using bit stream output ./build/src/compress.cpp.executable data/toCompress.txt compressed.txt Pseudo compress using ascii output ./build/src/compress.cpp.executable –ascii data/toCompress.txt compressed.txt Compress using block encoding and bit stream output ./build/src/compress.cpp.executable –block data/toCompress.txt compressed.txt The same behavior is expected for the uncompress executable. Getting Help Tutors in the labs are there to help you debug. TA and Professor OH are dedicated to homework and/or PA conceptual questions, but they will not help with debugging (to ensure fairness and also so students have a clear space to ask conceptual questions). Questions about the intent of starter code can be posted on piazza. Please do not post your code to piazza either publicly or privately – coding support comes from the tutors in the labs. Format of your debugging help requests At various times in the labs, the queue to get help can become rather long (all the more reason to start early). To ensure everyone can get help, we have a 5 minute debugging rule for tutors in that they are not supposed to spend more than 5 minutes with you before moving onto a new group. Please respect them and this rule by not beg
ging them to stay longer as you’re essentially asking them to NOT help the next group in exchange for helping you more. 5 minutes?! Yes, 5 minutes. The job of tutors is to help you figure out the ​next step in the debugging process​, not to debug for you. So this means, if you hit a segfault and wait for help from a tutor, the tutor is going to say “run your code in gdb, then run bt to get a backtrace.” Then the tutor will leave as they have gotten you to the next step. This means you should use your time with tutors effectively. Before asking for help, you will want to already have tried running your code in gdb (or valgrind, depending on the error). You should know roughly which line is causing the error and/or have a clear idea of the symptoms. When the tutor comes over, you should be able to say: What you are trying to do. For example, “I’m working on Part 1 and am trying to get the insert method in the BST to work correctly.” What’s the error. ​For example, “the code compiles correctly, but when I insert a child in my right subtree, it seems to lose the child that was there before.” What you’ve done already. ​For example, “I added the method which prints the whole tree (pointers and all) and you can see here that insert to the right subtree of the root just removes what was on the right subtree previously. But looking at my code for that method, it seems like it should traverse past that old child before doing the insert. What do you suggest I try next?”

admin

Author admin

More posts by admin