Cracking the Oyster

Cracking the Oyster 

Problem Statement 

Input: A file containing at most n positive integers, each less than n, where n = 10^7. It is a fatal error if an integer occurs twice in the input. No other data is associated with the integer. 
Output: A sorted list of increasing order of input strings. 
Constraints: Limited Memory, Huge Disk. Time can at most 10 seconds.

Problem Solving

  • Since I am starting from scratch. I implemented the external sort myself to get a hang of it. 
  • There were interesting and frustrating moments.
  • I learned about file streams, reading writing. 
  • Interesting issues in file streams and how to properly use them. 
  • K-way merge I implemented for the first time. 
  • Realized how much is integer range.
  • Also learned to work with <random> library
  • Memory did not exceed 1.3 Mb. which is quite a good approximation. However, it took a very long time to run. 
  • So I further moved in the book and it talks about an approach where we use the idea that we are sorting integers and then instead of creating the temp files and reading from them we rather read the input file multiple times each time picking up elements in the range, sorting it and writing it to output file.
  • I further implemented the improved version, this one is almost 6x faster than the original temp file creating approach.
  • I measured time using <chrono> library. 
  • Memory is well under 938 Kb.
  • However, there is another improvisation left here. Which proves that a simple solution is the best. 
  • We can create a bit vector representation of the whole range of numbers, which can then be used to sort.
  • So we create a bit vector and initially all elements are 0, then we read the input file and set the bit at an index of the number read to 1. After the processing is complete, we write iterate integers in the range and output if the corresponding bit is set. 
  • This program was very simple to build compared to the first one, also this took some milliseconds to run while the others would take 40, 12 seconds respectively. 
  • Find Code here

 

Learning

  • This was very insightful and fun to do exercise. 
  • I spent a few hours implementing all versions and faced good challenges. 
  • I learned file handling in c++ as well, quite some time was waster only to use the library properly. 
  • This particulate chapters also talk about two book – Conceptual Blockbusting and Software Requirements and Specification. Have to add these in my reading list.  
  •  
     
     
     
     
     

    Leave a Reply

    Leave a Reply

    Your email address will not be published.

    This site uses Akismet to reduce spam. Learn how your comment data is processed.

    Catch Me On Social !!