Fun finding duplicate elements in an array

I recently saw a fun programming challenge of trying to find a duplicate element in an array. The problem was stated as follows.

  • You have an integer array of length n.
  • Each element in the array can only take on the values of [0, n-1].
  • Try to find all integer values that are duplicated in the array.

For example, the array has a length of 5, thus containing only the values in the set {0, 1, 2, 3, 4} or range [0, 4] (note inclusive both ends). The actual array might hold elements (or integers) in this order: 2, 1, 4, 1, 3.

This problem is so silly to me, but it does have its merits, such as flushing out one’s understanding of worst case running time and space complexity. Before we begin, let’s list the approaches and their associated complexities. In the following table, the different solutions are given a name, followed by their running time and space complexity in big-Oh notation, and they are sorted by running time complexity descendingly and then space complexity descendingly.

name running time complexity space complexity
naive O(n^2) O(C)
sorting O(n log n) O(C)
map O(n) O(n)
set O(n) O(n)
bitmap O(n) O(n)
awesome O(n) O(C)

Here’s a quick description of each of the solutions.

  • naive: This approach uses 2 for loops to iterate over the array to find duplicate elements. It is called “naive” because it’s probably what your gut instinct might first lead you to coding without considering running or space complexity.
  • sorting: This approach uses a sort algorithm to sort the array, and then iterates the sorted array to find duplicate elements.
  • map: This approach uses a Map data structure, much like the set approach, to store observed elements and their frequencies. Any element with an associated frequency greater than 2 is a duplicate.
  • set: This approach uses a Set data structure to store observed elements as we iterate through the array. If we come across an element that already exists in the Set, then we know that element is a duplicate. It’s ranked slight higher than the map approach since we are not using the additional space to store frequencies.
  • bitmap: This approach uses a bitmap (or boolean array; booleans are 1 bit) to store if we have seen an element. Initially, the bitmap values (boolean values) are all set to 0 (false). The first time we see an element, we go to the corresponding index in the bitmap and flip the value to 1 (true). If we have observe that the corresponding value is already true, then the currently inspected element is a duplicate. It’s ranked slightly higher than the set and map approaches since we are using bits.
  • awesome: This approach is called “awesome” because it is awesome. It has the lowest worst case running time AND space complexity. One must exploit the original constraints provided by the problem, which is that the values are in the range [0, n-1] e.g. 0, 1, 2, …, 4.

Next, let’s go into a little bit more depth using Java to implement each of these solutions. For the naive approach, our code will be as follows. Note that we iterate/loop over the array twice but use no additional space to detect duplicates. The running time complexity is quadratic or O(n^2) . It’s a sign a defeat for some software engineer to implement code that have O(n^2) worst case running time complexity (or higher). On the other hand, there’s a guiding principle in software engineering: make it work now, optimize later. Also, for very small inputs, O(n^2) algorithms might not bad and even competitive with algorithms with lower running time complexity. Note that with the outer loop, we point to one element, and with the inner loop, we move along each element to the right to detect duplicates.

public static void naive(int[] data) {
  for(int i=0; i < data.length; i++) {
    int num1 = data[i];
    for(int j=i+1; j < data.length; j++) {
      int num2 = data[j];
      if(num1 == num2) {
        System.out.println(data[i]);
      }
    }
  }
}

For the sort approach, what we want to do to the array is first sort it and then search it for duplicates. Sorting and searching problems are duals (or go hand-in-hand) in computer science. First we use Java’s Arrays.sort(int[]) method to sort the array. Java’s Array.sort(int[]) method uses quicksort to sort, and its worst case running complexity is O(n log n). If you read the documentation, Java’s quicksort might be O(n^2) with pathological inputs. On the other hand, Java’s Collections.sort(List) method uses merge-sort which guarantees O(n log n). But let’s ignore all that for now. After we have sorted the array, simply iterate it once, looking at each element and its right adjacent neighboring element. If they are the same, then the element is a duplicate.

public static void sorting(int[] data) {
  Arrays.sort(data);
  for(int i=0; i < data.length-1; i++) {
    int num1 = data[i];
    int num2 = data[i+1];
    if(num1 == num2) {
      System.out.println(num1);
    }
  }
}

The map approach is like the set approach. Note that with the map approach, we can store both the integer and its associated frequency. As we loop through the array, note that we store the integer and increment its frequency. If we encounter an integer that already exists in the map, then we have found a duplicate.

public static void map(int[] data) {
  Map<Integer, Integer> map = new HashMap<Integer, Integer>();
  for(int i=0; i < data.length; i++) {
    int num = data[i];
    if(map.containsKey(num)) {
      System.out.println(num);
      map.put(num, 1 + map.get(num));
    } else {
      map.put(num, 1);
    }
  }
}

For the set approach, we use a Set to store previously seen integers. As we iterate/loop through the array, if we detect an integer that is already stored in the Set, then it is a duplicate.

public static void set(int[] data) {
  Set<Integer> set = new HashSet<Integer>();
  for(int i=0; i < data.length; i++) {
    int num = data[i];
    if(set.contains(num)) {
      System.out.println(num);
    } else {
      set.add(num);
    }
  }
}

The bitmap approach gets us a little bit closer to the ultimate solution. In this approach, we use a bitmap (here in Java, a boolean array; a boolean is 1 bit) to track what integers we have seen so far. If you go back to the problem’s constraint, note that the values are in the range [0, n-1] and the length of the array is n. That means the bitmap can be used to indicate if we have previously seen a value using the value as the index. In our example, we have n=5, so we have a boolean array with the values [false, false, false, false, false] and each index corresponds to a value in [0, n-1], which is [0, 1, 2, 3, 4]. If we go to bitmap[0], we can say that we have seen the number 0 at least one time if its value is true, or that we have not seen the number 0 at all if its value is false.

public static void bitmap(int[] data) {
  boolean[] bitmap = new boolean[data.length];
  for(int i=0; i < data.length; i++) {
    int num = data[i];
    if(false == bitmap[num]) {
      bitmap[num] = true;
    } else {
      System.out.println(num);
    }
  }
}

Now for the awesome approach. This approach loops over the array only once and uses no additional space. How do we do that? Note that we need a way to store what we’ve seen so far. The way we can store what we’ve seen so far is when we encounter a number, we go to the index element corresponding to the number, and if the element (number) is positive, we make it negative (multiply it by -1). If we see that number again, we revisit the index element corresponding to that number, and we will observe that it is negative, so it’s a duplicate.

public static void awesome(int[] data) {
  for(int i=0; i < data.length; i++) {
    int num = Math.abs(data[i]);
    if(data[num] < 0) {
      System.out.println(num);
    } else {
      data[num] = -1 * data[num];
    }
  }
}

Let’s actually walk through the awesome approach. Assume we have this array of integers: [2, 1, 4, 1, 3]. Now let’s loop through starting at the 0-th element.

  • At i=0, array[i] = array[0] = 2. We go to array[2], and array[2] = 4. Since 4 is positive, we know we have not seen 2 before. So we make the integer referenced by array[2] negative. After this modification, the array is [2, 1, -4, 1, 3].
  • At i=1, array[i] = array[1] = 1. We go to array[1], and array[1] = 1. Since 1 is positive, we know we have not seen 1 before. So we make the integer referenced by array[1] negative. After this modification, the array is [2, -1, -4, 1, 3].
  • At i=2, array[i] = array[2] = -4. We go to array[4] (note we take the absolute value of the value), and array[4]=3. Since 3 is positive, we know we have not seen 4 before. So we make the integer referenced by array[4] negative. After this modification, the array is [2, -1, -4, 1, -3].
  • At i=3, array[i] = array[3] = 1. We go to array[1], and array [1] = -1. Since -1 is negative, BINGO, a duplicate is found.
  • At i=4, array[i] = array[4] = 3. We go to array[3], and array[3] = 1. Since 1 is positive, we know we have seen 3 before. So we make the integer referenced by array[4] negative. After this modification, the array is [2, -1, -4, -1, -3].

Notice the final array [2, -1, -4, -1, -3] stores 2 types of information? Each array[i] stores the integer value and the sign of the value indicates if we have seen the value of its index. If we just look look at the array, we can say we

  • have never seen 0, since array[0] is positive, 2
  • have seen 1, since array[1] is negative, -1
  • have seen 2, since array[2] is negative, -4
  • have seen 3, since array[3] is negative, -1
  • have seen 4, since array[4] is negative, -3

There are even more ways to solve this silly problem. A good link is here on StackOverflow.

Hope you enjoyed reading this blog post. Sib dua cov phooj ywg! 고맙습니다. 당신을보고

 

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s