Hacker News new | ask | show | jobs
by dzorz 5492 days ago
Here is a solution in C++. It reads K and then N numbers from stdin and then prints K largest.

The complexity of nth_element is O(N).

    #include <algorithm>
    #include <vector>
    #include <iostream>
    #include <iterator>

    using namespace std;

    int main() {
        int K;
        cin >> K;
        vector<int> numbers((istream_iterator<int>(cin)),
                            istream_iterator<int>());
        nth_element(numbers.begin(), numbers.begin() + K, numbers.end(),
                    greater<int>());
        for (int i = 0; i < K; ++i) {
            cout << numbers[i] << ' ';
        }
        cout << '\n';
    }

Input:

    5
    1 9 1 3 7 8 2 11 2 5 5 9 1 7
Output:

    7 9 9 11 8
Note: the output is not sorted.
1 comments

Personally the implementation of nth_element is a much more interesting thing to see than just knowing it exists.
It's STL-implementation specific. look in your c++ kit.

in mine (osx 10.6.7, g++ 4.2.1), it's in

/usr/include/c++/4.2.1/bits/stl_algo.h

and it uses an algorithm very similar to the original article

    template<typename _RandomAccessIterator>
    inline void
    nth_element(_RandomAccessIterator __first, _RandomAccessIterator __nth,
                _RandomAccessIterator __last)
    {
      typedef typename iterator_traits<_RandomAccessIterator>::value_type _ValueType;
 
      // concept requirements
      __glibcxx_function_requires(_Mutable_RandomAccessIteratorConcept<_RandomAccessIterator>)
      __glibcxx_function_requires(_LessThanComparableConcept<_ValueType>)
      __glibcxx_requires_valid_range(__first, __nth);
      __glibcxx_requires_valid_range(__nth, __last);

      if (__first == __last || __nth == __last)
        return;

      std::__introselect(__first, __nth, __last,
                         std::__lg(__last - __first) * 2);
    }
you should look at how std::__introselect is defined (hint: its in /usr/include/c++/4.2.1/bits/stl_algo.h )
The original article uses heap to get top K numbers. It's runtime complexity is O(N*log(K)) and nth_element is O(N), so the algorithms are not similar at all.
scorchin posted the implementation of nth_element, except he forgot to look at how the gnu implementation of __introselect behaves.

If you look there, you will see it internally uses a heap.

It uses heap only as a fallback if maximum recursion depth is reached. The key difference between __introselect and the posted article is that __introselect uses pivoting to (ideally) "throw" away half of the array during each step.