|
|
|
|
|
by aw1621107
877 days ago
|
|
I was curious why the std::vector version produced so much more assembly, so I tried to dig a little. At least in libc++, the relevant code is (cleaned up): template<typename T,
class vector {
private:
pointer __begin_;
pointer __end_;
__compressed_pair<pointer, allocator_type> __end_cap_;
public:
constexpr const pointer& __end_cap() const noexcept {
return this->__end_cap_.first();
}
constexpr size_type size() const noexcept {
return static_cast<size_type>(this->__end_ - this->__begin_);
}
constexpr size_type capacity() const noexcept {
return static_cast<size_type>(__end_cap() - this->__begin_);
}
};
So if the functions are fully inlined we should end up with template<class T>
bool can_fit(std::vector<T> const &v, size_t n) {
return static_cast<size_t>(v.__end_cap_.first() - v.__begin_) - static_cast<size_t>(v.__end_ - v.__begin_);
}
At least algebraically (and ignoring casts) it should be equivalent to v.__end_cap_.first() - v.__end_, which is more or less what the manual implementations do. Maybe the optimizer can't make that transformation for some reason or another (overflow and/or not knowing the relationship between the pointers involved, maybe)?If you change can_fit(Vec<S>) to: return (v.f - v.b) - (v.e - v.b) >= n;
You end up with code that looks pretty similar to the can_fit(std::vector<S>) overload (the same for clang, a bit different for GCC), so it does seem it might be something about the extra pointer math that can't be safely reduced, and the casts aren't really relevant.(I'm also a bit surprised that can_fit_fast produces different assembly than the Vec can_fit overload) |
|
(There's also a secondary issue here, which is that sizeof(S) isn't a power of 2. That's what's introducing multiplications, instead of bit shifts. You might not see as drastic of a difference if sizeof(S) == alignof(S).)