-
-
Notifications
You must be signed in to change notification settings - Fork 519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binary search is not well understood #495
Comments
I wanted to add that the problems above feel like they are specifically ruby problems. Functional languages for example, can easily pass around array slices as they typically don't need to create copies. Many non-functional languages actually care about performance more than ruby and are a lot "closer to the metal" so thinking in terms of algorithmic complexity is more natural there. |
These are good points. I agree that exercism might not be the right place to teach algorithm analysis, and ruby does not lend itself well to implementing some of these "lower level" algorithms (and wouldn't even be realistic in the real world), but understanding the idea behind the exercise (even in ruby) is not a bad thing. I do wonder though how we could enforce a specific implementation (besides using nits from rikki), which I don't really think is possible (or ideal). |
I have been thinking about this some more. One thought would be to distance ourselves from the "binary search" aspect of the exercise, and through code reviews/breadcrumbs lead the developer to the conclusion that there are more optimal ways to implement the particular solution. Really we can only test what a method is doing, not its implementation so maybe the exercise should really be something along the lines of "Find the index of a particular item in a sorted array" |
Seems good; that is in keeping with exercism/problem-specifications#234
I'm not sure if this is something that would be desirable to replicate, but I'll share it in case it is: Lua creates a special TracedArray that counts the number of times you access its elements.
I have not done this exercise in Lua so I do not know whether this is prone to causing false positives or negatives, etc. Sieve is another example of an exercise where we can only look at the result, not the implementation, and there are various submissions that will use a non-sieving implementation. Indeed, it can be argued that Exercism exercises in general should not really focus on particular implementations. |
I was actually just about to mention that.
That seems like a great idea. |
Something like the following seems like a good idea to me. Using a static array guarantees that we don't get weird array access and by ensuring that it is sorted, we can move the "ensure sorted" assertion out of the BinarySearch algorithm. # A static sorted array.
# Maintains the invariant that all items are stored in sorted order.
# Constructing a new {TracedArray} takes time proportional to the number of
# items in the array. Returning the size and accessing an element of the array
# takes constant time.
class TracedArray
# @!attribute [r] reads
# @return [Integer] The number of read accesses.
attr_reader :reads
# Initializes a new TracedArray
# @param collection [Enumerable]
def initialize(collection)
@array = collection.to_a.sort
@reads = 0
end
# Returns the size of the array.
# @return [Integer]
def size
@array.size
end
# Return the ith element.
# @param i [Integer] The index (Negative indices are NOT supported).
# @return [Object]
# @raise [RangeError] if i lies outside the valid dimensions of the array.
def [](i)
fail RangeError, i unless (0...@array.size).cover? i
@reads += 1
@array[i]
end
end |
Some suggestions for changing the tests:
|
@remcopeereboom This looks interesting, but I think forcing a user to implement a O(log n) search algorithm might be above the scope of exercism in its current form, and could very well turn off some users who don't have a CS background. I think letting the user implement a solution that works first, and through code review be brought to better implementations is a better pattern. I wonder though if we could categorize exercises somehow by their intended goal using the topics array in the config. (i.e. algorithm design and analysis/general purpose/language specific/object oriented design etc.) This way a user who is interested in sharpening their CS skills could pull those types of exercises vs someone who is just trying to get a feel for the language and improve their broader knowledge. |
There is a (few?) changes and conversation being made regarding ratings for difficulty and perhaps categorizing based on things being taught per exercise. I am not sure that this will force the 0(log n) solution, but may encourage it. If it forces it, then the discussion goes away, and that is not a good thing. @bmulvihill are you finding that it does force this aspect? |
@kotp I was looking at how Lua was using the traced array where they are ensuring that an element is found in O(log n) https://github.com/exercism/xlua/blob/master/exercises/binary-search/binary-search_spec.lua#L32-L37 |
Thanks @bmulvihill wanted to clarify the reference. |
Yes, that was my thinking as well. More particularly, I don't think that it is a good platform for implementing specific algorithms. There is also a prime number sieve problem which asks clients to implement the Sieve of Eratosthenes, but very few people do so. Instead, they implement different algorithms that all find the first x primes. That was exercism is much better at, discussing a particular solution to a general (simple) problem. Some solutions might optimize performance, others might not. But the BinarySearch problem is specifically about performance. In fact, I'd ask how you would even say that something is a BinarySearch if it did not have O(log n) performance). There is simply no reason to do binary search if not for performance. After all, a simple linear search can be used on collections that are not sorted and data structures that are not arrays.
I don't think that works for binary search as there really is only one good way to code one up; it is very hard to spot all the array copies (even for reviewers); and the topic of algorithmic performance is not easily (nor adequately) explained in a nit. But perhaps I am being overly negative. I can see this working for a problem that ask to simply find the element in a sorted array, but at the point we can start asking whether the problem is not too simple. |
@remcopeereboom I agree with this statement. I think categorizing exercises under certain topics might solve this problem (but that is a larger issue). If I am a CS student, or I just looking to brush up on my algorithms before an interview this is the exact type of problem I would want to solve, and I would want my solution to hold up to the O(log n) requirement. If I am just trying to practice a language I might not be as interested in this exercise or want to be burdened with trying to meet that performance requirement. I think the exercise difference-of-squares is a great example of how a user can be lead through review and bread crumbs from an O(n) solution to an O(1) by discovering the mathematical formulas behind the algorithm (and this can be done without getting too much into the details of big O). I don't know if that directly applies here because then as you mentioned the problem might be too simple. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
The whole point of binary search is to do array searches in logarithmic time, instead of in linear time. Having recently taken a look at the implementations, I have not found a single one that actually does this. In fact, all the solutions I've looked at actually check to see that the array given is sorted by sorting copy of the array. This means that the solution is at least O(n) because it needs to create a copy and depending on the sorting algorithm (I think the default is merge sort), might well be O(n log n).
I think the README should make this point a lot clearer, but exercism might not be the best place to teach algorithmic complexity (or it might, I don't know). Perhaps we could add a rikki nit to check if the
BinarySearch
constructor sorts the input array and a call is made to the constructor in the code. Another rikki point might be to scan the code for array slices as these also create copy arrays, violating the O(log n) goal.While it is possible to actually implement the problem as it stands in an O(log n) manner, by not using recursion, it is rather limiting. Not to mention the fact that non-recursive binary search is actually a very tricky algorithm to code up (it's easy to get off-by-one errors). It is also possible with recursion, but people tend to have difficulty in coming up with the idea of passing the start and end indices around as parameters. That would be a nice thing to teach, but is not really in the spirit of Ruby (programmer happiness)
We should also remove the test that asks clients to raise an
ArgumentError
if the array given is not in sorted order. It seems to mislead a lot of people and doesn't even give us any security. There is nothing stopping clients from mutating the array passed in, but exercists don't seem to be aware of that. It's a false sense of security and so is arguably teaching them a bad habit.The text was updated successfully, but these errors were encountered: