There are various ways to solve Hamming.
One approach is to iterate over either a range of indexes or to use zip.
Another approach is to use the range of indexes.
Some other approaches include the use of enumerate
, or filter
with a lambda
.
The goal of this exercise is to compare two DNA strands and count how many of the nucleotides are different from their equivalent in the other string. The most common solution uses some kind of loop to iterate over the two strands and compare nucleotides with the same index.
Using range
is an approach to iterate over a sequence.
Although it may not be the most pythonic strategy, it is a good way to start.
range
is a built-in function and it is very fast.
The downside is that range
only works with iterators that can be indexed, like concept:python/lists and concept:python/strings.
While the built-in function enumerate
can take any iterator.
def distance(strand_a, strand_b):
if len(strand_a) != len(strand_b):
raise ValueError("Strands must be of equal length.")
count = 0
for index in range(len(strand_a)):
if strand_a[index] != strand_b[index]:
count += 1
return count
For more information, check the range approach.
The built-in zip
function returns an iterator of concept:python/tuples where the first item in each passed iterator is paired together, and then the second item in each passed iterator are paired together, and so on.
Using zip()
to iterate removes the need to index into the strands.
The downside is that if you need to index into your iterators, zip
won't work.
Although it is possible to combine zip
with enumerate
to generate indexes.
def distance(strand_a, strand_b):
if len(strand_a) != len(strand_b):
raise ValueError("Strands must be of equal length.")
count = 0
for nucleotide_a, nucleotide_b in zip(strand_a, strand_b):
if nucleotide_a != nucleotide_b:
count += 1
return count
For more information, check the zip approach.
Using the built-in sum
removes the need for a counter variable.
Removing the counter variable makes the code more concise.
The examples making use of sum
also use a generator expression, although that it is not required.
Using sum
in this fashion requires a bit more Python knowledge compared to the other approaches.
With zip
:
def distance(strand_a, strand_b):
if len(strand_a) != len(strand_b):
raise ValueError("Strands must be of equal length.")
return sum(nucleotide_a != nucleotide_b for
nucleotide_a, nucleotide_b in zip(strand_a, strand_b))
With range
:
def distance(strand_a, strand_b):
if len(strand_a) != len(strand_b):
raise ValueError("Strands must be of equal length.")
return sum(strand_a[index] != strand_b[index] for
index in range(len(strand_a)))
For more information, check the sum approach.