Another core area of basic programming knowledge is algorithms and data structures.
An algorithm is a set of instructions for accomplishing a task.
Every piece of code could be called an algorithm.
To learn this, you only need some basic algebra and discrete mathematics.
Let's start with a simple algorithm.
Binary search is an algorithm whose input is a sorted list of elements. If the element you're looking for is in that list, binary search returns the position where it's located; otherwise, it returns null.
Here's an example of how binary search works. I'm thinking of a number between 1 and 100.
You have to try to guess my number in the fewest tries possible. With every guess, I'll tell you if your guess is too low, too high, or correct.
Suppose you start guessing like this: 1, 2, 3, 4 ...
If you perform a simple search, and my number was 99, it could take you 99 guesses to get there!
Here's a better technique. Start with 50. Too low, but you just eliminated half the numbers! With binary search, you guess the middle number and eliminate half of the remaining numbers every time.
In general, for any sorted list of $n$ elements, binary search will take $\log_2 n$ steps to run in the worst case, whereas simple search will take $n$ steps.
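The guessing strategy above is exactly binary search. A minimal sketch in Python (assuming the list is sorted in ascending order):

```python
def binary_search(sorted_list, item):
    low = 0
    high = len(sorted_list) - 1
    while low <= high:
        mid = (low + high) // 2      # check the middle of the remaining range
        guess = sorted_list[mid]
        if guess == item:
            return mid               # found it: return the position
        elif guess < item:
            low = mid + 1            # too low: discard the lower half
        else:
            high = mid - 1           # too high: discard the upper half
    return None                      # item is not in the list

print(binary_search([1, 3, 5, 7, 9], 7))   # -> 3
print(binary_search([1, 3, 5, 7, 9], 4))   # -> None
```

Each iteration halves the remaining range, which is where the $\log_2 n$ worst case comes from.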
Running time
Any time I talk about an algorithm, I'll discuss its running time. Generally you want to choose the most efficient algorithm, whether you're trying to optimize for time or space.
For simple search, the maximum number of guesses is the same as the size of the list. This is called linear time.
Binary search runs in logarithmic time.
Big O notation is special notation that tells you how fast an algorithm is. What you really need to know is how the running time increases as the list size increases, and that's where Big O notation comes in. For example, suppose you have a list of size $n$. Simple search needs to check each element, so it will take $n$ operations; its run time in Big O notation is $O(n)$. Big O notation lets you compare the number of operations: it tells you how fast the algorithm grows, and run times grow at very different speeds.
Binary search needs $\log_2 n$ operations to check a list of size $n$. It's $O(\log n)$.
Sometimes the performance of an algorithm depends on the exact values of the data rather than simply the size of the problem. For these kinds of algorithms we need to characterize their performance in terms of best-case, average-case, and worst-case performance.
Common run times, from fastest to slowest: $O(\log n)$, $O(n)$, $O(n \log n)$, $O(n^2)$, $O(2^n)$, $O(n!)$
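To get a feel for how different these growth rates are, compare worst-case operation counts for a list of one billion elements:

```python
import math

# Worst-case operation counts for a list of one billion elements.
n = 1_000_000_000
print("binary search:", math.ceil(math.log2(n)), "steps")  # about 30 steps
print("simple search:", n, "steps")                        # one billion steps
```

A billion-element list costs binary search only about 30 comparisons, while simple search may need all one billion.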
P (polynomial time): Decision problems that can be solved in polynomial time.
NP (nondeterministic polynomial time): Decision problems for which the problem instances have proofs verifiable in polynomial time.
NP-complete: The hardest problems in NP. Every problem in NP can be transformed in polynomial time into an NP-complete problem.
NP-hard: At least as hard as the hardest problems in NP. An NP-hard problem need not itself be in NP, but every problem in NP can be transformed into it in polynomial time.
TSP (travelling salesman problem)
NP-hard (optimization version): Given a list of cities and the distances between each pair of cities, find the shortest possible route that visits each city exactly once and returns to the origin city.
NP-complete (decision version): Given a length L, decide whether there is a tour of length at most L.
Example: decide whether two strings are anagrams of each other. Solution 1, checking off, runs in $O(n^2)$ time:

def anagramSolution1(s1, s2):
    # Checking off: for each character of s1, find it in s2 and cross it off.
    if len(s1) != len(s2):  # strings of different lengths can never be anagrams
        return False
    alist = list(s2)
    pos1 = 0
    stillOK = True
    while pos1 < len(s1) and stillOK:
        pos2 = 0
        found = False
        while pos2 < len(alist) and not found:
            if s1[pos1] == alist[pos2]:
                found = True
            else:
                pos2 = pos2 + 1
        if found:
            alist[pos2] = None  # cross off the matched character
            pos1 = pos1 + 1
        else:
            stillOK = False
    return stillOK
Solution 2, sort and compare, runs in $O(n \log n)$ time (dominated by the sorts):

def anagramSolution2(s1, s2):
    # Sort and compare: anagrams produce identical sorted character lists.
    if len(s1) != len(s2):  # strings of different lengths can never be anagrams
        return False
    alist1 = list(s1)
    alist2 = list(s2)
    alist1.sort()
    alist2.sort()
    pos = 0
    matches = True
    while pos < len(s1) and matches:
        if alist1[pos] == alist2[pos]:
            pos = pos + 1
        else:
            matches = False
    return matches
Solution 3, brute force: generate a list of all possible strings using the characters from s1 and then see if s2 occurs. A bad idea, since there are $n!$ possible strings.
Solution 4, count and compare: count the number of times each character occurs; anagrams have identical counts. This runs in $O(n)$ time.
def anagramSolution4(s1, s2):
    # Count and compare: tally each letter's occurrences in both strings.
    c1 = [0] * 26
    c2 = [0] * 26
    for i in range(len(s1)):
        pos = ord(s1[i]) - ord('a')  # map 'a'..'z' to 0..25 (assumes lowercase ASCII)
        c1[pos] = c1[pos] + 1
    for i in range(len(s2)):
        pos = ord(s2[i]) - ord('a')
        c2[pos] = c2[pos] + 1
    j = 0
    stillOK = True
    while j < 26 and stillOK:
        if c1[j] == c2[j]:
            j = j + 1
        else:
            stillOK = False
    return stillOK
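For reference, the same counting idea can be expressed with the standard library's collections.Counter, which also handles characters beyond lowercase ASCII:

```python
from collections import Counter

def is_anagram(s1, s2):
    # Two strings are anagrams exactly when their character counts match.
    return Counter(s1) == Counter(s2)

print(is_anagram('heart', 'earth'))  # -> True
print(is_anagram('abc', 'abd'))      # -> False
```

Building each Counter is a single pass over the string, so this is still $O(n)$.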
List
Dictionaries
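Big O applies to Python's built-in list and dict operations too. A quick sketch of the common average-case costs (in CPython, a list is a dynamic array and a dict is a hash table):

```python
nums = [10, 20, 30]

nums[1]              # index            O(1)
nums.append(40)      # append at end    O(1) amortized
nums.insert(0, 5)    # insert at front  O(n), shifts every element
20 in nums           # membership test  O(n), linear scan

ages = {'alice': 30, 'bob': 25}
ages['alice']        # lookup by key    O(1) average
ages['carol'] = 41   # insert           O(1) average
'bob' in ages        # membership test  O(1) average
```

This is why a dict is usually the right choice for fast lookups, while a list keeps elements in order.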
Recursion is a method of solving problems that involves breaking a problem down into smaller and smaller subproblems until you get to a small enough problem that it can be solved trivially.
Recursion allows us to write elegant solutions to problems that may otherwise be very difficult to program.
First, an iterative version:

def listsum(numList):
    # Iterative sum: accumulate each number in turn.
    theSum = 0
    for i in numList:
        theSum = theSum + i
    return theSum
The same problem solved recursively:

def listsum(numList):
    if len(numList) == 1:  # base case: a one-element list sums to itself
        return numList[0]
    else:
        # recursive case: first element plus the sum of the rest
        return numList[0] + listsum(numList[1:])
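Note that the recursive version's base case assumes a non-empty list: listsum([]) would raise an IndexError. A small variant that uses the empty list as the base case handles that edge too:

```python
def listsum(numList):
    # Base case: an empty list sums to 0, so listsum([]) is safe as well.
    if len(numList) == 0:
        return 0
    # Recursive case: first element plus the sum of the rest.
    return numList[0] + listsum(numList[1:])

print(listsum([1, 3, 5, 7, 9]))  # -> 25
print(listsum([]))               # -> 0
```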
def fact(n):
    if n == 1:
        return 1                # base case
    else:
        return n * fact(n - 1)  # recursive case
The recursive case is when the function calls itself. The base case is when the function does not call itself again... so it doesn't go into an infinite loop.
An example of nested function calls (each call is pushed onto the call stack):

def greet(name):
    print('hello ' + name + '!')
    greet2(name)
    print('getting ready to say bye...')
    bye()

def greet2(name):
    print('how are you, ' + name + '?')

def bye():
    print('Ok, bye!')
Divide and conquer (D&C) gives you a new way to think about solving problems. When you get a new problem, you don't have to be stumped. Instead, you can ask, "Can I solve this if I use divide and conquer?"
Suppose you have a rectangular farm, 1680 wide and 640 long. You want to divide this farm evenly into square plots, and you want the plots to be as big as possible.
To solve a problem using D&C, there are two steps:
1. Figure out the base case. This should be the simplest possible case.
2. Divide or decrease your problem until it becomes the base case.
What is the largest square size you can use? You have to reduce your problem. Let’s start by marking out the biggest boxes you can use.
There's a farm segment left to divide. Why don't you apply the same algorithm to this segment?
If you find the biggest box that will work for this size, that will be the biggest box that will work for the entire farm. You just reduced the problem from a 1680 × 640 farm to a 640 × 400 farm!
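Repeating this reduction until one side divides the other evenly is Euclid's algorithm: the biggest workable square has sides equal to the greatest common divisor of the two dimensions. A minimal sketch:

```python
def largest_square(width, height):
    # Base case: height divides width evenly, so height x height squares fit exactly.
    if width % height == 0:
        return height
    # Recursive case: tile with height x height squares,
    # then solve the leftover (height x width % height) strip.
    return largest_square(height, width % height)

print(largest_square(1680, 640))  # -> 80
```

So the 1680 × 640 farm divides evenly into 80 × 80 plots.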
To recap, here's how D&C works:
1. Figure out a simple case as the base case.
2. Figure out how to reduce your problem and get to the base case.
D&C isn't a simple algorithm that you can apply to a problem. Instead, it's a way to think about a problem.