Python Programming

Lecture 13 Algorithm Intro

13.1 Algorithm Intro

  • Complexity, Big-O (Searching)
  • Recursion, D&C
  • Sorting Algorithms
  • Greedy Algorithm
  • Dynamic Programming
  • Data Structures
  • Algorithms and data structures form another stream of fundamental programming knowledge.

  • When two programs solve the same problem but look different, is one better than the other? (Is it easier to read? Easier to understand? Faster to execute?)
  • An algorithm is a set of instructions for accomplishing a task.

  • Every piece of code could be called an algorithm.

  • To learn this, you only need some basic algebra and discrete mathematics.

  • Let's start with a simple algorithm.

Binary Search
  • Binary search is an algorithm; its input is a sorted list of elements. If an element you're looking for is in that list, binary search returns the position where it's located. Otherwise, binary search returns None.

  • Here's an example of how binary search works. I'm thinking of a number between 1 and 100.

  • You have to try to guess my number in the fewest tries possible. With every guess, I'll tell you if your guess is too low, too high, or correct.

  • Suppose you start guessing like this: 1, 2, 3, 4 ...

  • If you perform a simple search and my number is 99, it could take you 99 guesses to get there!

  • Here's a better technique. Start with 50.

  • In general, for a sorted list of $n$ elements, binary search takes $\log_2 n$ steps in the worst case, whereas simple search takes up to $n$ steps (see the code sketch at the end of this section).

Running time

  • Any time I talk about an algorithm, I'll discuss its running time. Generally you want to choose the most efficient algorithm, whether you're optimizing for time or for space.

  • For simple search, the maximum number of guesses is the same as the size of the list. This is called linear time.

  • Binary search runs in logarithmic time.
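
A minimal sketch of binary search in Python (the function name and test values are illustrative, not from the lecture): it repeatedly halves the search range, returning the item's position, or None if the item is absent.

def binary_search(alist, item):
    low = 0
    high = len(alist) - 1
    while low <= high:
        mid = (low + high) // 2        # check the middle element
        guess = alist[mid]
        if guess == item:
            return mid                 # found: return the position
        elif guess < item:
            low = mid + 1              # guess too low: keep the upper half
        else:
            high = mid - 1             # guess too high: keep the lower half
    return None                        # the item is not in the list

print(binary_search([1, 3, 5, 7, 9], 7))   # 3
print(binary_search([1, 3, 5, 7, 9], 4))   # None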

Big O notation
  • Big O notation is special notation that tells you how fast an algorithm is.

  • You need to know how the running time increases as the list size increases. That's where Big O notation comes in.

  • For example, suppose you have a list of size $n$. Simple search needs to check each element, so it takes $n$ operations; its run time in Big O notation is $O(n)$.

  • Big O notation lets you compare the number of operations. It tells you how fast the algorithm grows. Run times grow at very different speeds.

  • Binary search needs $\log_2 n$ operations to check a list of size $n$. It's $O(\log n)$.

  • Sometimes the performance of an algorithm depends on the exact values of the data rather than simply the size of the problem. For these kinds of algorithms we need to characterize their performance in terms of best-case, average-case, and worst-case performance.

  • Common run times, from fast to slow: $O(\log n)$, $O(n)$, $O(n \log n)$, $O(n^2)$, $O(2^n)$, $O(n!)$
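
To get a feel for how differently these grow, here is a quick sketch evaluating each run time at $n = 64$:

import math

n = 64
print(math.log2(n))       # 6.0
print(n)                  # 64
print(n * math.log2(n))   # 384.0
print(n ** 2)             # 4096
print(2 ** n)             # 18446744073709551616 (about 1.8e19)
print(math.factorial(n))  # about 1.3e89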

P, NP, NP-complete, NP-hard
  • P (polynomial time): Problems that can be solved in polynomial time.

  • NP (nondeterministic polynomial time): Decision problems for which the problem instances have proofs verifiable in polynomial time.

  • NP-complete: The hardest problems in NP: they are themselves in NP, and every problem in NP can be transformed into them in polynomial time.

  • NP-hard: At least as hard as everything in NP: every problem in NP can be transformed into an NP-hard problem in polynomial time, but an NP-hard problem need not itself be in NP.

TSP (travelling salesman problem)

  • The optimization version is NP-hard: given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?

  • The decision version is NP-complete: given a length $L$, decide whether the graph has a tour of length at most $L$.

13.2 Examples

An Anagram Detection Example
  • A good example problem for showing algorithms with different orders of magnitude is the classic anagram detection problem for strings.
  • One string is an anagram of another if the second is simply a rearrangement of the first. For example, 'heart' and 'earth' are anagrams.
Solution 1: Checking off
  • Check to see that each character in the first string actually occurs in the second.

def anagramSolution1(s1, s2):
    # Checking off: for each character of s1, scan what remains of s2
    # and check off the first match. Two nested scans make this O(n^2).
    if len(s1) != len(s2):
        return False

    alist = list(s2)
    pos1 = 0
    stillOK = True

    while pos1 < len(s1) and stillOK:
        pos2 = 0
        found = False
        while pos2 < len(alist) and not found:
            if s1[pos1] == alist[pos2]:
                found = True
            else:
                pos2 = pos2 + 1

        if found:
            alist[pos2] = None   # check off the match so it can't be reused
            pos1 = pos1 + 1
        else:
            stillOK = False

    return stillOK
Solution 2: Sort and Compare

def anagramSolution2(s1, s2):
    # Sort and compare: anagrams must produce identical sorted sequences.
    # The two sorts dominate the cost, so this is O(n log n).
    if len(s1) != len(s2):
        return False

    alist1 = list(s1)
    alist2 = list(s2)

    alist1.sort()
    alist2.sort()

    pos = 0
    matches = True

    while pos < len(s1) and matches:
        if alist1[pos] == alist2[pos]:
            pos = pos + 1
        else:
            matches = False

    return matches
Solution 3: Brute Force
  • Generate a list of all possible strings using the characters from s1 and then see if s2 occurs. (Bad idea: a string of n characters has up to n! rearrangements, so this is O(n!).)

Solution 4: Count and Compare
  • Count the number of times each character occurs.


def anagramSolution4(s1, s2):
    # Count and compare: tally how often each letter occurs in each string.
    # Two passes over the strings plus a 26-step comparison make this O(n).
    # Assumes the strings contain only lowercase ASCII letters.
    c1 = [0] * 26
    c2 = [0] * 26

    for i in range(len(s1)):
        pos = ord(s1[i]) - ord('a')   # map 'a'..'z' to index 0..25
        c1[pos] = c1[pos] + 1

    for i in range(len(s2)):
        pos = ord(s2[i]) - ord('a')
        c2[pos] = c2[pos] + 1

    # Anagrams must have identical counts for all 26 letters.
    j = 0
    stillOK = True
    while j < 26 and stillOK:
        if c1[j] == c2[j]:
            j = j + 1
        else:
            stillOK = False

    return stillOK
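
A quick sanity check (illustrative test strings) showing the solutions agree:

print(anagramSolution1('heart', 'earth'))   # True
print(anagramSolution2('heart', 'earth'))   # True
print(anagramSolution4('heart', 'earth'))   # True
print(anagramSolution4('heart', 'hearts'))  # False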
Performance of Python Data Structures
  • Lists: indexing and append are O(1); inserting at the front, deleting, and the in operator are O(n).

  • Dictionaries: getting, setting, and membership testing are O(1) on average.
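
A minimal sketch contrasting membership testing in the two structures (timings are machine-dependent; the point is the gap, which widens as n grows):

import timeit

n = 100_000
big_list = list(range(n))
big_dict = {i: None for i in range(n)}

# Worst-case lookup: the list is scanned element by element (O(n)),
# while the dict hashes the key and jumps straight to it (O(1) average).
print(timeit.timeit(lambda: (n - 1) in big_list, number=100))
print(timeit.timeit(lambda: (n - 1) in big_dict, number=100))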

13.3 Recursion

Recursion
  • Recursion is a method of solving problems that involves breaking a problem down into smaller and smaller subproblems until you get to a small enough problem that it can be solved trivially.

  • Recursion allows us to write elegant solutions to problems that may otherwise be very difficult to program.


# Iterative version: accumulate the sum in a loop.
def listsum(numList):
    theSum = 0
    for i in numList:
        theSum = theSum + i
    return theSum

# Recursive version: the sum of a list is its first element
# plus the sum of the rest of the list.
def listsum(numList):
    if len(numList) == 1:    # base case: a one-element list sums to itself
        return numList[0]
    else:                    # recursive case: shrink the list by one
        return numList[0] + listsum(numList[1:])
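
The recursive version unwinds like this:

listsum([1, 3, 5, 7, 9])
= 1 + listsum([3, 5, 7, 9])
= 1 + (3 + listsum([5, 7, 9]))
= 1 + (3 + (5 + listsum([7, 9])))
= 1 + (3 + (5 + (7 + listsum([9]))))
= 1 + (3 + (5 + (7 + 9)))
= 25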
Base case and recursive case

def fact(n):
    if n <= 1:
        return 1                  # base case (also covers fact(0))
    else:
        return n * fact(n - 1)    # recursive case
  • The recursive case is when the function calls itself. The base case is when the function does not call itself again, so the recursion eventually stops instead of running forever.

The stack

def greet(name):
    print('hello ' + name + '!')
    greet2(name)     # push greet2's frame on top of greet's
    print('getting ready to say bye...')
    bye()            # greet2 has returned; now push bye's frame

def greet2(name):
    print('how are you, ' + name + '?')

def bye():
    print('Ok, bye!')
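
For example, calling greet('maggie') pushes greet's frame, then greet2's (popped when it returns), then bye's:

greet('maggie')
# hello maggie!
# how are you, maggie?
# getting ready to say bye...
# Ok, bye!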
The call stack with recursion
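With recursion, each call pushes a new frame holding its own value of n, and the multiplications happen as the frames pop. For fact(3) from above:

fact(3)                  # frame for n=3 waits on fact(2)
    3 * fact(2)          # frame for n=2 waits on fact(1)
        2 * fact(1)      # frame for n=1 hits the base case and returns 1
    2 * 1 = 2            # the n=2 frame pops, returning 2
3 * 2 = 6                # the n=3 frame pops, returning 6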
Divide and Conquer (D&C, 分而治之)
  • D&C gives you a new way to think about solving problems. When you get a new problem, you don't have to be stumped. Instead, you can ask, "Can I solve this if I use divide and conquer?"

  • Suppose you have a farm that measures 1680 × 640. You want to divide this farm evenly into square plots, and you want the plots to be as big as possible.

  • To solve a problem using D&C, there are two steps:

  • 1. Figure out the base case. This should be the simplest possible case.

  • 2. Divide or decrease your problem until it becomes the base case.

  • What is the largest square size you can use? You have to reduce your problem. Let’s start by marking out the biggest boxes you can use.

  • There's a farm segment left to divide. Why don't you apply the same algorithm to this segment?

  • If you find the biggest box that will work for this size, that will be the biggest box that will work for the entire farm. You just reduced the problem from a 1680 × 640 farm to a 640 × 400 farm! (See the code sketch at the end of this section.)

  • To recap, here's how D&C works:

  • 1. Figure out a simple case as the base case.

  • 2. Figure out how to reduce your problem and get to the base case.

  • D&C isn't a simple algorithm that you can apply to a problem. Instead, it's a way to think about a problem.
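
The farm reduction is exactly Euclid's algorithm for the greatest common divisor. A minimal sketch (the function name is illustrative):

def largest_square(width, height):
    # Base case: the strip divides evenly into height x height squares.
    if width % height == 0:
        return height
    # Recursive case: the leftover strip measures height x (width % height);
    # the biggest square that works for it works for the whole farm.
    return largest_square(height, width % height)

print(largest_square(1680, 640))   # 80: the farm splits into 80 x 80 plots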

Summary

  • Algorithm Intro: complexity and Big O notation, binary search versus simple search, P/NP/NP-complete/NP-hard, the anagram detection example, recursion (base case, recursive case, and the call stack), and divide and conquer.