Python Programming

Lecture 5 Strings

5.1 Strings

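A string is a sequence of characters; its elements are the individual characters. The empty string is '' (two quotes with nothing between them), not ' ' (which contains one space). You can access the characters one at a time with the bracket operator. Indices start at 0, so fruit[1] is the second character: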


>>> fruit = 'banana'
>>> letter = fruit[1]
>>> print(letter)
a
>>> len(fruit)
6

A slice fruit[m:n] returns the characters from index m up to, but not including, index n; omitting n runs to the end of the string:


>>> fruit = 'banana'
>>> fruit[1:3]
'an'
>>> fruit[3:]
'ana'
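
As a small supplementary sketch (not part of the original slides), negative indices count from the end of a string, and an omitted slice bound defaults to the start or the end. The variable names here are chosen only for illustration:

```python
fruit = 'banana'

last = fruit[-1]          # negative indices count from the end
first_three = fruit[:3]   # an omitted start defaults to 0
whole = fruit[:]          # omitting both bounds yields the whole string
empty = fruit[3:3]        # start == stop yields the empty string

print(last, first_three, whole, repr(empty))
```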

The in operator tests whether one string occurs inside another:


>>> print('a' in 'banana')
True
>>> print('seed' in 'banana')
False

Iteration: a for loop visits the characters of a string one at a time.


fruit = 'banana'
for char in fruit:
    print(char)

Strings are immutable


>>> greeting = 'Hello, world!'
>>> greeting[0] = 'J'

TypeError: 'str' object does not support item assignment


>>> greeting = 'Hello, world!'
>>> new_greeting = 'J' + greeting[1:]
>>> print(new_greeting)

Jello, world!
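
The same idea can be wrapped in a small helper. This is only a sketch: replace_char is a hypothetical name, not a built-in, and it assumes a non-negative index. It builds a new string from two slices and leaves the original untouched:

```python
def replace_char(text, index, new_char):
    """Return a copy of text with the character at index replaced.

    Hypothetical helper for illustration; assumes 0 <= index < len(text).
    """
    return text[:index] + new_char + text[index + 1:]

greeting = 'Hello, world!'
new_greeting = replace_char(greeting, 0, 'J')
print(new_greeting)
print(greeting)  # the original string is unchanged
```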

Comparison operations are useful for putting words in alphabetical order.


>>> print('apple'>'banana')
False
>>> print('ba' > 'banana')
False
>>> a_list = ["orange", "apple", "banana"]
>>> sorted(a_list)
['apple', 'banana', 'orange']
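
One caveat worth illustrating: comparisons are case-sensitive, and every uppercase letter compares as smaller than every lowercase letter. The sketch below (the word list is made up for the example) shows one common way to get a case-insensitive order by passing key=str.lower to sorted():

```python
words = ["orange", "Pineapple", "apple", "banana"]

# Uppercase letters compare as smaller than lowercase ones,
# so 'Pineapple' comes first in the default order.
default_order = sorted(words)

# key=str.lower compares the lowercased forms instead.
alphabetical = sorted(words, key=str.lower)

print(default_order)
print(alphabetical)
```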

Methods of Strings

.title(), .lower(), .upper()
String methods do not modify the original string; they return a new string.


>>> name = "ada lovelace"
>>> print(name.title())
Ada Lovelace

>>> print(name)
ada lovelace


>>> name = "Ada Lovelace"
>>> print(name.upper())
ADA LOVELACE

>>> print(name.lower())
ada lovelace

Stripping Whitespace

To programmers 'python' and 'python ' look pretty much the same. But to a program, they are two different strings. To ensure that no whitespace exists at the right end of a string, use the rstrip() method.


>>> favorite_language = 'python '
>>> favorite_language
'python '
>>> favorite_language.rstrip()
'python'
>>> favorite_language
'python '

However, the whitespace is removed only in the value that rstrip() returns; the variable itself is unchanged. To remove the whitespace permanently, store the stripped value back into the variable:


>>> favorite_language = 'python '
>>> favorite_language = favorite_language.rstrip()
>>> favorite_language
'python'

You can also strip whitespace from the left side of a string using the lstrip() method or strip whitespace from both sides at once using strip():


>>> favorite_language = ' python '
>>> favorite_language.rstrip()
' python'
>>> favorite_language.lstrip()
'python '
>>> favorite_language.strip()
'python'
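
A supplementary sketch of two details the examples above do not show. Called without arguments, the strip methods remove all leading or trailing whitespace, including tabs and newlines. They also accept an optional string argument, which is treated as a set of characters to remove rather than as a prefix or suffix. The sample values are invented for the example:

```python
raw_line = '  \tpython\n'
cleaned = raw_line.strip()            # removes spaces, tabs and newlines

starred = '***python***'.strip('*')   # removes only the listed characters

# The argument is a set of characters, not a suffix:
# every trailing '.', 'p' or 'y' is removed.
surprise = 'happy.py'.rstrip('.py')

print(repr(cleaned), repr(starred), repr(surprise))
```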

Parsing strings

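.find() returns the index of the first occurrence of one string inside another, or -1 if it does not occur. An optional second argument gives the index at which to start searching.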


>>> word = 'banana'
>>> index = word.find('a')
>>> print(index)
1
>>> word.find('na')
2
>>> word.find('na', 3)
4
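
Because find() returns -1 when nothing is found, it can drive a loop that collects every occurrence. The following is a minimal sketch; find_all is a hypothetical helper name, not a string method:

```python
def find_all(text, sub):
    """Collect every index at which sub occurs in text (hypothetical helper)."""
    positions = []
    start = text.find(sub)
    while start != -1:                    # -1 means "not found"
        positions.append(start)
        start = text.find(sub, start + 1)  # resume just past the last hit
    return positions

word = 'banana'
print(find_all(word, 'a'))
print(find_all(word, 'na'))
print(word.find('x'))
```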


>>> data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
>>> atpos = data.find('@')
>>> print(atpos)
21
>>> sppos = data.find(' ',atpos)
>>> print(sppos)
31
>>> host = data[atpos+1 : sppos]
>>> print(host)
uct.ac.za
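
The extraction above assumes that the line contains both an '@' and a following space. A slightly more defensive sketch is given here; extract_host is a hypothetical function name, and returning None for a line without '@' is a design choice made for this example:

```python
def extract_host(line):
    """Return the text between the first '@' and the next space, or None."""
    atpos = line.find('@')
    if atpos == -1:
        return None              # no address on this line
    sppos = line.find(' ', atpos)
    if sppos == -1:
        sppos = len(line)        # the address runs to the end of the line
    return line[atpos + 1:sppos]

data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
print(extract_host(data))
print(extract_host('no address here'))
```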

.startswith() returns a Boolean value: True if the string begins with the given prefix. The test is case-sensitive.


>>> line = 'Have a nice day'
>>> line.startswith('h') 
False

>>> line.lower()
'have a nice day'

>>> line.lower().startswith('h')
True

.split() breaks a string into words and returns them as a list.


>>> s = 'pining for the fjords'
>>> t = s.split()
>>> print(t)
['pining', 'for', 'the', 'fjords']
>>> print(t[2])
the
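
As a supplementary sketch, split() also accepts an explicit delimiter, and join() is its inverse. With no argument, split() treats any run of whitespace as one separator; with ' ' as the delimiter, consecutive spaces produce empty strings. The sample strings are invented for the example:

```python
record = 'apple,banana,cherry'
fields = record.split(',')        # split on an explicit delimiter
rejoined = ' / '.join(fields)     # join() glues a list back together

spaced = 'a  b'                   # two spaces between the letters
by_whitespace = spaced.split()    # runs of whitespace count once
by_space = spaced.split(' ')      # each single space is a separator

print(fields, rejoined, by_whitespace, by_space)
```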

Formatted String Literals


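>>> number = 42
>>> print('I have spotted number camels.')  # wrong: prints the word "number", not its value
I have spotted number camels.
>>> print('I have spotted ' + str(number) + ' camels.')  # works, but clumsy
I have spotted 42 camels.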


>>> number = 42
>>> print(f'I have spotted {number} camels.')
I have spotted 42 camels.


>>> animal = 'camels'
>>> number = 42.12345678
>>> print(f'I spotted {number:.2f} {animal}.') 
I spotted 42.12 camels.

>>> print(f'I spotted {number:.0f} {animal}.') 
I spotted 42 camels.

>>> print(f'I spotted {number:.5} {animal}.') 
I spotted 42.123 camels.

>>> print(f'I spotted {number:.2%} {animal}.') 
I spotted 4212.35% camels.
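
The format specification after the colon can also control width, alignment, and digit grouping. The following sketch uses made-up values and shows a few commonly used options from Python's format specification mini-language:

```python
animal = 'camels'
number = 1234567.891

grouped = f'{number:,.2f}'      # thousands separators, two decimals
right = f'{animal:>10}'         # right-aligned in a field of width 10
left = f'{animal:<10}|'         # left-aligned, padded with spaces
padded = f'{7:03d}'             # integer padded with zeros to width 3

print(grouped)
print(right)
print(left)
print(padded)
```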

String: Summary

The elements of a string are characters. The empty string is ''.
Features: ordered, immutable, repeated characters allowed
Indexing and slicing work the same way as for lists.
The in operator returns a Boolean value that tells whether one string contains another. Comparison operators put strings in alphabetical order.
.upper(), .lower(), .title()
.rstrip(), .lstrip(), .strip()
.find(), .startswith(), .split()
Formatted String Literals

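5.2 Idiom Chain Game (成语接龙)

1. Load the idiom dictionary.
2. Given an idiom, find all the idioms that can follow it.
3. Starting from a given idiom, keep chaining until no idiom can follow.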


# 1. Load the idiom dictionary
filename = 'idiom_dictionary.txt'
with open(filename, encoding="utf-8") as file_object:
    lines = file_object.readlines()  # a list with one string per line


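# Map each idiom to the list of pinyin syllables of its characters.
# "拼音" (pinyin) and "释义" (definition) are the field markers in the file.
d_game = {}
for line in lines:
    if line != "\n":
        endpoint = line.find("拼音")
        idiom = line[:endpoint].strip()
        pinyin_start = line.find("：", endpoint)  # full-width colon after "拼音"
        pinyin_end = line.find("释义")
        each = line[pinyin_start + 1:pinyin_end]
        pinyin_list = each.split()
        d_game[idiom] = pinyin_list
print(len(d_game))  # number of idioms loaded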
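
To make the parsing step easier to test, the same logic can be applied to a single line. This is only a sketch: parse_line is a hypothetical function name, and the sample line is an assumption about the layout of idiom_dictionary.txt, inferred from the markers the loop searches for (the idiom, then "拼音：" followed by the syllables, then "释义" followed by the definition):

```python
def parse_line(line):
    """Split one dictionary line into (idiom, list of pinyin syllables)."""
    endpoint = line.find("拼音")              # idiom ends where "拼音" begins
    idiom = line[:endpoint].strip()
    pinyin_start = line.find("：", endpoint)  # full-width colon
    pinyin_end = line.find("释义")            # the definition field starts here
    pinyin_list = line[pinyin_start + 1:pinyin_end].split()
    return idiom, pinyin_list

# Hypothetical sample line; the real file layout may differ.
sample = "一心一意 拼音：yī xīn yī yì 释义：心思、意念专一。\n"
idiom, pinyin_list = parse_line(sample)
print(idiom, pinyin_list)
```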


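# 2. Given an idiom, find all the idioms that can follow it
idiom = input("Enter the first idiom\n")
char_4th = d_game[idiom][-1]  # pinyin of the last character
for x, y in d_game.items():
    if char_4th == y[0]:  # first syllable matches the last syllable
        print(x)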
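
The search in step 2 can also be written as a function that returns the matches instead of printing them, which makes it reusable in step 3. This is a sketch under stated assumptions: find_followers is a hypothetical name, and the small dictionary below is toy data standing in for the d_game built from the file:

```python
def find_followers(idiom, d_game):
    """Return every idiom whose first syllable equals idiom's last syllable."""
    last_syllable = d_game[idiom][-1]
    return [candidate for candidate, pinyin in d_game.items()
            if pinyin and pinyin[0] == last_syllable]

# Toy data for illustration only.
toy_game = {
    "一心一意": ["yī", "xīn", "yī", "yì"],
    "意气风发": ["yì", "qì", "fēng", "fā"],
    "发扬光大": ["fā", "yáng", "guāng", "dà"],
    "大显身手": ["dà", "xiǎn", "shēn", "shǒu"],
}

print(find_followers("一心一意", toy_game))
print(find_followers("大显身手", toy_game))
```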


# 3. Starting from a given idiom, keep chaining until no idiom can follow
idiom = input("Enter the first idiom\n")
enter = ""
while enter != "q":
    char_4th = d_game[idiom][-1]
    for x, y in d_game.items():
        if char_4th == y[0]:
            idiom = x  # take the first idiom that fits
            print(idiom)
            break
    enter = input("continue? (q to quit) ")

It may happen that no idiom can continue the chain, yet the program does not end. How can we fix this?


idiom = input("Enter the first idiom\n")
enter = ""
exist = True
while enter != "q" and exist:
    char_4th = d_game[idiom][-1]
    exist = False  # assume there is no follower until one is found
    for x, y in d_game.items():
        if char_4th == y[0]:
            idiom = x
            print(idiom)
            exist = True
            break
    if exist:
        enter = input("continue? (q to quit) ")
    else:
        print("Sorry, no idiom can continue the chain")

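The following features could be added next:
basic definitions, homophone matching, human-versus-computer play, random selection, fuzzy matching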
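
Random selection, for instance, might be sketched as follows. This is not the lecture's implementation: pick_random_follower is a hypothetical name, the dictionary is toy data, and returning None when nothing fits is a choice made for this example. It uses random.choice from the standard library:

```python
import random

def pick_random_follower(idiom, d_game):
    """Return a random idiom that can follow idiom, or None if none fits."""
    last_syllable = d_game[idiom][-1]
    followers = [candidate for candidate, pinyin in d_game.items()
                 if pinyin and pinyin[0] == last_syllable]
    return random.choice(followers) if followers else None

# Toy data for illustration only.
toy_game = {
    "一心一意": ["yī", "xīn", "yī", "yì"],
    "意气风发": ["yì", "qì", "fēng", "fā"],
    "义不容辞": ["yì", "bù", "róng", "cí"],
}

print(pick_random_follower("一心一意", toy_game))
```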


# Basic definition lookup
# Modified version of step 1
d_ex = {}
for line in lines:
    if line != "\n":
        endpoint = line.find("拼音")
        idiom = line[:endpoint].strip()
        pinyin_end = line.find("释义")  # the definition starts at this marker
        explanation = line[pinyin_end:].strip()
        d_ex[idiom] = explanation
words = input("Enter the idiom to look up\n")
print(d_ex[words])


# Homophone matching (tones are ignored)
# Modified version of step 1
import unicodedata
d_game = {}
for line in lines:
    if line != "\n":
        endpoint = line.find("拼音")
        idiom = line[:endpoint].strip()
        pinyin_start = line.find("：", endpoint)
        pinyin_end = line.find("释义")
        each = line[pinyin_start + 1:pinyin_end]
        # Decompose accented letters, then drop the non-ASCII tone marks
        each = unicodedata.normalize('NFKD', each).encode('ascii', 'ignore').decode()
        pinyin_list = each.split()
        d_game[idiom] = pinyin_list
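
The normalization step can be tried on individual syllables. In this sketch (strip_tones is a hypothetical helper name), NFKD splits each accented letter into a base letter plus combining marks, and encoding to ASCII with errors ignored discards the marks. One side effect is that ü loses its diaeresis as well, so lü and lu become indistinguishable:

```python
import unicodedata

def strip_tones(pinyin):
    """Remove tone marks (and other diacritics) from a pinyin string."""
    decomposed = unicodedata.normalize('NFKD', pinyin)
    return decomposed.encode('ascii', 'ignore').decode()

print(strip_tones('yì'))                        # tone mark removed
print(strip_tones('yī') == strip_tones('yì'))   # different tones now match
print(strip_tones('lǜ'))                        # the diaeresis is lost too
```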

Summary

Strings
Reading: Python for Everybody, Chapter 6 (Strings)