Python Programming

Lecture 5 Strings

5.1 Strings

A string is a sequence of characters. The empty string is written '' (two quotes with nothing between them), which is not the same as ' ' (a string containing one space). You can access the characters one at a time with the bracket operator; indexing starts at 0.


>>> fruit = 'banana'
>>> fruit[1]
'a'
>>> len(fruit)
6


>>> fruit = 'banana'
>>> fruit[1:3]
'an'
>>> fruit[3:]
'ana'

in operator


>>> print('a' in 'banana')
True
>>> print('seed' in 'banana')
False
>>> print('ana' not in 'banana')
False

Iteration


fruit = 'banana'
for char in fruit:
    print(char)

Strings are immutable (like tuples): you cannot change a character in place.


>>> greeting = 'Hello, world!'
>>> greeting[0] = 'J'

TypeError: 'str' object does not support item assignment


>>> greeting = 'Hello, world!'
>>> new_greeting = 'J' + greeting[1:]
>>> print(new_greeting)

Jello, world!

Comparison operations are useful for putting words in alphabetical order.


>>> print('apple'>'banana')
False
>>> print('ba' > 'banana')
False
>>> a_list = ["orange", "apple", "banana"]
>>> sorted(a_list)
['apple', 'banana', 'orange']
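
One caveat worth keeping in mind: Python compares strings by character code, so every uppercase letter sorts before every lowercase letter. The sketch below shows the effect and one common workaround, passing `key=str.lower` to `sorted`. The list `mixed` is made up for illustration.

```python
# Hypothetical word list mixing upper- and lowercase initials.
mixed = ["orange", "Banana", "apple"]

# Default comparison uses character codes: 'B' (66) sorts before 'a' (97).
default_order = sorted(mixed)

# Comparing lowercase copies gives the usual dictionary order.
ignore_case_order = sorted(mixed, key=str.lower)

print(default_order)      # ['Banana', 'apple', 'orange']
print(ignore_case_order)  # ['apple', 'Banana', 'orange']
```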

Methods of Strings

Converting Characters: .title(), .lower(), .upper()

String methods do not modify the original string; they return a new string.


>>> name = "ada lovelace"
>>> print(name.title())
Ada Lovelace

>>> print(name)
ada lovelace


>>> name = "Ada Lovelace"
>>> print(name.upper())
ADA LOVELACE

>>> print(name.lower())
ada lovelace

Stripping Whitespace: .lstrip(), .rstrip(), .strip()

To programmers 'python' and 'python ' look pretty much the same. But to a program, they are two different strings. To ensure that no whitespace exists at the right end of a string, use the rstrip() method.
You can also strip whitespace from the left side of a string using the lstrip() method or strip whitespace from both sides at once using strip().


>>> favorite_language = ' python '
>>> favorite_language.rstrip()
' python'
>>> favorite_language.lstrip()
'python '
>>> favorite_language.strip()
'python'

However, the whitespace is removed only in the value that the method returns; the original string is unchanged. To remove the whitespace permanently, store the stripped value back into the variable.


>>> favorite_language = 'python '
>>> favorite_language = favorite_language.rstrip()
>>> favorite_language
'python'

Parsing Strings: .find(), .split(), .isalpha(), .isspace()

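.find() returns the index of the first occurrence of a substring within a string, or -1 if the substring is not found. An optional second argument gives the index at which the search starts.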


>>> word = 'banana'
>>> index = word.find('a')
>>> print(index)
1
>>> word.find('na')
2
>>> word.find('na', 3)
4


>>> data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
>>> atpos = data.find('@')
>>> print(atpos)
21
>>> sppos = data.find(' ',atpos)
>>> print(sppos)
31
>>> host = data[atpos+1 : sppos]
>>> print(host)
uct.ac.za

.split() breaks a sentence into words and returns them as a list.


>>> s = 'break a sentence into words'
>>> t = s.split()
>>> print(t)
['break', 'a', 'sentence', 'into', 'words']
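
With no argument, `.split()` splits on any run of whitespace. As a small sketch (the sample strings are made up), passing a separator splits on that exact string instead, and `.join()` reverses the operation:

```python
# Hypothetical sample data for illustration.
csv_line = 'apple,banana,,orange'
parts = csv_line.split(',')          # an explicit separator keeps empty fields
print(parts)                         # ['apple', 'banana', '', 'orange']

spaced = '  break   a sentence  '
words = spaced.split()               # runs of whitespace count as one separator
print(words)                         # ['break', 'a', 'sentence']

rejoined = ' '.join(words)           # glue the words back with single spaces
print(rejoined)                      # break a sentence
```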

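.isalpha() returns True if the string is non-empty and every character is a letter, otherwise False. Letters are not limited to A-Z and a-z; any character that Unicode classifies as a letter counts.

.isspace() returns True if the string is non-empty and contains only whitespace characters (spaces, tabs, newlines), otherwise False.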


>>> print("Hello".isalpha())    # True (all letters)
>>> print("Hello123".isalpha()) # False (contains digits)
>>> print("你好".isalpha())      # True (non-Latin letters such as Chinese also count)

>>> print("   ".isspace())   # True (all spaces)
>>> print("\t\n".isspace())  # True (tabs and newlines are whitespace too)
>>> print("Hello".isspace()) # False (contains letters)

Formatted String Literals


>>> number = 42
>>> print('I have spotted number camels.') # no error, but prints the word "number", not 42
>>> print('I have spotted '+str(number)+' camels.') # works, but clumsy


>>> number = 42
>>> print(f'I have spotted {number} camels.')
I have spotted 42 camels.


>>> animal = 'camels'
>>> number = 42.12345678
>>> print(f'''I spotted {number:.2f} {animal}.''') 
I spotted 42.12 camels.
>>> print(f'''I spotted {number:.0f} {animal}.''') 
I spotted 42 camels.

>>> print(f'''I spotted {number:.2} {animal}.''') 
I spotted 4.2e+01 camels.
>>> print(f'''I spotted {number:.5} {animal}.''') 
I spotted 42.123 camels.

>>> print(f'''I spotted {number:.2%} {animal}.''') 
I spotted 4212.35% camels.
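
Format specifications can also set a field width, an alignment, and a thousands separator. The sketch below uses made-up values and is only meant to show the general pattern `{value:spec}`:

```python
animal = 'camels'
count = 1234567          # hypothetical value for illustration

line_sep = f'{count:,} {animal}'     # comma as thousands separator
line_right = f'[{count:>10}]'        # right-align in a field 10 characters wide
line_left = f'[{animal:<10}]'        # left-align in a field 10 characters wide
line_zero = f'{42:05d}'              # pad an integer with zeros to width 5

print(line_sep)      # 1,234,567 camels
print(line_right)    # [   1234567]
print(line_left)     # [camels    ]
print(line_zero)     # 00042
```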

String: Summary

The elements of a string are characters. The empty string is ''.
Features: ordered, allows repeated characters, immutable.
Indexing and slicing work the same way as for tuples.
The in operator returns a Boolean indicating whether one string contains another.
You can compare two strings; the ordering is alphabetical.
String methods
- .upper(), .lower(), .title()
- .rstrip(), .lstrip(), .strip()
- .find(), .split(), .isalpha(), .isspace()
Formatted string literals

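5.2 Sentiment Analysis

Sentiment analysis (simplified version)

Sentiment analysis, also called emotion analysis or opinion mining, is widely used across industries to help businesses and individuals understand the sentiment expressed in a text (positive, negative, or neutral). Some important applications:

Social media analysis: tracking brand reputation, monitoring public opinion
Customer feedback analysis: product reviews, customer service chats
Film, music, and book reviews: improving recommendation algorithms, summarizing reviews automatically
Competitor analysis: market research, industry trend analysis
Stock market prediction: financial news, social media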

1. Text Cleaning: Remove punctuation, convert to lowercase, split words, remove stopwords

2. Sentiment Analysis: Count positive/negative words to determine sentiment


text = "This movie was absolutely amazing! The story was engaging\
 and the characters were great. However, some scenes felt unnecessary\
 and a bit boring. Overall, I loved it!"


text = text.lower()
# Remove punctuation (keep only letters and whitespace)
cleaned_text = ""
for char in text:
    if char.isalpha() or char.isspace():
        cleaned_text += char

words = cleaned_text.split()

# Remove a few simple stopwords (defined by hand)
stop_words = ["the", "was", "and", "a", "it", "i", "some"]
filtered_words = []  # empty list to hold the words we keep
for word in words:
    if word not in stop_words:  # add the word only if it is not a stopword
        filtered_words.append(word)

print("Keywords:", filtered_words)


# Define positive and negative words
positive_words = ["amazing","great","engaging","loved"]
negative_words = ["boring","unnecessary","bad","terrible"]

# Count the positive words
pos_count = 0
for word in filtered_words:
    if word in positive_words:
        pos_count += 1

# Count the negative words
neg_count = 0
for word in filtered_words:
    if word in negative_words:
        neg_count += 1


# Decide the overall sentiment
if pos_count > neg_count:
    sentiment = "positive"
elif neg_count > pos_count:
    sentiment = "negative"
else:
    sentiment = "neutral"

print("Sentiment:", sentiment)
print(f"(positive words: {pos_count}, negative words: {neg_count})")

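As early as 2010, researchers reported that sentiment analysis of public Twitter posts could be used to predict rises and falls in the stock market, with a reported accuracy of 87.6%.
This simplified version uses only a handful of words, whereas dedicated sentiment analysis libraries (such as TextBlob and VADER) typically include thousands of sentiment words and can also take context into account.
In the VADER lexicon, every word has a sentiment intensity score (for example, "amazing" might be 4.0 and "good" might be 1.9). This example does no such weighting.
Real sentiment analysis libraries can handle negation, grammatical structure, and so on: "not bad" may be interpreted as positive, rather than by simply counting occurrences of "not" and "bad".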
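
To make the last two points concrete, here is a minimal sketch of weighted scoring with a very crude negation rule. It is not how TextBlob or VADER work internally; the weights, the negation list, and the function name `score` are all invented for illustration.

```python
# Hypothetical word weights for illustration only; these are NOT
# the values from the VADER lexicon or any real library.
weights = {"amazing": 3.0, "great": 2.5, "good": 2.0,
           "boring": -2.0, "bad": -2.5, "terrible": -3.0}
negations = {"not", "never", "no"}

def score(text):
    """Sum word weights, flipping the sign of a word that follows a negation."""
    total = 0.0
    negate = False
    for raw in text.lower().split():
        # Keep letters only, so "bad," and "bad" are treated alike.
        word = "".join(ch for ch in raw if ch.isalpha())
        if word in negations:
            negate = True
            continue
        value = weights.get(word, 0.0)  # unknown words contribute nothing
        if negate:
            value = -value
        total += value
        negate = False  # a negation only affects the very next word
    return total

print(score("The ending was not bad"))     # 2.5
print(score("A boring, terrible movie"))   # -5.0
```

The rule is deliberately simple: a phrase such as "not very bad" would slip through, because the negation is spent on "very".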

5.3 Chinese Idiom Relay (成语接龙)

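1. Load the idiom dictionary.
2. Given an idiom, find every idiom that can follow it.
3. Starting from a given idiom, keep extending the chain until it cannot continue.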


# 1. Load the idiom dictionary
filename = 'idiom_dictionary.txt'
with open(filename, encoding="utf-8") as file_object:
    lines = file_object.readlines()  # list with one string per line


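d_game = {}  # idiom -> list of pinyin syllables
for line in lines:
    if line != "\n":  # skip blank lines
        endpoint = line.find("拼音")  # the idiom is the text before "拼音" (pinyin)
        idiom = line[:endpoint].strip()
        pinyin_start = line.find("：", endpoint)  # full-width colon after "拼音"
        pinyin_end = line.find("释义")  # the pinyin ends where "释义" (definition) begins
        each = line[pinyin_start+1: pinyin_end]
        pinyin_list = each.split()
        d_game[idiom] = pinyin_list
print(len(d_game))  # number of idioms loaded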
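
The layout of `idiom_dictionary.txt` is not shown in these notes. Judging from the markers the parsing code searches for, each line is assumed to look roughly like `idiom 拼音：... 释义：...`. Under that assumption, the same slicing applied to one hypothetical line behaves as follows:

```python
# Hypothetical line; the real file layout is an assumption inferred
# from the markers ("拼音", "：", "释义") used in the parsing code.
line = "一心一意 拼音：yī xīn yī yì 释义：全心全意，没有别的念头。\n"

endpoint = line.find("拼音")                # where the idiom text ends
idiom = line[:endpoint].strip()
pinyin_start = line.find("：", endpoint)     # full-width colon after "拼音"
pinyin_end = line.find("释义")               # where the definition begins
pinyin_list = line[pinyin_start + 1:pinyin_end].split()

print(idiom)         # 一心一意
print(pinyin_list)   # ['yī', 'xīn', 'yī', 'yì']
```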


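# 2. Given an idiom, find every idiom that can follow it
idiom = input("Enter the first idiom\n")
char_4th = d_game[idiom][-1]  # pinyin of the last character
for x, y in d_game.items():
    if char_4th == y[0]:  # the first syllable matches that last syllable
        print(x)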


# 3. Start from a given idiom and keep going until the chain cannot continue
idiom = input("Enter the first idiom\n")
enter=""
while enter!="q":
    char_4th = d_game[idiom][-1]
    for x, y in d_game.items():
        if char_4th == y[0]:
            idiom = x
            print(idiom)
            break
    enter=input("continue?")

The program may fail to find an idiom that continues the chain, yet it does not stop. How can this be fixed?


idiom = input("Enter the first idiom\n")
enter = ""
exist = True
while enter != "q" and exist:
    char_4th = d_game[idiom][-1]
    exist = False  # assume there is no match until one is found
    for x, y in d_game.items():
        if char_4th == y[0]:
            idiom = x
            print(idiom)
            exist = True
            break
    if exist:
        enter = input("continue?")
    else:
        print("Sorry, no more idioms")
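
A related problem is that the search can land on an idiom that has already been used (even the current one) and cycle forever. One possible sketch, not the lecture's solution, moves the search into a helper that skips used idioms and returns `None` when nothing fits. The names `next_idiom` and `build_chain` are hypothetical, and the tiny `d_game` below is made-up sample data with tone marks omitted.

```python
# Made-up sample data: idiom -> list of syllables (tone marks omitted).
d_game = {
    "一心一意": ["yi", "xin", "yi", "yi"],
    "意气风发": ["yi", "qi", "feng", "fa"],
    "发扬光大": ["fa", "yang", "guang", "da"],
    "大显身手": ["da", "xian", "shen", "shou"],
}

def next_idiom(current, used):
    """Return an unused idiom whose first syllable matches, or None."""
    last = d_game[current][-1]
    for candidate, syllables in d_game.items():
        if candidate not in used and syllables[0] == last:
            return candidate
    return None

def build_chain(start):
    """Follow the relay from start until no unused idiom fits."""
    chain = [start]
    used = {start}
    while True:
        following = next_idiom(chain[-1], used)
        if following is None:
            break
        chain.append(following)
        used.add(following)
    return chain

chain = build_chain("一心一意")
print(chain)   # ['一心一意', '意气风发', '发扬光大', '大显身手']
```

Without the `used` set, this sample would loop on "一心一意" itself, because its first and last syllables are the same.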

Further features that could be added:
definitions, homophone matching, human-versus-computer play, random selection


# Definition lookup
# Modify step 1
d_ex = {}  # idiom -> definition text
for line in lines:
    if line != "\n":
        endpoint = line.find("拼音")
        idiom = line[:endpoint].strip()
        pinyin_end = line.find("释义")
        explanation = line[pinyin_end:].strip()  # from "释义" to the end of the line
        d_ex[idiom] = explanation
words = input("Enter the idiom to look up\n")
print(d_ex[words])


# Homophone matching (ignore tones)
# Modify step 1
import unicodedata
d_game = {}
for line in lines:
    if line != "\n":
        endpoint = line.find("拼音")
        idiom = line[:endpoint].strip()
        pinyin_start = line.find("：", endpoint)
        pinyin_end = line.find("释义")
        each = line[pinyin_start+1: pinyin_end]
        # Strip tone marks so that syllables differing only in tone still match
        each = unicodedata.normalize('NFKD', each).encode('ascii', 'ignore').decode()
        pinyin_list = each.split()
        d_game[idiom] = pinyin_list
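
To see what the `unicodedata.normalize` line does on its own, here is a small sketch. The helper name `strip_tones` is hypothetical; the calls inside it are the same ones used above.

```python
import unicodedata

def strip_tones(pinyin):
    """Drop tone marks by decomposing accents and discarding non-ASCII."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', pinyin)
    # Encoding to ASCII with 'ignore' throws the combining marks away.
    return decomposed.encode('ascii', 'ignore').decode()

print(strip_tones('yī xīn yī yì'))   # yi xin yi yi
print(strip_tones('lǜ'))             # lu
```

Note the side effect in the second call: the two dots of "ü" are also removed, so "lü" and "lu" become indistinguishable. That may or may not be acceptable for the game.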

Summary

Strings
Reading: Python for Everybody, Chapter 6 (Strings)