Python Programming

Lecture 13: JSON, APIs and HTML Basics

13.1 JSON and APIs

Writing and Loading JSON

import json

# Write a Python list to a file in JSON format
numbers = [2, 3, 5, 7, 11, 13]
filename = 'numbers.json'
with open(filename, 'w') as f_obj:
    json.dump(numbers, f_obj)

import json

# Read the JSON file back into a Python list
filename = 'numbers.json'
with open(filename) as f_obj:
    numbers = json.load(f_obj)
print(numbers)
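
json.dump() and json.load() work with files. Their siblings json.dumps() and json.loads() (the extra "s" stands for "string") do the same conversion to and from a str. The following is a minimal sketch that uses a hypothetical settings dict:

```python
import json

# A hypothetical dict to serialize (not part of the lecture's data)
settings = {"name": "numbers", "values": [2, 3, 5], "active": True, "note": None}

# dumps() returns a JSON string instead of writing to a file
text = json.dumps(settings, indent=2)
print(text)  # True is written as true, None as null

# loads() parses a JSON string back into Python objects
restored = json.loads(text)
print(restored == settings)  # True: the round trip preserves the data
```

The indent argument only affects readability. Without it, the JSON is written on a single line.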

JSON can't store every kind of Python value. It can hold values of only the following data types: strings, integers, floats, Booleans, lists, dictionaries, and None (NoneType). JSON cannot represent Python-specific objects, such as file objects or csv reader and writer objects.
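
The following sketch shows how these limits look in practice. Tuples are written as lists, non-string dictionary keys become strings, and an unsupported type raises TypeError. Here a set stands in for a Python-specific object, and the dict is made up for illustration:

```python
import json

# Tuples become lists, and non-string dict keys become strings
data = {"point": (1, 2), 3: "three", "ok": False}
text = json.dumps(data)
print(text)              # {"point": [1, 2], "3": "three", "ok": false}
print(json.loads(text))  # {'point': [1, 2], '3': 'three', 'ok': False}

# A type JSON has no equivalent for, such as a set, cannot be serialized
try:
    json.dumps({1, 2, 3})
except TypeError as err:
    print("TypeError:", err)
```

Note that loading the data back does not restore the original: the tuple comes back as a list and the key 3 comes back as the string "3".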

Web API

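import requests

# Request the weather forecast (101020100 is the city code for Shanghai)
url = "http://t.weather.itboy.net/api/weather/city/101020100"
r = requests.get(url)
print(r.status_code)      # 200 means the request succeeded
response_dict = r.json()  # parse the JSON body into a Python dict

# The daily forecasts are a list stored under data -> forecast
data = response_dict['data']
forecast = data['forecast']
ff_today = forecast[0]
ff_1 = forecast[1]
ff_2 = forecast[2]

def show(day):
    """Print every key/value pair of one day's forecast."""
    for key, value in day.items():
        print(f"{key}: {value}")
    print('\n')

show(ff_today)
show(ff_1)
show(ff_2)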
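
The code above raises KeyError if a key such as 'data' is missing from the response. The following is a minimal sketch of a more defensive lookup using dict.get(). get_forecast is a hypothetical name. The sample dict only mimics the nesting of the real response, and its keys and values are made up:

```python
def get_forecast(response_dict, days=3):
    """Return up to `days` daily forecasts, or [] if the keys are missing."""
    # dict.get() returns a default instead of raising KeyError
    data = response_dict.get('data', {})
    forecast = data.get('forecast', [])
    return forecast[:days]

# A hypothetical response with the same nesting; keys and values are made up
sample = {'data': {'forecast': [{'high': 'high 30', 'low': 'low 24'},
                                {'high': 'high 31', 'low': 'low 25'}]}}

for day in get_forecast(sample):
    for key, value in day.items():
        print(f"{key}: {value}")
    print()

print(get_forecast({}))  # [] rather than a KeyError
```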

Where to find Web APIs?
Public APIs, 聚合数据 (Juhe Data), 量化投资平台 (quantitative investment platforms)
TMDB API

import requests

# The Bearer header expects your TMDB "API Read Access Token" (account settings)
api_access = 'Your API Key'

page = 1
url = ("https://api.tmdb.org/3/movie/top_rated"
       f"?language=en-US&page={page}")

headers = {
    "accept": "application/json",
    "Authorization": f"Bearer {api_access}"
}
response = requests.get(url, headers=headers)
response_dict = response.json()
# print(response_dict)

# "results" holds the movies on this page
movies = response_dict["results"]
print(len(movies))

# Print every field of the first movie
for key, value in movies[0].items():
    print(f"{key}: {value}")
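
The query part of the URL (everything after "?") can also be built from a dict instead of being typed into the string. requests does this for you when you pass params= to requests.get(). The following stdlib sketch shows the resulting URL without making a request. build_url is a hypothetical helper name:

```python
from urllib.parse import urlencode

def build_url(category, page=1, language="en-US"):
    """Hypothetical helper: build a TMDB movie-list URL from its parts."""
    base = f"https://api.tmdb.org/3/movie/{category}"
    query = urlencode({"language": language, "page": page})  # language=en-US&page=1
    return f"{base}?{query}"

print(build_url("top_rated"))
# https://api.tmdb.org/3/movie/top_rated?language=en-US&page=1
print(build_url("now_playing", page=2))
# https://api.tmdb.org/3/movie/now_playing?language=en-US&page=2
```

With requests, the equivalent call would be requests.get("https://api.tmdb.org/3/movie/top_rated", params={"language": "en-US", "page": page}, headers=headers).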

adult: False
backdrop_path: /zfbjgQE1uSd9wiPTX4VzsLi0rGG.jpg
genre_ids: [18, 80]
id: 278
original_language: en
original_title: The Shawshank Redemption
overview: Imprisoned in the 1940s for the double murder of his wife and her lover, upstanding banker Andy Dufresne begins a new life at the Shawshank prison, where he puts his accounting skills to work for an amoral warden. During his long stretch in prison, Dufresne comes to be admired by the other inmates -- including an older prisoner named Red -- for his integrity and unquenchable sense of hope.
popularity: 115.576
poster_path: /9cqNxx0GxF0bflZmeSMuL5tnGzr.jpg
release_date: 1994-09-23
title: The Shawshank Redemption
video: False
vote_average: 8.705
vote_count: 26204
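
Printing every field is useful for exploring a response, but usually only a few fields are needed. The following sketch pulls out selected fields. summarize() is a hypothetical helper, and the dict holds a subset of the values printed above:

```python
def summarize(movie):
    """Hypothetical helper: a one-line summary of a TMDB movie dict."""
    year = movie.get('release_date', '')[:4]  # 'YYYY-MM-DD' -> 'YYYY'
    return (f"{movie['title']} ({year}): "
            f"{movie['vote_average']} from {movie['vote_count']} votes")

# A subset of the fields printed above
movie = {'title': 'The Shawshank Redemption', 'release_date': '1994-09-23',
         'vote_average': 8.705, 'vote_count': 26204}
print(summarize(movie))
# The Shawshank Redemption (1994): 8.705 from 26204 votes
```

Applied to a whole page, this becomes: for m in movies: print(summarize(m)).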

Downloading Images


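# Continues the script above (uses requests, movies and headers)
# poster_path is relative; prefix it with the image base URL and a size (w500)
poster = movies[0]['poster_path']
title = movies[0]['title']
img_url = f"https://image.tmdb.org/t/p/w500{poster}"
r = requests.get(img_url, headers=headers)

# Images are bytes, so the file is opened in binary mode ('wb')
if r.status_code == 200:
    with open(f'{title}.jpg', 'wb') as f:
        f.write(r.content)
else:
    print("download failed")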
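
Using the title directly as a file name can fail. Some titles contain characters such as ':' or '/'. Windows forbids these in file names, and '/' is a path separator on every common OS. The following is a minimal sketch of a hypothetical safe_filename() helper that replaces them with underscores. The second title is made up:

```python
import re

def safe_filename(title, ext=".jpg"):
    """Hypothetical helper: turn a movie title into a usable file name."""
    # Replace the characters Windows forbids in file names: \ / : * ? " < > |
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', title)
    return cleaned.strip() + ext

print(safe_filename("The Shawshank Redemption"))  # The Shawshank Redemption.jpg
print(safe_filename("Movie: Part 1/2"))           # Movie_ Part 1_2.jpg
```

The download code would then open the file with open(safe_filename(title), 'wb').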

Top 10


import requests

# The Bearer header expects your TMDB "API Read Access Token"
api_access = 'Your API Key'

page = 1
url = ("https://api.tmdb.org/3/movie/top_rated"
       f"?language=en-US&page={page}")
headers = {
    "accept": "application/json",
    "Authorization": f"Bearer {api_access}"
}
response = requests.get(url, headers=headers)
response_dict = response.json()

# Keep the first 10 movies of page 1 (a page holds 20)
movies = response_dict["results"]
top10 = movies[:10]

# Download each poster, naming the file after the movie title
for movie in top10:
    poster = movie['poster_path']
    title = movie['title']
    img_url = f"https://image.tmdb.org/t/p/w500{poster}"
    r = requests.get(img_url, headers=headers)

    if r.status_code == 200:
        with open(f'{title}.jpg', 'wb') as f:
            f.write(r.content)
    else:
        print(f"download failed: {title}")
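
The loop above assumes every movie has a poster. TMDB may return null (None in Python) for poster_path, and then the f-string produces a URL ending in "None". The following sketch prepares the download list first and skips such entries. poster_jobs and IMAGE_BASE are hypothetical names, and the sample records are made up:

```python
IMAGE_BASE = "https://image.tmdb.org/t/p/w500"

def poster_jobs(movies, limit=10):
    """Hypothetical helper: (title, image URL) pairs for the first `limit` movies.

    Movies whose poster_path is missing or None are skipped.
    """
    jobs = []
    for movie in movies[:limit]:
        poster = movie.get('poster_path')
        if poster:  # skips None and empty strings
            jobs.append((movie['title'], IMAGE_BASE + poster))
    return jobs

# Made-up records standing in for response_dict["results"]
sample = [{'title': 'A', 'poster_path': '/a.jpg'},
          {'title': 'B', 'poster_path': None}]
print(poster_jobs(sample))
# [('A', 'https://image.tmdb.org/t/p/w500/a.jpg')]
```

The download loop can then iterate over "for title, img_url in poster_jobs(movies):" and keep its request and file-writing code unchanged.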

Now Playing


import requests

# The Bearer header expects your TMDB "API Read Access Token"
api_access = 'Your API Key'

# Only the endpoint changes: now_playing instead of top_rated
page = 1
url = ("https://api.tmdb.org/3/movie/now_playing"
       f"?language=en-US&page={page}")
headers = {
    "accept": "application/json",
    "Authorization": f"Bearer {api_access}"
}
response = requests.get(url, headers=headers)
response_dict = response.json()

# Keep the first 10 movies of page 1
movies = response_dict["results"]
first10 = movies[:10]

# Download each poster, naming the file after the movie title
for movie in first10:
    poster = movie['poster_path']
    title = movie['title']
    img_url = f"https://image.tmdb.org/t/p/w500{poster}"
    r = requests.get(img_url, headers=headers)

    if r.status_code == 200:
        with open(f'{title}.jpg', 'wb') as f:
            f.write(r.content)
    else:
        print(f"download failed: {title}")
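
Both scripts fetch only page 1 of a movie list. TMDB list responses also include a total_pages field, so one way to collect more movies is to request one page after another. The following is a minimal sketch. collect_pages and fake_fetch are hypothetical names. The network call is passed in as a function so that the paging logic can be tried offline:

```python
def collect_pages(fetch_page, max_pages=3):
    """Hypothetical helper: gather 'results' from up to max_pages pages.

    fetch_page(page) must return the parsed JSON dict for that page.
    """
    movies = []
    page = 1
    while page <= max_pages:
        data = fetch_page(page)
        movies.extend(data.get("results", []))
        if page >= data.get("total_pages", page):  # no more pages
            break
        page += 1
    return movies

def fake_fetch(page):
    # Stand-in for the real request: 2 pages with 2 made-up movies each
    return {"page": page, "total_pages": 2,
            "results": [{"title": f"Movie {page}-{i}"} for i in (1, 2)]}

print(len(collect_pages(fake_fetch)))  # 4
```

Against the real API, fetch_page would build the now_playing (or top_rated) URL for the given page number, call requests.get(url, headers=headers) and return .json(). A small max_pages keeps the number of requests modest.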

13.2 HTML Basics

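IP, DNS, URL, Hypertext (press F12 in a browser to inspect a page with the developer tools)
HTTP requests: GET, POST
HTTP responses: status code, headers, body
HTML, CSS, JavaScript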
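
You can explore this request/response vocabulary offline with Python's standard library. The following sketch does three things: it splits the TMDB URL from 13.1 into its parts, builds (but does not send) a GET and a POST request with urllib.request.Request, and looks up standard status phrases. The login URL and form field are made up. In the lecture's code, requests plays the role of urllib:

```python
from http import HTTPStatus
from urllib.parse import urlsplit, parse_qs, urlencode
from urllib.request import Request

# Split a URL into its parts
url = "https://api.tmdb.org/3/movie/top_rated?language=en-US&page=1"
parts = urlsplit(url)
print(parts.scheme)           # https
print(parts.netloc)           # api.tmdb.org (the host name DNS resolves to an IP)
print(parts.path)             # /3/movie/top_rated
print(parse_qs(parts.query))  # {'language': ['en-US'], 'page': ['1']}

# GET carries parameters in the URL; POST carries them in the request body.
# Creating a Request object does not send anything over the network.
get_req = Request(url)
post_req = Request("https://example.com/login",  # hypothetical URL
                   data=urlencode({"user": "alice"}).encode())
print(get_req.get_method(), post_req.get_method())  # GET POST
print(post_req.data)                                # b'user=alice'

# Every response carries a status code with a standard reason phrase
print(HTTPStatus(200).phrase, "/", HTTPStatus(404).phrase)  # OK / Not Found
```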


HTML DOM
CSS Selector
Selecting elements by id, class and tag name (see the CSS Selector Reference)
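
The DOM treats an HTML page as a tree of nested elements. The following sketch uses the standard-library html.parser module to print that nesting as indented lines. TreePrinter and the tiny HTML string are made up for illustration, and the parser assumes every tag is closed:

```python
from html.parser import HTMLParser

class TreePrinter(HTMLParser):
    """Collect one indented line per element or text node of the DOM tree.

    Assumes every tag is closed (no void elements such as <br> or <img>).
    """
    def __init__(self):
        super().__init__()
        self.depth = 0  # how many elements are currently open
        self.lines = []

    def handle_starttag(self, tag, attrs):
        attr_text = "".join(f' {name}="{value}"' for name, value in attrs)
        self.lines.append("  " * self.depth + f"<{tag}{attr_text}>")
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text between tags
            self.lines.append("  " * self.depth + data.strip())

html_doc = ('<html><body><div id="container">'
            '<p class="wrapper">Hello</p>'
            '</div></body></html>')
tree = TreePrinter()
tree.feed(html_doc)
print("\n".join(tree.lines))
```

Each level of indentation corresponds to one step down the tree: the p element is a child of the div, which is a child of body.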


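/* id selector: the element with id="container" */
#container {
    font-size: 50pt;
}
/* class selector: every element with class="wrapper" */
.wrapper {
    color: #a51f1f;
}
/* child combinator: every <p> that is a direct child of a <div> */
div > p {
    font-size: 20pt;
}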
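
The following sketch shows which elements the three selectors above would match. It uses the standard-library html.parser module to stay dependency-free. In scraping code, BeautifulSoup's select() method accepts such selector strings directly. SelectorDemo and the HTML string are made up for illustration:

```python
from html.parser import HTMLParser

class SelectorDemo(HTMLParser):
    """Collect the direct text of elements matching three CSS selectors.

    A teaching sketch: assumes every tag is closed (no <br>, <img>, ...).
    """
    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, selectors it matched) for each open element
        self.found = {"#container": [], ".wrapper": [], "div > p": []}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self.stack[-1][0] if self.stack else None
        matched = []
        if attrs.get("id") == "container":                   # #id selector
            matched.append("#container")
        if "wrapper" in (attrs.get("class") or "").split():  # .class selector
            matched.append(".wrapper")
        if tag == "p" and parent == "div":                   # parent > child
            matched.append("div > p")
        self.stack.append((tag, matched))

    def handle_endtag(self, tag):
        self.stack.pop()

    def handle_data(self, data):
        # Text belongs to the innermost open element
        if self.stack and data.strip():
            for selector in self.stack[-1][1]:
                self.found[selector].append(data.strip())

html_doc = ('<div id="container">Title'
            '<p>Child paragraph</p>'
            '<section><p class="wrapper">Nested paragraph</p></section>'
            '</div>')
demo = SelectorDemo()
demo.feed(html_doc)
print(demo.found)
# {'#container': ['Title'], '.wrapper': ['Nested paragraph'], 'div > p': ['Child paragraph']}
```

The nested paragraph is not matched by div > p, because its parent is section rather than div. A descendant selector (div p, with a space) would match it.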
Web Scraping (爬虫)
  • Fetching pages (requests)
  • Extracting information (BeautifulSoup, Pyquery)
  • Saving data (csv, xlsx, MySQL, MongoDB)
  • JavaScript-rendered pages (Selenium)
  • No-code scraping tools: 八爪鱼 (Octoparse), 火车头
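
The fetch → extract → save steps above can be sketched end to end with the standard library. To keep the sketch offline, a fixed HTML string replaces the fetch step, which in practice would be requests.get(url).text. html.parser stands in for BeautifulSoup, and an in-memory buffer stands in for a CSV file. LinkExtractor and the links are made up:

```python
import csv
import io
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Step 2 (extract): collect (text, href) for every <a> that has an href."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None

    def handle_data(self, data):
        if self._href is not None and data.strip():
            self.links.append((data.strip(), self._href))

# Step 1 (fetch): in practice html = requests.get(url).text; a fixed string here
html = ('<ul><li><a href="/one">Page one</a></li>'
        '<li><a href="/two">Page two</a></li></ul>')
parser = LinkExtractor()
parser.feed(html)

# Step 3 (save): write CSV rows; a real script would use
# open('links.csv', 'w', newline='', encoding='utf-8') instead of StringIO
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["text", "href"])
writer.writerows(parser.links)
print(buffer.getvalue())
```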
Other related concepts
  • Static vs. dynamic web pages
  • Ajax rendering
  • Cookies and sessions
  • Proxy servers
  • Distributed crawling (Scrapy)
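
A cookie is a small name/value pair. The server sets it in a Set-Cookie response header, and the client sends it back on later requests, which is how a session is recognized. The following sketch parses such a header with the standard-library http.cookies module. The header value and session id are made up:

```python
from http.cookies import SimpleCookie

# A Set-Cookie header value as a server might send it (made-up session id)
cookie = SimpleCookie()
cookie.load("sessionid=abc123; Path=/; HttpOnly")

morsel = cookie["sessionid"]
print(morsel.value)              # abc123
print(morsel["path"])            # /
print(bool(morsel["httponly"]))  # True: scripts on the page cannot read it

# On later requests the client sends the name/value pair back
print(f"Cookie: {morsel.key}={morsel.value}")  # Cookie: sessionid=abc123
```

With requests, a requests.Session() handles this automatically: cookies set by one response are sent with later requests made through the same session.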

Summary

  • Reading: Python Crash Course, Chapter 16.2, 17