Python Programming

Lecture 17 JSON, API and HTML Basics

17.1 JSON and API

Write and Load


import json

numbers = [2, 3, 5, 7, 11, 13]
filename = 'numbers.json'
with open(filename, 'w') as f_obj:
    json.dump(numbers, f_obj)


import json

filename = 'numbers.json'
with open(filename) as f_obj:
    numbers = json.load(f_obj)
print(numbers)

JSON can't store every kind of Python value. It can hold only strings, integers, floats, Booleans, lists, dictionaries, and None. It cannot represent Python-specific objects such as file objects or CSV reader and writer objects.
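The limits described above can be checked directly. The following is a minimal sketch; the `record` dictionary is made up for illustration, and a set stands in for any Python-specific object (a file object fails the same way).

```python
import json

# Values built only from JSON-compatible types survive a round trip.
record = {"name": "primes", "values": [2, 3, 5], "ratio": 0.5,
          "ok": True, "missing": None}
restored = json.loads(json.dumps(record))
print(restored == record)

# Tuples are written as JSON arrays, so they come back as lists.
print(json.loads(json.dumps((1, 2))))

# Python-specific objects (here a set) raise TypeError when serialized.
try:
    json.dumps({1, 2, 3})
    serializable = True
except TypeError as err:
    serializable = False
    print(f"Cannot serialize: {err}")
```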

Web API


import requests

url = "http://t.weather.itboy.net/api/weather/city/101020100"
r = requests.get(url)
print(r.status_code)
response_dict = r.json()


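# Walk down the nested dictionary to the list of daily forecasts.
data = response_dict['data']
forecast = data['forecast']
today = forecast[0]
day_1 = forecast[1]
day_2 = forecast[2]

def show(day):
    """Print every key-value pair of one day's forecast."""
    for key, value in day.items():
        print(f"{key}: {value}")
    print('\n')

show(today)
show(day_1)
show(day_2)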
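The request above prints the status code but decodes the body either way. One possible guard is sketched below. `parse_json_response` and `FakeResponse` are hypothetical names introduced here; `FakeResponse` only imitates the `status_code` attribute and `json()` method of a requests response so the sketch runs offline.

```python
def parse_json_response(response):
    """Return the decoded JSON body, or None if the request did not succeed."""
    if response.status_code != 200:
        return None
    try:
        return response.json()
    except ValueError:  # the body was not valid JSON
        return None


class FakeResponse:
    """Hypothetical stand-in for a requests response, for offline testing."""

    def __init__(self, status_code, payload):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        if self._payload is None:
            raise ValueError("no JSON body")
        return self._payload


ok = parse_json_response(FakeResponse(200, {"data": {"forecast": []}}))
failed = parse_json_response(FakeResponse(404, None))
print(ok, failed)
```

With a real call, the object returned by `requests.get(url)` would be passed in place of `FakeResponse`.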

Where to find Web API?

Public APIs, Juhe Data (聚合数据), quantitative investment research platforms (量化投研)

Stock Market


url="http://img1.money.126.net/data/hs/kline/day/history/2020/1399001.json"

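The code in the URL is the stock code with a one-digit prefix: 0 for Shanghai stocks (600756 becomes 0600756) and 1 for Shenzhen stocks.
Market indices follow the same rule: SSE Composite Index 000001 takes prefix 0, CSI 300 Index 000300 takes prefix 0, SZSE Component Index 399001 takes prefix 1, SME Board Index 399005 takes prefix 1, and ChiNext Index 399006 takes prefix 1.
Price adjustment: use kline for unadjusted prices and klinederc for adjusted prices.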
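The prefix rules above can be wrapped in a small helper. This is a sketch only: `kline_url`, its parameters, and the "sh"/"sz" labels are hypothetical, and the adjusted-price path is inferred by substituting klinederc for kline in the URL pattern shown above.

```python
BASE = "http://img1.money.126.net/data/hs"

def kline_url(code, exchange, year, adjusted=False):
    """Build a daily K-line URL.

    exchange: "sh" (Shanghai, prefix 0) or "sz" (Shenzhen, prefix 1).
    adjusted: True selects klinederc, False selects kline.
    """
    prefix = {"sh": "0", "sz": "1"}[exchange]
    kind = "klinederc" if adjusted else "kline"
    return f"{BASE}/{kind}/day/history/{year}/{prefix}{code}.json"

print(kline_url("399001", "sz", 2020))   # SZSE Component Index
print(kline_url("600519", "sh", 2020))   # a Shanghai-listed stock
```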

Kweichow Moutai (贵州茅台)


url="http://img1.money.126.net/data/hs/kline/day/history/2020/0600519.json"


import requests
import matplotlib.pyplot as plt
import pandas as pd

r = requests.get(url)
print(r.status_code)
response_dict = r.json() 
# print(response_dict)


data = response_dict['data']

for x in data[:5]:
    print("""Date: {}, Open: {}, Close: {}, High: {}
        Low: {}, Volume: {}, Change (%): {}""".format(
        x[0], x[1], x[2], x[3], x[4], x[5], x[6]))


200
Date: 20200102, Open: 1128.0, Close: 1130.0, High: 1145.06
        Low: 1116.0, Volume: 14809916, Change (%): -4.48
Date: 20200103, Open: 1117.0, Close: 1078.56, High: 1117.0
        Low: 1076.9, Volume: 13031878, Change (%): -4.55
Date: 20200106, Open: 1070.86, Close: 1077.99, High: 1092.9
        Low: 1067.3, Volume: 6341478, Change (%): -0.05
Date: 20200107, Open: 1077.5, Close: 1094.53, High: 1099.0
        Low: 1076.4, Volume: 4785359, Change (%): 1.53
Date: 20200108, Open: 1085.05, Close: 1088.14, High: 1095.5
        Low: 1082.58, Volume: 2500825, Change (%): -0.58
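Indexing each row by position (x[0], x[1], ...) is easy to get wrong. A possible alternative is to name the fields once, as sketched below. The field names in `FIELDS` are illustrative choices following the order used in the print statement above, and the two rows are copied from the output above. Whether the API returns the date as a string or a number is not shown, so the sketch converts it with `str()` first.

```python
from datetime import datetime

# Column order as used in the print statement above; names are illustrative.
FIELDS = ("date", "open", "close", "high", "low", "volume", "pct_change")

def row_to_record(row):
    """Turn one positional row into a dictionary with a real date object."""
    record = dict(zip(FIELDS, row))
    record["date"] = datetime.strptime(str(record["date"]), "%Y%m%d").date()
    return record

# First two rows copied from the output above.
rows = [
    ["20200102", 1128.0, 1130.0, 1145.06, 1116.0, 14809916, -4.48],
    ["20200103", 1117.0, 1078.56, 1117.0, 1076.9, 13031878, -4.55],
]
records = [row_to_record(r) for r in rows]

# Named fields make queries readable, e.g. the day with the highest close.
best = max(records, key=lambda r: r["close"])
print(best["date"], best["close"])
```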

TMDB API (metadata scraper)


import requests

# Replace with your own TMDB API Read Access Token; it is sent as a Bearer token below.
api_access = 'Your API Key'


page = 1
url = f"https://api.themoviedb.org/3/movie/\
top_rated?language=en-US&page={page}"

headers = {
    "accept": "application/json",
    "Authorization": f"Bearer {api_access}"
}
response = requests.get(url, headers=headers)
response_dict = response.json()
# print(response_dict)


movies=response_dict["results"]

print(movies[0])
for key, value in movies[0].items():
    print(f"{key}: {value}")


adult: False
backdrop_path: /tmU7GeKVybMWFButWEGl2M4GeiP.jpg
genre_ids: [18, 80]
id: 238
original_language: en
original_title: The Godfather
overview: Spanning the years 1945 to 1955, a chronicle of the fictional
 Italian-American Corleone crime family. When organized crime family 
 patriarch, Vito Corleone barely survives an attempt on his life, his 
 youngest son, Michael steps in to take care of the would-be killers, 
 launching a campaign of bloody revenge.
popularity: 104.261
poster_path: /3bhkrj58Vtu7enYsRolD1fZdja1.jpg
release_date: 1972-03-14
title: The Godfather
video: False
vote_average: 8.7
vote_count: 17922
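Once the structure of one entry is known, individual fields can be picked out instead of printing everything. The sketch below uses a hypothetical helper, `summarize`, and a subset of the record printed above as sample input.

```python
def summarize(movie):
    """Return 'Title (year): rating' for one entry of the results list."""
    # release_date may be missing or empty, so fall back to "n/a".
    year = (movie.get("release_date") or "")[:4] or "n/a"
    return f"{movie['title']} ({year}): {movie['vote_average']}"

# Subset of the record printed above.
sample = {"title": "The Godfather", "release_date": "1972-03-14",
          "vote_average": 8.7, "vote_count": 17922}
print(summarize(sample))
```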

Downloading Images



poster = movies[0]['poster_path']
img_url = f"https://image.tmdb.org/t/p/w500{poster}"
r = requests.get(img_url, headers=headers)
if r.status_code == 200:
    with open('The Godfather.jpg', 'wb') as f:
        f.write(r.content)
else:
    print("download failed")

Top 10


import requests

api_access = 'Your API Key'


page = 1
url = f"https://api.themoviedb.org/3/movie/\
top_rated?language=en-US&page={page}"
headers = {
    "accept": "application/json",
    "Authorization": f"Bearer {api_access}"
}
response = requests.get(url, headers=headers)
response_dict = response.json()

movies=response_dict["results"]
top10 = movies[:10]


for movie in top10:
    poster = movie['poster_path']
    title = movie['title']
    img_url = f"https://image.tmdb.org/t/p/w500{poster}"
    r = requests.get(img_url, headers=headers)
    if r.status_code == 200:
        with open(f'{title}.jpg', 'wb') as f:
            f.write(r.content)
    else:
        print("download failed")
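Using the movie title directly as a file name can fail when the title contains characters such as a colon or a slash, which are not allowed in file names on some systems. A minimal sketch of one workaround follows; `safe_filename` is a hypothetical helper, and the example title is only an illustration of a title containing a colon.

```python
import re

def safe_filename(title, ext=".jpg"):
    """Replace characters that are not allowed in common file systems."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    return (cleaned or "untitled") + ext

print(safe_filename("Spider-Man: Into the Spider-Verse"))
```

In the loop above, `open(safe_filename(title), 'wb')` would replace `open(f'{title}.jpg', 'wb')`.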

Now Playing


import requests

api_access = 'Your API Key'


page = 1
url = f"https://api.themoviedb.org/3/movie/\
now_playing?language=en-US&page={page}"
headers = {
    "accept": "application/json",
    "Authorization": f"Bearer {api_access}"
}
response = requests.get(url, headers=headers)
response_dict = response.json()

movies = response_dict["results"]
first10 = movies[:10]  # first ten movies now playing


for movie in first10:
    poster = movie['poster_path']
    title = movie['title']
    img_url = f"https://image.tmdb.org/t/p/w500{poster}"
    r = requests.get(img_url, headers=headers)
    if r.status_code == 200:
        with open(f'{title}.jpg', 'wb') as f:
            f.write(r.content)
    else:
        print("download failed")
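The Top 10 and Now Playing scripts differ only in the endpoint name. One way to avoid repeating the URL is sketched below; `tmdb_list_url` is a hypothetical helper that only builds the string and sends no request.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.themoviedb.org/3/movie"

def tmdb_list_url(category, page=1, language="en-US"):
    """Build the URL for a movie list such as 'top_rated' or 'now_playing'."""
    query = urlencode({"language": language, "page": page})
    return f"{API_ROOT}/{category}?{query}"

print(tmdb_list_url("top_rated"))
print(tmdb_list_url("now_playing", page=2))
```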

17.2 HTML Basics

IP, DNS, URL, Hypertext (F12)
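As a small illustration of how a URL breaks down, the sketch below splits the weather URL from section 17.1 with the standard library. DNS is what maps the hostname part to an IP address; that lookup needs network access and is not performed here.

```python
from urllib.parse import urlparse

parts = urlparse("http://t.weather.itboy.net/api/weather/city/101020100")
print(parts.scheme)    # protocol
print(parts.hostname)  # name that DNS resolves to an IP address
print(parts.path)      # resource requested from that host
```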

Requests: GET, POST
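The difference between the two methods can be seen without sending anything. In the sketch below, which uses the standard library's `urllib` rather than requests, the URL `http://example.com/api` and the `city` parameter are placeholders. A GET request carries its parameters in the URL, while a POST request carries them in the body.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"city": "101020100"}

# GET: parameters are appended to the URL as a query string.
get_req = Request("http://example.com/api?" + urlencode(params))

# POST: parameters travel in the request body as bytes.
post_req = Request("http://example.com/api",
                   data=urlencode(params).encode("utf-8"))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

With requests, the corresponding calls are `requests.get(url, params=params)` and `requests.post(url, data=params)`.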

Response

HTML, CSS, JavaScript

HTML DOM

CSS Selector

id, class, tag name (see the CSS Selector Reference)


#container{
    font-size: 50pt
}
.wrapper{
    color: #a51f1f
}
div > p{
    font-size: 20pt
}
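To see which elements the three rules above select, the sketch below walks a small made-up HTML fragment with the standard library's `HTMLParser` and records every match. It handles only these three selectors and assumes well-nested tags. In practice a library such as BeautifulSoup, with its `select()` method, would do this matching.

```python
from html.parser import HTMLParser

# Made-up fragment for illustration.
HTML = """
<div id="container">
  <div class="wrapper"><p>direct child of a div</p></div>
  <section><p>child of a section</p></section>
</div>
"""

class SelectorDemo(HTMLParser):
    """Record which of the three rules matches each opening tag."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open tags, outermost first
        self.matches = []   # (selector, tag) pairs in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id") == "container":                      # #container
            self.matches.append(("#container", tag))
        if "wrapper" in (attrs.get("class") or "").split():     # .wrapper
            self.matches.append((".wrapper", tag))
        if tag == "p" and self.stack and self.stack[-1] == "div":  # div > p
            self.matches.append(("div > p", tag))
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

demo = SelectorDemo()
demo.feed(HTML)
print(demo.matches)
```

The second paragraph is not matched by `div > p` because its direct parent is a section, not a div.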

Web Crawlers

Fetch the web page (requests)
Extract information (BeautifulSoup, Pyquery)
Save the data (csv, xlsx, MySQL, MongoDB)
JavaScript-rendered pages (Selenium)
Octoparse (八爪鱼), LocoySpider (火车头)
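The fetch, extract, save steps above can be sketched with the standard library alone. Everything here is illustrative: `PAGE` stands in for the text of a fetched page, `LinkExtractor` is a hypothetical class, and the CSV is written to an in-memory buffer instead of a file. BeautifulSoup or Pyquery would normally replace the hand-written parser.

```python
import csv
import io
from html.parser import HTMLParser

# Step 1 (fetch): stand-in for page text, e.g. requests.get(url).text.
PAGE = '<ul><li><a href="/a">First</a></li><li><a href="/b">Second</a></li></ul>'

class LinkExtractor(HTMLParser):
    """Step 2 (extract): collect the text and href of every link."""

    def __init__(self):
        super().__init__()
        self.links = []     # collected (text, href) rows
        self._href = None   # href of the <a> tag currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._href is not None:
            self.links.append((data.strip(), self._href))

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None

parser = LinkExtractor()
parser.feed(PAGE)

# Step 3 (save): write rows as CSV; use open("links.csv", "w", newline="")
# in place of the buffer to write a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["text", "href"])
writer.writerows(parser.links)
print(buffer.getvalue())
```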

Other Related Concepts

Static and dynamic web pages
Ajax rendering
Cookies, Session
Proxy server
Distributed crawlers (Scrapy)

Summary

Reading: Python Crash Course, Chapter 16.2, 17