Python Programming

Lecture 13 Json, API and HTML Basics

13.1 Json and API

Write and Load

import json

numbers = [2, 3, 5, 7, 11, 13]
filename = 'numbers.json'
with open(filename, 'w') as f_obj:
    json.dump(numbers, f_obj)

import json

filename = 'numbers.json'
with open(filename) as f_obj:
    numbers = json.load(f_obj)
print(numbers)

JSON can't store every kind of Python value. It can contain values of only the following data types: strings, integers, floats, Booleans, lists, dictionaries, and NoneType. JSON cannot represent Python-specific objects, such as File objects, CSV Reader or Writer objects.

Web API

import requests

url = "http://t.weather.itboy.net/api/weather/city/101020100"
r = requests.get(url)
print(r.status_code)
response_dict = r.json()  

f = response_dict['data']
ff = f['forecast']
ff_today = ff[0]
ff_1 = ff[1]
ff_2 = ff[2]

def show(day):
    for x in day:
        print(x+': '+str(day[x]))
    print('\n')
show(ff_today)
show(ff_1)
show(ff_2)
Where to find Web API?
Public APIs, 聚合数据, 量化投资平台
Stock Market (股票市场)

url="http://img1.money.126.net/data/hs/kline/day/history/2020/1399001.json"
  • 代码为股票代码,上海股票前加0,如600756变成0600756,深圳股票前加1
  • 大盘指数数据查询:上证指数000001前加0,沪深300指数000300股票前加0,深证成指399001前加1,中小板指399005前加1,创业板指399006前加1
  • 是否复权,不复权为kline,复权为klinederc

贵州茅台


url="http://img1.money.126.net/data/hs/kline/day/history/2020/0600519.json"

import requests
import matplotlib.pyplot as plt
import pandas as pd

r = requests.get(url)
print(r.status_code)
response_dict = r.json() 
# print(response_dict)

data = response_dict['data']

for x in data[:5]:
    print("""日期: {},开盘价:{},收盘价:{},最高价:{}
        最低价:{},交易量:{},涨幅跌幅:{}""".format(x[0],\
        x[1], x[2], x[3], x[4], x[5], x[6]))

200
日期: 20200102,开盘价:1128.0,收盘价:1130.0,最高价:1145.06
        最低价:1116.0,交易量:14809916,涨幅跌幅:-4.48
日期: 20200103,开盘价:1117.0,收盘价:1078.56,最高价:1117.0
        最低价:1076.9,交易量:13031878,涨幅跌幅:-4.55
日期: 20200106,开盘价:1070.86,收盘价:1077.99,最高价:1092.9
        最低价:1067.3,交易量:6341478,涨幅跌幅:-0.05
日期: 20200107,开盘价:1077.5,收盘价:1094.53,最高价:1099.0
        最低价:1076.4,交易量:4785359,涨幅跌幅:1.53
日期: 20200108,开盘价:1085.05,收盘价:1088.14,最高价:1095.5
        最低价:1082.58,交易量:2500825,涨幅跌幅:-0.58
IMDb-API (刮削器)

import requests

api_key = 'Your API Key'

url=f"https://imdb-api.com/en/API/Top250Movies/{api_key}"
r = requests.get(url)
print(r.status_code)
response_dict = r.json() 

top250 = response_dict["items"]
print(top250[0])

{'id': 'tt0111161', 'rank': '1', 'title': 'The Shawshank Redemption', 
'fullTitle': 'The Shawshank Redemption (1994)', 'year': '1994', 'image': 
'https://m.media-amazon.com/images/M/MV5BNDE3ODcxYzMtY2YzZC00NmNlLWJiNDMtZD
ViZWM2MzIxZDYwXkEyXkFqcGdeQXVyNjAwNDUxODI@._V1_Ratio0.6716_AL_.jpg', 
'crew': 'Frank Darabont (dir.), Tim Robbins, Morgan Freeman', 
'imDbRating': '9.2', 'imDbRatingCount': '2739500'}

Downloading Images


import requests

url="https://m.media-amazon.com/images/\
M/MV5BNDE3ODcxYzMtY2YzZC00NmNlLWJiNDMtZ\
DViZWM2MzIxZDYwXkEyXkFqcGdeQXVyNjAwNDUx\
ODI@._V1_Ratio0.6716_AL_.jpg"
r = requests.get(url)
if r.status_code == 200:
    with open('The Shawshank Redemption.jpg', 'wb') as f:
        f.write(r.content)
else:
    print("download failed")

Top10


import requests

api_key = 'Your API Key'

url=f"https://imdb-api.com/en/API/Top250Movies/{api_key}"
r = requests.get(url)
print(r.status_code)
response_dict = r.json() 

top250 = response_dict["items"]
top10 = top250[:10]
print(top10)

for movie in top10:
    url = movie['image']
    title = movie['title']

    r = requests.get(url)
    if r.status_code == 200:
        with open(f'{title}.jpg', 'wb') as f:
            f.write(r.content)
    else:
        print("download failed")

In Theaters


import requests

api_key = 'Your API Key'

url=f"https://imdb-api.com/en/API/InTheaters/{api_key}"
r = requests.get(url)
print(r.status_code)
response_dict = r.json() 

in_theaters = response_dict["items"]

for movie in in_theaters:
    url = movie['image']
    title = movie['title']
    print(title)

    r = requests.get(url)
    if r.status_code == 200:
        with open(f'{title}.jpg', 'wb') as f:
            f.write(r.content)
    else:
        print("download failed")

13.2 HTML Basics

IP, DNS, URL, Hypertext (F12)
Requests: GET, POST
Response
HTML, CSS, JavaScript


HTML DOM
CSS Selector
id, class, tagname CSS Selector Reference

     
                    

#container{
    font-size: 50pt
}
.wrapper{
    color: #a51f1f
}
div > p{
    font-size: 20pt
}   
                    
爬虫
  • 获取网页 (requests)
  • 提取信息 (BeautifulSoup, Pyquery)
  • 保存数据 (csv, xlsx, MySQL, MongoDB)
  • JavaScript 渲染页面 (Selenium)
  • 八爪鱼,火车头
其他相关概念
  • 静态网页和动态网页
  • Ajax渲染
  • Cookies, Session
  • Proxy server
  • 分布式爬虫 (Scrapy)

Summary

  • Reading: Python Crash Course, Chapter 16.2, 17