Simple Web Scraping in Python

A web scraping Python script using the pandas and requests packages

I wrote a Python script that accesses a website and extracts some baseball data from it.

Pandas has a function called “read_html”, so you don’t have to call other packages like BeautifulSoup yourself! (pandas still needs an HTML parser such as lxml installed.)
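
As a minimal sketch of what read_html does, the snippet below parses a small inline HTML table into a DataFrame. The table contents are made up for illustration and are not taken from the website. It assumes a parser such as lxml is installed, and it wraps the string in StringIO because recent pandas versions deprecate passing raw HTML strings.

```python
from io import StringIO

import pandas as pd

# Made-up sample table, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Year</th><th>Player</th><th>HR</th></tr>
  <tr><td>2001</td><td>Player A</td><td>40</td></tr>
  <tr><td>2002</td><td>Player B</td><td>35</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(SAMPLE_HTML))
df = tables[0]
print(df)
```

Because read_html always returns a list, the script has to pick one element (here the first) even when the page contains a single table.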

The structure of the script is the following:

Get the HTML with the requests package.
Read the HTML into pandas.
Output the result to a CSV file.

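from io import StringIO

import requests
import pandas as pd

URL = 'https://www.baseball-almanac.com/hitting/hihr5.shtml'

def get_table(html, table):
	# `table` holds the HTML attributes that identify the target <table>.
	# StringIO avoids the deprecation warning newer pandas versions raise for raw HTML strings.
	df = pd.read_html(StringIO(html), attrs=table, header=1)[0]
	return df

def main():
	# Download the page and fail early on an HTTP error.
	response = requests.get(URL)
	response.raise_for_status()
	df = get_table(response.text, {'class': 'boxed'})
	# Write the table without the DataFrame index column.
	df.to_csv('HR Year-by-Year Leaders.csv', index=False)

if __name__ == '__main__':
	main()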
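
To illustrate what the attrs and header arguments in the script do, here is a small offline sketch. The HTML below is invented and may not match the real page's layout: it assumes a page with two tables, where the one of interest has class "boxed" and a title row above its column names.

```python
from io import StringIO

import pandas as pd

# Invented page: a navigation table plus a "boxed" data table (assumed layout).
PAGE_HTML = """
<table class="nav">
  <tr><td>Home</td><td>About</td></tr>
</table>
<table class="boxed">
  <tr><td colspan="2">Made-up Leaders</td></tr>
  <tr><td>Year</td><td>HR</td></tr>
  <tr><td>2001</td><td>40</td></tr>
  <tr><td>2002</td><td>35</td></tr>
</table>
"""

# attrs keeps only tables whose attributes match;
# header=1 uses the second row as column names and drops the title row above it.
matches = pd.read_html(StringIO(PAGE_HTML), attrs={"class": "boxed"}, header=1)
leaders = matches[0]
print(leaders)
```

Without attrs, read_html would also return the navigation table, and without header=1 the title row would end up in the data.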

Hope this is helpful in some way for those who are learning / using Python.

The source code is in my GitHub repo.

2 thoughts on “Simple Web Scraping in Python”

Hi, thanks for highlighting that feature. I got a question about your code: why do you define “table” as an argument in the function that creates the DataFrame-object?

Yuki says:

April 9, 2022 at 12:15 pm

My intention was that since we’re scraping tables on the web page, I called it table. You can feel free to change it however you want though!
