Stuff by Yuki

Simple Web Scraping in Python

Posted on May 13, 2021

A web scraping Python script using the pandas and requests packages

I wrote a Python script that downloads a web page and extracts some baseball data from it.

Pandas has a function called “read_html” that parses HTML tables into DataFrames, so you don’t have to write the parsing code yourself with a package like BeautifulSoup!
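
To get a feel for what “read_html” returns before pointing it at a real site, here is a minimal sketch that runs on an inline HTML string. The markup and the values in it are made-up placeholders, not real statistics, and I'm assuming the target table has a title row above its header row, which is the case header=1 is meant for.

```python
from io import StringIO

import pandas as pd

# Hypothetical markup: two tables, only one of which has class="boxed".
# The boxed table has a title row, then a header row, then data rows.
# The numbers are placeholders, not real statistics.
HTML = """
<table class="other">
  <tr><th>x</th></tr>
  <tr><td>1</td></tr>
</table>
<table class="boxed">
  <tr><td colspan="2">Some Title</td></tr>
  <tr><td>Year</td><td>HR</td></tr>
  <tr><td>1901</td><td>10</td></tr>
  <tr><td>1902</td><td>12</td></tr>
</table>
"""

# read_html always returns a list of DataFrames, one per matching table.
# attrs narrows the match to tables with the given HTML attributes;
# header=1 uses the second row as column names and drops the row above it.
tables = pd.read_html(StringIO(HTML), attrs={"class": "boxed"}, header=1)
df = tables[0]

print(len(tables))
print(df)
```

Note that “read_html” still relies on a parser library (lxml, or BeautifulSoup with html5lib) being installed; you just don't have to call it yourself.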

The structure of the script is the following:

  1. Get the HTML with the requests package.
  2. Read the HTML into pandas.
  3. Output the result to a CSV file.
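
from io import StringIO

import pandas as pd
import requests

URL = 'https://www.baseball-almanac.com/hitting/hihr5.shtml'

def get_table(html, table):
	# "table" holds the HTML attributes that identify the target <table>
	# Wrap the markup in StringIO: newer pandas versions deprecate raw HTML strings
	df = pd.read_html(StringIO(html), attrs=table, header=1)[0]
	return df

def main():
	# Download the page, parse the matching table, and save it as a CSV file
	html = requests.get(URL).text
	df = get_table(html, {'class': 'boxed'})
	df.to_csv('HR Year-by-Year Leaders.csv', index=False)

if __name__ == '__main__':
	main()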
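
If you want the download step to fail loudly instead of handing an error page to pandas, one option is a small wrapper like the sketch below. The name fetch_html, the injectable get parameter, and the 10-second timeout are my own assumptions for illustration, not part of the script above.

```python
import requests

def fetch_html(url, get=requests.get, timeout=10):
    """Return the page body as text, raising on HTTP errors.

    `get` is a hypothetical injection point so the function can be
    exercised without network access; it defaults to requests.get.
    """
    # A timeout keeps the script from hanging forever on a slow server
    response = get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.text
```

In the script, this would stand in for the requests.get(URL).text line in main(), for example html = fetch_html(URL).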

Hope this is helpful in some way for those who are learning / using Python.

The source code is in my GitHub repo.

2 thoughts on “Simple Web Scraping in Python”

  1. Jesse says:
    February 20, 2022 at 4:24 am

    Hi, thanks for highlighting that feature. I got a question about your code: why do you define “table” as an argument in the function that creates the DataFrame-object?

    1. Yuki says:
      April 9, 2022 at 12:15 pm

      My intention was that since we’re scraping tables on the web page, I called it table. You can feel free to change it however you want though!

