Lahman Baseball Database
Description of the Data
The data set contains many tables about baseball performance statistics, back to 1871.
We build this data into a sqlite database, as described below.
Transformations to the original data source
First we unzip the downloaded file and extract the csv files that we need. We remove the head from each file because we already loaded the headers of each table into our sql script, given below.
unzip v2023.1.zip
mv baseballdatabank-2023.1/core/*.csv ./
mv baseballdatabank-2023.1/contrib/*.csv ./
rm v2023.1.zip
rm -rf baseballdatabank-2023.1
sed -i '1d' *.csv
Then we create a new sqlite database:
sqlite3 lahman.db
Inside sqlite3, we do the following sql import statements, with many thanks to WebucatorTraining for this script, which we modified:
.read mdw-updated-lahman-mysql.sql
.separator ,
.import AwardsManagers.csv AwardsManagers
.import AwardsPlayers.csv AwardsPlayers
.import AwardsShareManagers.csv AwardsShareManagers
.import AwardsSharePlayers.csv AwardsSharePlayers
.import CollegePlaying.csv CollegePlaying
.import HallOfFame.csv HallOfFame
.import Salaries.csv Salaries
.import Schools.csv Schools
.import AllstarFull.csv AllstarFull
.import Appearances.csv Appearances
.import Batting.csv Batting
.import BattingPost.csv BattingPost
.import Fielding.csv Fielding
.import FieldingOF.csv FieldingOF
.import FieldingOFsplit.csv FieldingOFsplit
.import FieldingPost.csv FieldingPost
.import HomeGames.csv HomeGames
.import Managers.csv Managers
.import ManagersHalf.csv ManagersHalf
.import Parks.csv Parks
.import People.csv People
.import Pitching.csv Pitching
.import PitchingPost.csv PitchingPost
.import SeriesPost.csv SeriesPost
.import Teams.csv Teams
.import TeamsFranchises.csv TeamsFranchises
.import TeamsHalf.csv TeamsHalf
then type Control-D to quit sqlite3
back in the bash shell, type:
rm *.csv
rm mdw-updated-lahman-mysql.sql