Originally Posted by
cdcox
Minor update: I put in about 7 hours of good work over the weekend. The effort went into updating the player availability table. It is not done yet.
Nitty-gritty details:
1. There are two sets of player IDs in the database: 1) our internal IDs and 2) IDs assigned by our former data source. We need to convert everything to run only off of our internal IDs. I started that today with the player availability table. This required some programming in addition to data wrangling.
2. Player availability resolution means pinning down a unique identity for each individual in the NFL. The NFL typically identifies players as P.Mahomes for Patrick Mahomes. Simple, right?
a) Both Justin Watson and Jaylen Watson play for the Chiefs. The NFL parses these as Ju. Watson and Ja. Watson. Not so easy all of a sudden.
b) Curve ball. There are two Michael Carters in the NFL. They both play for the Jets. Except one of the Michael Carters played for two teams in 2023.
c) Sr., Jr., II, III, IV, and yes even V are separate suffixes of players in the NFL. These cases require special handling.
d) Don't get me started on hyphenated last names.
e) Two players changed their last names in the last few years.
f) I try to include team affiliation to increase certainty of identity. In-season team changes throw that into doubt, and they are very frequent.
g) Nicknames. Decobie -> Cobie. Which do I search for?
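To make cases (a) through (g) concrete, here is a minimal Python sketch of how names like these might be normalized into a comparison key. It is not the actual code behind the database. The function names `name_key` and `short_labels`, the `ALIASES` table, and the "Ju.Watson" label format are all assumptions made for illustration.

```python
# Suffixes listed in case (c); matched case-insensitively, ignoring a trailing period.
SUFFIXES = {"sr", "jr", "ii", "iii", "iv", "v"}

# Hypothetical manual-override table for nicknames (case g); entries are curated by hand.
ALIASES = {"cobie": "decobie"}


def name_key(full_name: str) -> tuple[str, str, str]:
    """Return a lowercase (first, last, suffix) key for a player name."""
    tokens = full_name.replace(",", " ").split()
    if not tokens:
        raise ValueError("empty name")
    suffix = ""
    # Only treat the final token as a suffix when a first and last name remain.
    if len(tokens) > 2 and tokens[-1].rstrip(".").lower() in SUFFIXES:
        suffix = tokens.pop().rstrip(".").lower()
    first = tokens[0].lower()
    # Hyphenated last names (case d) are kept whole, hyphen included.
    last = " ".join(tokens[1:]).lower()
    # Map a known nickname back to the curated canonical first name.
    first = ALIASES.get(first, first)
    return first, last, suffix


def short_labels(names: list[str]) -> dict[str, str]:
    """Build 'P.Mahomes'-style labels, lengthening the first-name prefix
    until it differs from every other listed player with the same last name."""
    labels = {}
    for name in names:
        first, last, _ = name_key(name)
        rivals = [
            name_key(other)[0]
            for other in names
            if other != name and name_key(other)[1] == last
        ]
        n = 1
        while n < len(first) and any(r[:n] == first[:n] for r in rivals):
            n += 1
        labels[name] = f"{first[:n].capitalize()}.{last.title()}"
    return labels
```

With the inputs "Patrick Mahomes", "Justin Watson", and "Jaylen Watson", `short_labels` gives "P.Mahomes", "Ju.Watson", and "Ja.Watson". Two players who share both first and last name still end up with the same label, which is why suffix and team have to be part of the key and why some cases fall through to manual curation.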
Every one of these cases requires manual curation. I've greatly automated the process, but it still requires a lot of manual work.
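The ID conversion from item 1 follows the same "automate what you can, hand-check the rest" pattern. A rough sketch, assuming each availability row is a dict and that a legacy-to-internal crosswalk already exists. The column names `legacy_id` and `internal_id` and the function `remap_ids` are hypothetical, not the real schema.

```python
def remap_ids(rows, crosswalk):
    """Convert rows to internal IDs; return (converted, needs_review).

    rows:      list of dicts, each with a hypothetical 'legacy_id' and/or 'internal_id'
    crosswalk: dict mapping legacy ID -> internal ID
    """
    converted, needs_review = [], []
    for row in rows:
        internal = row.get("internal_id")
        if internal is None:
            # Fall back to the crosswalk when the row only carries a legacy ID.
            internal = crosswalk.get(row.get("legacy_id"))
        if internal is None:
            # No automatic match: leave the row untouched for manual curation.
            needs_review.append(row)
            continue
        # Drop the legacy column so downstream code runs only off internal IDs.
        new_row = {k: v for k, v in row.items() if k != "legacy_id"}
        new_row["internal_id"] = internal
        converted.append(new_row)
    return converted, needs_review
```

Rows that cannot be matched are returned unchanged in the second list rather than guessed at, so the manual queue is explicit.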
Expect another update next weekend.