Okay, so yesterday I was messing around trying to see if I could get some cool insights from a tennis match. I picked a pretty interesting one: Paul vs. Fognini. Figured there’d be some good data to dig into.

First thing I did, naturally, was grab the match data. Scraped it off some tennis stats site – you know the ones. Used a bit of Python and Beautiful Soup, nothing too fancy. Just needed to get the basic stuff: points won, unforced errors, aces, double faults, that sort of thing.
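
A minimal sketch of what a scraping step like this can look like with Beautiful Soup. The table layout, the `match-stats` id, and the numbers are all made up for illustration; a real stats site will use different markup, and the HTML would normally come from an HTTP request rather than a string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with placeholder numbers (not the real match stats).
SAMPLE_HTML = """
<table id="match-stats">
  <tr><th>Stat</th><th>Paul</th><th>Fognini</th></tr>
  <tr><td>Aces</td><td>7</td><td>3</td></tr>
  <tr><td>Double faults</td><td>2</td><td>5</td></tr>
</table>
"""

def parse_stats(html):
    """Turn a two-player stats table into {player: {stat: value}}."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table", id="match-stats").find_all("tr")
    # The header row holds the player names after the first "Stat" cell.
    players = [th.get_text(strip=True) for th in rows[0].find_all("th")][1:]
    stats = {player: {} for player in players}
    for row in rows[1:]:
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        for player, value in zip(players, cells[1:]):
            stats[player][cells[0]] = int(value)
    return stats

stats = parse_stats(SAMPLE_HTML)
print(stats)
```

Keeping the parsing in its own function means it can be tried against a saved copy of the page before pointing it at a live site.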
Then, I started cleaning up the data. This is always the most boring part, right? Getting rid of all the garbage, making sure the numbers actually make sense. Found a couple of weird entries, probably typos or something. Just smoothed them out as best I could. Gotta have clean data if you want to get anything useful out of it.
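
A rough sketch of a cleaning pass in pandas, under some assumptions: the column names and values are placeholders, and instead of "smoothing" a suspicious entry it simply marks it as missing, which is a more conservative choice than guessing a replacement.

```python
import pandas as pd

# Placeholder rows (not the real match numbers); "640%" plays the role of a typo.
raw = pd.DataFrame({
    "player": ["Paul", "Fognini"],
    "aces": ["7", "3"],
    "first_serve_pct": ["64%", "640%"],
})

def clean(df):
    """Convert text columns to numbers and blank out impossible percentages."""
    out = df.copy()
    out["aces"] = pd.to_numeric(out["aces"], errors="coerce")
    pct = pd.to_numeric(out["first_serve_pct"].str.rstrip("%"), errors="coerce")
    # A percentage outside 0-100 cannot be right, so it becomes NaN.
    out["first_serve_pct"] = pct.where(pct.between(0, 100))
    return out

cleaned = clean(raw)
print(cleaned)
```

Leaving the bad value as NaN keeps it visible later on, so it is an explicit decision whether to drop it, look it up, or fill it in.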
Next up, I thought I’d try and visualize some of the key stats. Tossed the data into a pandas DataFrame, then used Matplotlib to make some simple bar charts. Wanted to see the head-to-head comparison of things like first serve percentage, break points converted, and total points won. That gave me a quick overview of where each player was strong or weak in this particular match.
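
Something like the following would produce a grouped head-to-head bar chart. It is only a sketch: the numbers are placeholders, and the function name `plot_head_to_head` is made up for the example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder values for illustration only (not the real match stats).
stats = pd.DataFrame(
    {"Paul": [64, 40, 98], "Fognini": [58, 25, 91]},
    index=["First serve %", "Break points converted %", "Total points won"],
)

def plot_head_to_head(df):
    """Draw one group of bars per stat, one bar per player."""
    fig, ax = plt.subplots(figsize=(8, 4))
    df.plot.bar(ax=ax, rot=0)
    ax.set_title("Head-to-head match stats")
    ax.set_ylabel("Value")
    fig.tight_layout()
    return fig, ax

fig, ax = plot_head_to_head(stats)
```

Call `plt.show()` or `fig.savefig("stats.png")` to actually see the chart. Percentages and raw counts share one axis here, which is fine for a quick look but worth splitting into separate charts for anything more careful.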
I wanted to get a bit more detailed, so I looked at the unforced errors. I think they’re a really crucial stat. Split them by set, too, to see if there was any shift over the course of the match. Found that one of the players (can’t recall exactly which one now) had a really bad second set with loads of errors. Probably a turning point in the match, looking back at it.
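
A per-set breakdown like this is a one-liner once the data is in long format. In this sketch the players are deliberately labeled "Player A" and "Player B", and both the error counts and the three-set length are invented.

```python
import pandas as pd

# Placeholder data: one row per player per set (not the real match numbers).
errors = pd.DataFrame({
    "player": ["Player A"] * 3 + ["Player B"] * 3,
    "set": [1, 2, 3, 1, 2, 3],
    "unforced_errors": [8, 7, 9, 10, 19, 11],
})

# Rows are sets, columns are players, so a bad set stands out at a glance.
by_set = errors.pivot(index="set", columns="player", values="unforced_errors")

# For each player, the set in which they made the most unforced errors.
worst_set = by_set.idxmax()

print(by_set)
print(worst_set)
```

Calling `by_set.plot.bar()` on the pivoted table would give the same comparison as a chart.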
After that, I tried to calculate some derived stats. You know, stuff that isn’t directly in the raw data. Like, I calculated the “points won on return” percentage, and the “service points won” percentage for each player. This helped me get a better handle on their strengths on serve versus return. It’s not just about aces, but also about consistent serving.
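
The arithmetic behind those two percentages can be sketched as below. It assumes the raw data gives, for each player, the number of service points played and won; the `ServeLine` name and the counts are made up. The key observation is that every point one player loses on serve is a point the opponent wins on return.

```python
from dataclasses import dataclass

@dataclass
class ServeLine:
    """Serve totals for one player (hypothetical structure)."""
    service_points_played: int
    service_points_won: int

def service_points_won_pct(own):
    """Share of a player's own service points that they won."""
    return 100.0 * own.service_points_won / own.service_points_played

def return_points_won_pct(opponent):
    """Share of the opponent's service points that the returner won."""
    lost_by_server = opponent.service_points_played - opponent.service_points_won
    return 100.0 * lost_by_server / opponent.service_points_played

# Placeholder counts, not the real match numbers.
player_a = ServeLine(service_points_played=80, service_points_won=52)
player_b = ServeLine(service_points_played=75, service_points_won=45)

print(service_points_won_pct(player_a))  # 65.0
print(return_points_won_pct(player_b))   # 40.0, Player A's return performance
```

By construction, one player's return percentage and the other player's service percentage always add up to 100.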

Finally, I thought I’d see if I could predict the winner using some simple machine learning. Just threw the data into a scikit-learn logistic regression model. Trained it on some past matches (had to grab some more data for this, of course). The model predicted the correct winner in this case, but honestly, it was probably just luck. The dataset was way too small to draw any real conclusions.
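
A hedged sketch of this kind of model in scikit-learn. The "past matches" here are synthetic random numbers, and the two features (differences between the players in first serve percentage and in unforced errors) are an assumption made for the example. Cross-validation is included because it gives a rough sense of how much of a correct prediction is luck.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for past matches (not real data).
# Columns: player 1 minus player 2 for first serve % and unforced errors,
# already scaled to roughly unit variance.
n_matches = 200
X = rng.normal(size=(n_matches, 2))
signal = 1.5 * X[:, 0] - 1.0 * X[:, 1]
# Label is 1 when player 1 wins; the added noise keeps it from being perfectly predictable.
y = (signal + rng.normal(size=n_matches) > 0).astype(int)

model = LogisticRegression()

# Accuracy on 5 held-out folds, more honest than checking a single match.
scores = cross_val_score(model, X, y, cv=5)
print("mean accuracy:", scores.mean())

# Fit on everything, then estimate player 1's win probability for a new match.
model.fit(X, y)
prob = model.predict_proba([[0.5, -0.5]])[0, 1]
print("P(player 1 wins):", prob)
```

With only a handful of real matches, the fold-to-fold spread in `scores` would be large, which is another way of seeing that a single correct call does not say much.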
So, yeah, that’s pretty much it. It was a fun little project. Nothing groundbreaking, but it was a good way to kill an afternoon and play around with some tennis data. Maybe I’ll try a different match next time.
Things I learned:
- Data cleaning is always a pain.
- Visualization helps you quickly understand the data.
- Simple machine learning models can be fun, but take the results with a grain of salt if you don’t have much data.