CSE258 Homework 1
Name: Yi Rong
PID: REDACTED
Email: yrong@ucsd.edu
Preparation
- Activate the virtualenv
- Install necessary dependencies for this hw
- Import packages
- Global functions
Environment (all packages already installed in ./venv, Python 3.8): scipy 1.7.1, sklearn 0.0 (pulls in scikit-learn 1.0), matplotlib 3.4.3, numpy 1.21.2; transitive dependencies: pillow 8.3.2, cycler 0.10.0, kiwisolver 1.3.2, pyparsing 2.4.7, python-dateutil 2.8.2, threadpoolctl 3.0.0, joblib 1.1.0, six 1.16.0.
Regression (week 1)
First, using the book review data, let’s see whether ratings can be predicted as a function of review length, or by using temporal features associated with a review.
Regression helper functions
Q2
{'user_id': '8842281e1d1347389f2ab93d60773d4d',
'book_id': '18245960',
'review_id': 'dfdbb7b0eb5a7e4c26d59a937e2e5feb',
'rating': 5,
'review_text': 'This is a special book. It started slow for about the first third, then in the middle third it started to get interesting, then the last third blew my mind. This is what I love about good science fiction - it pushes your thinking about where things can go. \n It is a 2015 Hugo winner, and translated from its original Chinese, which made it interesting in just a different way from most things I\'ve read. For instance the intermixing of Chinese revolutionary history - how they kept accusing people of being "reactionaries", etc. \n It is a book about science, and aliens. The science described in the book is impressive - its a book grounded in physics and pretty accurate as far as I could tell. Though when it got to folding protons into 8 dimensions I think he was just making stuff up - interesting to think about though. \n But what would happen if our SETI stations received a message - if we found someone was out there - and the person monitoring and answering the signal on our side was disillusioned? That part of the book was a bit dark - I would like to think human reaction to discovering alien civilization that is hostile would be more like Enders Game where we would band together. \n I did like how the book unveiled the Trisolaran culture through the game. It was a smart way to build empathy with them and also understand what they\'ve gone through across so many centuries. And who know a 3 body problem was an unsolvable math problem? But I still don\'t get who made the game - maybe that will come in the next book. \n I loved this quote: \n "In the long history of scientific progress, how many protons have been smashed apart in accelerators by physicists? How many neutrons and electrons? Probably no fewer than a hundred million. Every collision was probably the end of the civilizations and intelligences in a microcosmos. 
In fact, even in nature, the destruction of universes must be happening at every second--for example, through the decay of neutrons. Also, a high-energy cosmic ray entering the atmosphere may destroy thousands of such miniature universes...."',
'date_added': 'Sun Jul 30 07:44:10 -0700 2017',
'date_updated': 'Wed Aug 30 00:00:26 -0700 2017',
'read_at': 'Sat Aug 26 12:05:52 -0700 2017',
'started_at': 'Tue Aug 15 13:23:18 -0700 2017',
'n_votes': 28,
'n_comments': 1}
[([1, 2086], 5),
([1, 1521], 5),
([1, 1519], 5),
([1, 1791], 4),
([1, 1762], 3),
([1, 470], 5),
([1, 823], 5),
([1, 532], 5),
([1, 616], 4),
([1, 548], 5)]
Theta0 = 3.685681, Theta1 = 0.00006874
MSE = 1.552209
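The regression helper functions are not reproduced in this export. Below is a minimal sketch of what Q2 computes. It assumes an ordinary least-squares fit of rating on the feature vector [1, review length], as in the pairs printed above, followed by plain MSE. `fit_linear`, `mse` and the toy data are hypothetical, and the original may have used a different solver (for example scikit-learn's `LinearRegression`).

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares: return theta minimizing ||X @ theta - y||^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def mse(X, y, theta):
    """Mean squared error of the predictions X @ theta against labels y."""
    residuals = np.asarray(X, dtype=float) @ theta - np.asarray(y, dtype=float)
    return float(np.mean(residuals ** 2))

# Hypothetical mini-dataset of (review text, rating) pairs.
reviews = [("a" * 2086, 5), ("a" * 1521, 5), ("a" * 470, 5), ("a" * 1762, 3)]
X = [[1, len(text)] for text, _ in reviews]  # feature vector [1, review length]
y = [rating for _, rating in reviews]
theta = fit_linear(X, y)
print("Theta0 = %f, Theta1 = %.8f" % (theta[0], theta[1]))
print("MSE = %f" % mse(X, y, theta))
```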
Q3
({0, 1, 2, 3, 4, 5, 6},
{2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017})
Feature vectors for the first two examples:
[[1, 2086, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[1, 1521, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]]
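The sketch below shows one way to produce the 19-dimensional vectors above. It rests on three assumptions:

- Weekday follows Python's `datetime.weekday()` convention (Monday = 0, Sunday = 6).
- Years run from 2006 to 2017.
- The first category of each feature is dropped, so it is absorbed by the constant term.

`feature` and `one_hot` are hypothetical names. Under these assumptions the first example (added on a Sunday in 2017, length 2086) reproduces the printed vector.

```python
from datetime import datetime

WEEKDAYS = list(range(7))          # 0 = Monday ... 6 = Sunday, as datetime.weekday() returns
YEARS = list(range(2006, 2018))    # years observed in the Q3 output

def one_hot(value, categories):
    """One-hot encode value, dropping the first category (it is covered by the offset)."""
    return [1 if value == c else 0 for c in categories[1:]]

def feature(review):
    """[1, length] + 6 weekday indicators + 11 year indicators = 19 dimensions."""
    added = datetime.strptime(review["date_added"], "%a %b %d %H:%M:%S %z %Y")
    return ([1, len(review["review_text"])]
            + one_hot(added.weekday(), WEEKDAYS)
            + one_hot(added.year, YEARS))

example = {"date_added": "Sun Jul 30 07:44:10 -0700 2017", "review_text": "x" * 2086}
print(feature(example))
```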
Q4
Using weekday and year directly as features:
MSE = 1.536774
Using One-Hot Encoding from Q3:
MSE = 1.512358
Q5
Using weekday and year directly as features:
MSE on train set = 1.526952
MSE on test set = 1.545680
Using One-Hot Encoding from Q3:
MSE on train set = 1.499324
MSE on test set = 1.518546
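The split procedure is not shown in this export. The sketch below assumes a random 50%/50% train/test split. It fits on the training half and reports MSE on both halves. `train_test_mse`, `train_frac` and `seed` are hypothetical and are not necessarily what produced the numbers above.

```python
import random
import numpy as np

def train_test_mse(X, y, train_frac=0.5, seed=0):
    """Shuffle, split, fit OLS on the train part; return (train MSE, test MSE)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)           # reproducible random split
    cut = int(len(idx) * train_frac)
    train, test = idx[:cut], idx[cut:]
    theta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

    def mse(rows):
        return float(np.mean((X[rows] @ theta - y[rows]) ** 2))

    return mse(train), mse(test)
```

A lower training MSE than test MSE, as in both rows above, is the usual pattern, since theta is chosen to fit the training half.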
Q6
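For the trivial predictor $\hat{y}_i = \theta_0$, the mean absolute error over $N$ labels is:
$$\mathrm{MAE}(\theta_0) = \frac{1}{N}\sum_{i=1}^{N} \lvert y_i - \theta_0 \rvert$$
Taking the derivative of MAE w.r.t. $\theta_0$ (defined wherever $\theta_0 \neq y_i$ for all $i$):
$$\frac{\partial\,\mathrm{MAE}}{\partial \theta_0} = -\frac{1}{N}\sum_{i=1}^{N} \operatorname{sign}(y_i - \theta_0) = \frac{1}{N}\Big(\big|\{i : y_i < \theta_0\}\big| - \big|\{i : y_i > \theta_0\}\big|\Big)$$
Setting the derivative to 0 gives the best value for $\theta_0$. At the points $\theta_0 = y_i$ where the derivative is undefined, we instead require it to change sign from negative to positive. Either way, the two sets $\{i : y_i < \theta_0\}$ and $\{i : y_i > \theta_0\}$ must have the same size (cardinality), i.e.
$$\big|\{i : y_i < \theta_0\}\big| = \big|\{i : y_i > \theta_0\}\big|$$
Therefore, by definition, $\theta_0$ should be the median of the labels $y$. Since MAE is convex in $\theta_0$, this point is where the MAE is minimized.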
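A quick numerical sanity check of this result (not part of the original derivation) scans candidate values of θ0 on a grid over the ten ratings printed in Q2 and compares the minimizer with the median. `mae` and the grid resolution are illustrative choices.

```python
import numpy as np

def mae(y, theta0):
    """Mean absolute error of the constant predictor theta0."""
    return float(np.mean(np.abs(np.asarray(y, dtype=float) - theta0)))

y = np.array([5, 5, 5, 4, 3, 5, 5, 5, 4, 5])   # ratings of the ten Q2 pairs above
grid = np.linspace(0, 6, 601)                    # candidate theta0 values, step 0.01
best = grid[np.argmin([mae(y, t) for t in grid])]
print("grid minimizer:", best, "median:", np.median(y))
print("MAE at median:", mae(y, np.median(y)), "MAE at mean:", mae(y, y.mean()))
```

The mean (the MSE-optimal constant) gives a higher MAE than the median here, which is the contrast the derivation predicts.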
Classification (week 2)
In this question, using the beer review data, we’ll try to predict ratings (positive or negative) based on characteristics of beer reviews. Load the 50,000 beer review dataset, and construct a label vector by considering whether a review score is four or above.
Q7
{'review/appearance': 2.5,
'beer/style': 'Hefeweizen',
'review/palate': 1.5,
'review/taste': 1.5,
'beer/name': 'Sausa Weizen',
'review/timeUnix': 1234817823,
'beer/ABV': 5.0,
'beer/beerId': '47986',
'beer/brewerId': '10325',
'review/timeStruct': {'isdst': 0,
'mday': 16,
'hour': 20,
'min': 57,
'sec': 3,
'mon': 2,
'year': 2009,
'yday': 47,
'wday': 0},
'review/overall': 1.5,
'review/text': 'A lot of foam. But a lot.\tIn the smell some banana, and then lactic and tart. Not a good start.\tQuite dark orange in color, with a lively carbonation (now visible, under the foam).\tAgain tending to lactic sourness.\tSame for the taste. With some yeast and banana.',
'user/profileName': 'stcules',
'review/aroma': 2.0}
[([1, 262], 0),
([1, 338], 0),
([1, 396], 0),
([1, 401], 0),
([1, 1145], 1),
([1, 728], 0),
([1, 471], 0),
([1, 853], 0),
([1, 472], 1),
([1, 1035], 1)]
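The sketch below covers the feature and label construction and the confusion-matrix counts. It assumes features are [1, length of `review/text`], matching the pairs printed above, and that the label is 1 when `review/overall` is at least 4. The classifier itself is not reproduced. The threshold rule is a hypothetical stand-in for a fitted model's predictions (for instance a scikit-learn `LogisticRegression`, which is an assumption about the setup).

```python
def features_and_labels(reviews):
    """Build [1, review length] features and binary labels (1 iff the score is >= 4)."""
    X = [[1, len(r["review/text"])] for r in reviews]
    y = [1 if r["review/overall"] >= 4 else 0 for r in reviews]
    return X, y

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    TP = sum(1 for t, p in pairs if t == 1 and p == 1)
    TN = sum(1 for t, p in pairs if t == 0 and p == 0)
    FP = sum(1 for t, p in pairs if t == 0 and p == 1)
    FN = sum(1 for t, p in pairs if t == 1 and p == 0)
    return TP, TN, FP, FN

# Hypothetical toy reviews in the same format as the sample record above.
reviews = [{"review/text": "x" * n, "review/overall": s}
           for n, s in [(262, 1.5), (1145, 4.5), (472, 4.0), (728, 3.0)]]
X, y = features_and_labels(reviews)
y_pred = [1 if length > 450 else 0 for _, length in X]  # hypothetical stand-in for model.predict(X)
print(confusion_counts(y, y_pred))
```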
True Positive: 14201
True Negative: 5885
False Positive: 10503
False Negative: 19411
TPR: 0.422498
TNR: 0.359104
FPR: 0.640896
FNR: 0.577502
Balanced Error Rate: 0.468303
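For reference, each rate follows directly from the four counts. The small sketch below uses the standard definitions, with BER as the average of FPR and FNR. The `rates` helper is hypothetical.

```python
def rates(TP, TN, FP, FN):
    """Standard binary-classification rates from confusion-matrix counts."""
    TPR = TP / (TP + FN)      # true positive rate (sensitivity)
    TNR = TN / (TN + FP)      # true negative rate (specificity)
    FPR = FP / (FP + TN)      # equals 1 - TNR
    FNR = FN / (FN + TP)      # equals 1 - TPR
    BER = 0.5 * (FPR + FNR)   # equivalently 1 - 0.5 * (TPR + TNR)
    return TPR, TNR, FPR, FNR, BER
```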
Q8
[(array([0.19459931, 0.80540069]), 1),
(array([0.19643684, 0.80356316]), 1),
(array([0.20622631, 0.79377369]), 1),
(array([0.21202257, 0.78797743]), 1),
(array([0.21655241, 0.78344759]), 1),
(array([0.22127384, 0.77872616]), 1),
(array([0.22724752, 0.77275248]), 0),
(array([0.23156561, 0.76843439]), 1),
(array([0.23498476, 0.76501524]), 1),
(array([0.23606837, 0.76393163]), 0)]
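The pairs above look like rows of a `predict_proba` output (columns [P(y=0), P(y=1)]), each paired with its true label and sorted by P(y=1) in descending order. The sketch below shows that ranking step under this assumed layout. `rank_by_confidence` is a hypothetical helper.

```python
def rank_by_confidence(prob_rows, labels):
    """Pair each [P(y=0), P(y=1)] row with its label; most confident positive first."""
    pairs = list(zip(prob_rows, labels))
    return sorted(pairs, key=lambda pair: pair[0][1], reverse=True)

ranked = rank_by_confidence([[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]], [0, 1, 1])
print([label for _, label in ranked])
```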
Q9
Precision @ K ∈ {1, 100, 10000}:
K=1: 1.000000
K=100: 0.750000
K=10000: 0.619600
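Precision@K is the fraction of true positives among the K highest-confidence predictions. Below is a minimal sketch, assuming the labels are already ranked as in Q8. `precision_at_k` is a hypothetical helper. Applied to the ten labels printed in Q8, precision@1 is 1.0, which matches the K=1 result above.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of positives (label 1) among the top-k ranked predictions."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_labels[:k]
    return sum(top) / len(top)

# True labels of the ten highest-confidence predictions printed in Q8.
q8_labels = [1, 1, 1, 1, 1, 1, 0, 1, 1, 0]
print(precision_at_k(q8_labels, 1), precision_at_k(q8_labels, 10))
```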