When I run the query from the solution in the StackExchange Data Explorer for the following project, it returns 7147 rows, whereas in the solution, according to the .info() of the dataframe built from the query's .csv, it returned 8839 rows.
Is this due to changes in the data on the Data Science StackExchange Data Explorer, or is there potentially something that I am doing wrong? I am inclined to believe the former, as I copied the query from the solution and got 7147 rows directly in the Data Explorer (so the difference arises before I read the .csv into a df).
My Code: Transact SQL on StackExchange Data Science Data Explorer
SELECT Id, PostTypeID, CreationDate, Score, Viewcount, Tags, AnswerCount, FavoriteCount
FROM Posts
WHERE YEAR(CreationDate) = 2019 AND PostTypeID = 1
My Code (Jupyter Notebook):
import pandas as pd
questions = pd.read_csv("questions2019.csv", parse_dates = ["CreationDate"])
questions.info()
What I expected to happen:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8839 entries, 0 to 8838
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 8839 non-null int64
1 CreationDate 8839 non-null datetime64[ns]
2 Score 8839 non-null int64
3 ViewCount 8839 non-null int64
4 Tags 8839 non-null object
5 AnswerCount 8839 non-null int64
6 FavoriteCount 1407 non-null float64
dtypes: datetime64[ns](1), float64(1), int64(4), object(1)
memory usage: 483.5+ KB
What actually happened:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7147 entries, 0 to 7146
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 7147 non-null int64
1 PostTypeID 7147 non-null int64
2 CreationDate 7147 non-null datetime64[ns]
3 Score 7147 non-null int64
4 Viewcount 7147 non-null int64
5 Tags 7147 non-null object
6 AnswerCount 7147 non-null int64
7 FavoriteCount 1546 non-null float64
dtypes: datetime64[ns](1), float64(1), int64(5), object(1)
memory usage: 446.8+ KB
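To rule out a mistake on my side, a check along these lines could confirm that the .csv is a faithful export of the query (the row count matches, Ids are unique, and only 2019 questions are present). This is only a minimal sketch: the helper name sanity_check is hypothetical, and it assumes the column names from my query above (Id, PostTypeID, CreationDate).

```python
import pandas as pd


def sanity_check(df: pd.DataFrame) -> dict:
    """Summarise whether a frame looks like an unmodified export of the query.

    Hypothetical helper; assumes the columns Id, PostTypeID and CreationDate,
    with CreationDate already parsed as a datetime column.
    """
    return {
        # Total number of rows read from the .csv
        "rows": len(df),
        # Should equal "rows" if no post was exported twice
        "unique_ids": df["Id"].nunique(),
        # Should be [2019] given the WHERE clause
        "years": sorted(df["CreationDate"].dt.year.unique().tolist()),
        # Should be [1] (questions only) given the WHERE clause
        "post_types": sorted(df["PostTypeID"].unique().tolist()),
    }


# Usage with the file from this post:
# questions = pd.read_csv("questions2019.csv", parse_dates=["CreationDate"])
# print(sanity_check(questions))
```

If the summary shows the same row count as the Data Explorer, unique Ids, only the year 2019 and only post type 1, then the .csv matches the query result and the difference from the solution would have to come from the data itself.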
I know it doesn’t really affect how I approach the rest of the project, but it is obviously annoying that I cannot check my results against the solution.