# I believe the last solution of the I/O Bound Program is wrong

Dear TAs,

I believe there is a mistake in the last exercise within the `I/O Bound Program` module. The related links are given here at:
Description
Instruction

The Best Fielder is defined by someone with the bigger RF score, which has a formula of `RF = (A + PO) / G`, where G must be > 100.

So the project wanted us to find the top 20 fielders with SQL, and here is the final solution:

``````    query = '''
SELECT
(CAST(A AS FLOAT) + PO) / G as RF,
playerID
FROM Fielding
GROUP BY Fielding.playerID
HAVING G > 100
ORDER BY RF desc
LIMIT 20;
'''
``````

However, I believe the answer is incorrect. This is because each player can has more than one RF scores. Therefore we need to extract the `MAX(RF)` score from each player. Otherwise the clause `(CAST(A AS FLOAT) + PO)/G` will simply return `None` if that person has more than one entry. So I believe the correct solution should be:

``````    query = '''
SELECT
MAX((CAST(A AS FLOAT) + PO) / G) as RF,
playerID
FROM Fielding
GROUP BY Fielding.playerID
HAVING G > 100
ORDER BY RF desc
LIMIT 20;
'''
``````

The answers for the Best Pitcher and Best Batter are also questionable.
I believe the answers should be:

``````    # best batter
query = '''
SELECT
MAX((CAST(H AS FLOAT) + BB  + HBP) / (AB + BB + HBP + SF) +
(H + "2B" + 2*"3B" + 3*HR) / AB) as OPS,
playerID
FROM Batting
GROUP BY playerID
HAVING AB > 100
ORDER BY OPS desc
LIMIT 20;
'''

# best pitcher
query = '''
SELECT
MIN((13 * CAST(HR AS FLOAT) + 3*BB - 2*SO) / IPOuts + 3.2) as FIP,
playerID
FROM Pitching
GROUP BY playerID
HAVING IPOuts > 100
ORDER BY FIP asc
LIMIT 20;
'''
``````

Correct me if I am wrong.
Thanks

Hi @scoodood,

I agree with your suggestion that using a `MAX` makes more sense for the exercise. However, the reason why you see `None` results, it not that there are several rows with the same player id.

Using ` (CAST(A AS FLOAT) + PO) / G` will not always return `None` if there are several players with the same id. The real reason why it returns `None`, in some cases, is that the last entry having that player id has either `A`, `PO` or `G` defined to none.

For example if you do this:

``````conn = sqlite3.connect("lahman2015.sqlite", check_same_thread=False)
cur = conn.cursor()
cur.execute("""
SELECT playerID, A, PO, G
FROM Fielding
WHERE G > 100 AND playerID = 'aaronha01'
""")
res = cur.fetchall()
print(len(res))
for r in res:
print(r)

cur.execute("""
SELECT playerID, (CAST(A AS FLOAT) + PO) / G as RF
FROM Fielding
WHERE G > 100 AND playerID = 'aaronha01'
GROUP BY playerID
""")
res = cur.fetchall()
print(len(res))
for r in res:
print(r)
``````

You will see the following result:

``````34
('aaronha01', 4, 205, 105)
('aaronha01', 5, 223, 116)
('aaronha01', 9, 254, 126)
('aaronha01', 8, 212, 104)
('aaronha01', 17, 316, 152)
('aaronha01', 17, 313, 152)
('aaronha01', 9, 346, 150)
('aaronha01', 12, 305, 153)
('aaronha01', 11, 227, 119)
('aaronha01', 12, 261, 152)
...
('aaronha01', 66, 968, 109)
('aaronha01', 5, 206, 105)
('aaronha01', None, None, 128)  # <---- this is the problem
1
('aaronha01', None)
``````

When you donâ€™t use an aggregate function, `GROUP BY` will use the last result. As you can see, the last entry for player with id `'aaronha01'` has `None` values so that is why in this case it returns no results. If you add ` AND A != 'None'` to the queries, you will get the result for `('aaronha01', 5, 206, 105)` instead (the previous to last entry).

Hi @Francois,
You are absolutely correct that `(CAST(A AS FLOAT) + PO) / G` will simply return the last result of the `GROUP BY`, instead of `None`.

Thanks for confirming my suggested answers.