First of all, sorry if the concept “load” makes no sense, technically speaking. However, I’d like to know the following: is there any way to load the information from a .csv or .json file into a SQL database or table?
I’m not sure whether I overlooked something in the SQL courses from Dataquest, but I think we only learn to create and delete tables (and insert data manually). In the case where I already have the data stored in a file (in my case, a .csv and a .json file), is there any way to load the information from these files into a SQL table without typing the inserts by hand?
If you use pgAdmin for PostgreSQL, you can use its Import/Export dialog for CSV files; for SQLite there is SQLite Studio.
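If you’d rather skip the GUI entirely, the same CSV import can be done in a few lines with Python’s standard library. A minimal sketch (the table name, columns, and the inlined CSV content are made up for illustration; in practice you’d `open()` your real file):

```python
import csv
import sqlite3
from io import StringIO

# In-memory database for the demo; use a path like "mydata.db" in practice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")

# Stand-in for: with open("people.csv", newline="") as f:
f = StringIO("name,age\nAlice,34\nBob,29\n")

reader = csv.reader(f)
next(reader)  # skip the header row
conn.executemany("INSERT INTO people VALUES (?, ?)", reader)
conn.commit()

rows = conn.execute("SELECT name, age FROM people ORDER BY name").fetchall()
```

Note that SQLite’s type affinity converts the text `"34"` into the integer `34` for the `INTEGER` column, so no explicit casting is needed here.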
This doesn’t work with JSON, since its nested structure doesn’t map naturally onto flat SQL tables, although PostgreSQL does support a JSON data type.
You could also use psycopg2’s copy_from() to bulk-load files into PostgreSQL.
And don’t forget about pandas. It can read both CSV and JSON files, and you can then upload the result to the database using df.to_sql().
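A sketch of that pandas route, using SQLite via the stdlib sqlite3 module as the target (with PostgreSQL you’d pass an SQLAlchemy engine to `to_sql` instead; the inlined data stands in for your real files):

```python
import sqlite3
from io import StringIO

import pandas as pd

# Stand-in for open("data.csv"); pd.read_json works the same way for JSON files.
csv_data = StringIO("city,population\nBerlin,3700000\nParis,2100000\n")

df = pd.read_csv(csv_data)
conn = sqlite3.connect(":memory:")

# if_exists="replace" drops and recreates the table on reruns.
df.to_sql("cities", conn, index=False, if_exists="replace")

rows = conn.execute("SELECT city, population FROM cities").fetchall()
```

For PostgreSQL the call would be `df.to_sql("cities", engine, ...)` with an engine from `sqlalchemy.create_engine(...)`; pandas only accepts a raw DBAPI connection for SQLite.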
Thanks for the reply @moriturus7.
Unfortunately, when I tried to do what you suggested, it didn’t work. I got this error message. Just to emphasize, I tried this with the csv file, not with the json file.
The problem I’m facing with the json file is not related to this topic, but basically I get a memory error when I try to read it (apparently, 3 GB is too much for my computer to handle at one time). I tried searching for a solution on Stack Overflow, but all I saw were recommendations for ijson. However, the documentation of this library is not clear at all, so I’m still working on it.
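One hedged suggestion: if the 3 GB file happens to be newline-delimited JSON (one object per line), you don’t need ijson at all; the stdlib can stream it one record at a time without ever holding the whole file in memory. A sketch (the file content is inlined here for illustration; normally you’d iterate over `open("big_file.json")`):

```python
import json
from io import StringIO

# Stand-in for open("big_file.json"); each line is one JSON object.
f = StringIO('{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n')

ids = []
for line in f:  # reads one line at a time, never the whole file
    record = json.loads(line)
    ids.append(record["id"])
```

If instead the file is one giant JSON array, ijson’s incremental parser (e.g. `ijson.items(f, "item")`) is the usual tool; it yields one element at a time in the same streaming fashion.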
If you have any other ideas how to read the files, please let me know!
You need to see the details of the error to understand why you can’t upload a csv file using import. I can think of several reasons.
1. Your file has headers and you didn’t specify this in the import settings, so the header row gets written as data into columns with other data types.
2. You created the CSV file with Excel, which (in some locales) uses ; as the separator instead of ,.
3. Your CSV file contains invalid values.
There may be other reasons, so you need to look at the details of the error.
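For reason 2 in particular, you can check which delimiter a file actually uses with the stdlib csv.Sniffer before importing. A sketch (the sample is inlined; in practice you’d read the first few KB of your real file):

```python
import csv

# Stand-in for: sample = open("data.csv").read(4096)
sample = "name;age\nAlice;34\nBob;29\n"

# Restrict the sniffer to the two likely candidates.
dialect = csv.Sniffer().sniff(sample, delimiters=";,")
delimiter = dialect.delimiter  # ";" for this Excel-style sample
```

You can then pass the detected delimiter to your import tool or to csv.reader / pd.read_csv explicitly.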
I can offer you a “lazy” way if you want to handle JSON without searching for new libraries or writing processing logic for your JSON file yourself. Use MongoDB as an intermediate database: its tools can import JSON files and export their contents as CSV. Perhaps this will be the fastest solution, without the need for a deep dive into the problem.
Almost every relational database management system (RDBMS) has an import/export executable (pg_restore or mysqlimport, for example), and the GUI admin tool from the same vendor (pgAdmin, for example) will talk to that tool. So the first thing is to check whether the tool for your platform supports JSON and CSV by looking at its manual page (for example https://www.postgresql.org/docs/13/app-pgdump.html). These tools are almost guaranteed to support CSV and almost guaranteed not to support JSON; read on for why, and for what you can do about it.
CSV is two-dimensional, much like a DataFrame or a relational table. It’s also terser and smaller to deal with, because JSON (or XML or YAML) are annotated data formats: a CSV with one line of metadata and a million rows is a lot smaller than a million JSON objects full of repeated name-value pairs.
JSON has nested values that can contain whole hierarchies. The DBMS vendors’ bulk-load executables don’t usually support mapping and loading these arbitrary hierarchies.
So you could program a data-loading script to do this in a language like Python plus SQL. Alternatively, you could use third-party Extract-Load-Transform software (i.e. data pipelines), which might simplify the development of these loaders; for example, some offer a JSON-to-SQL-table wizard.
The MongoDB suggestion from @moriturus7 is a clever one, though. I’ve often stuck data into one database system that had the hookups to go from A to C in this manner. It’s easy to transform data once it’s inside a database system, compared to the limited options in a load tool; personally, I think it’s even easier to transform data in a relational system. It’s possible to store a JSON string in a SQL column, either as text or, better, as a dedicated JSON data type, so you could have a load table from which you shred the JSON out into one or more two-dimensional final tables. However, many people would probably rather do complex JSON manipulation in a general language like Python or Ruby.
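A minimal sketch of that load-table-then-shred pattern, using SQLite’s built-in JSON functions via the stdlib (PostgreSQL has equivalents like the ->> operator; the table and field names here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. Load table: one row per raw JSON document, stored as plain text.
conn.execute("CREATE TABLE raw_docs (doc TEXT)")
conn.execute(
    "INSERT INTO raw_docs VALUES (?)",
    ('{"user": {"name": "Alice", "age": 34}}',),
)

# 2. Shred: pull nested values out into a flat, two-dimensional table.
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.execute(
    """INSERT INTO users
       SELECT json_extract(doc, '$.user.name'),
              json_extract(doc, '$.user.age')
       FROM raw_docs"""
)

rows = conn.execute("SELECT name, age FROM users").fetchall()
```

This keeps the original documents intact in the load table, so you can re-shred into different final tables later without re-importing the file.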