I am planning to build a data lake using Hadoop. My data sources are multiple SQL Server instances and structured log files from different systems. Could you kindly guide me on how to start with data ingestion and build a data lake from these sources?
Have a look at the Sqoop and NiFi tools from Apache. Sqoop is built for bulk transfers between relational databases and Hadoop, while NiFi is a good fit for continuously picking up and routing your log files.
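For example, a Sqoop import of a single SQL Server table into HDFS looks roughly like this. The hostname, database, credentials, table, and target directory below are placeholders you would replace with your own:

```shell
# Import one SQL Server table into HDFS as delimited text files.
# All names here (sqlhost, SalesDB, Orders, paths) are placeholders.
sqoop import \
  --connect "jdbc:sqlserver://sqlhost:1433;databaseName=SalesDB" \
  --username etl_user \
  --password-file /user/etl/sqlserver.password \
  --table Orders \
  --target-dir /datalake/raw/salesdb/orders \
  --num-mappers 4
```

Note that Sqoop needs the Microsoft JDBC driver jar on its classpath to talk to SQL Server, and `--num-mappers` controls how many parallel connections it opens against the source database.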
Also, this article and this tools review may be helpful to get started.
Thanks for the information, this really helps!!
My use case is a bit different: I don't have access to the SQL Server itself. Instead, I am provided with a data file (*.mdf) every day, and I need to be able to extract the table data from this file. Is that possible?
Well, one way could be spinning up your own local SQL Server instance, attaching the .mdf file to it, and then using any of the tools above to ingest the data into Hadoop.
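A minimal sketch of that attach step, assuming a local SQL Server instance reachable via sqlcmd; the database name and file path are placeholders. `FOR ATTACH_REBUILD_LOG` is useful here because it rebuilds the transaction log when only the .mdf (and no .ldf) is delivered, though it requires the database to have been shut down cleanly:

```shell
# Attach the daily .mdf to a local SQL Server instance.
# DailyDrop and the file path are placeholders; $SA_PASSWORD must be set.
# ATTACH_REBUILD_LOG rebuilds the missing log file when only the .mdf exists.
sqlcmd -S localhost -U sa -P "$SA_PASSWORD" -Q "
CREATE DATABASE DailyDrop
    ON (FILENAME = '/var/opt/mssql/data/daily.mdf')
    FOR ATTACH_REBUILD_LOG;"
```

Once the database is attached, you can point Sqoop at localhost to pull the tables into Hadoop; drop or detach the database before attaching the next day's file.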