“I have this one flat file and I need it in DWH” How often do you get this kind of request? Or maybe you’ve had this kind of request yourself?

No matter what, this is a common issue connected to data warehousing, and it sometimes seems hard to tackle. So, the next time you encounter it, try the following approach.

Ask yourself two questions:

  1. Is this new data or processed data from the data warehouse?
  2. Is it a static file, or does it need periodic refreshes?

Once you know that, there are three possible ways to tackle the flat file issue:

  1. New data + static file: Create a separate schema for such files. It’s also helpful to give analysts create/update permissions there so they can serve themselves.
  2. New data + periodic updates: Make it part of the ETL process, even if a hosted spreadsheet is the source. With human input, the pipeline may fail from time to time, but it’s still better than a manual process.
  3. Derived data: Extract the process behind the data from the requester and make it part of your data modeling.

“But it violates rule X and principle Y” I know and I don’t care. You shouldn’t either.

Data warehouses (marts, lakes, and all the stuff you have there) exist for people and business, not the other way around. Whenever a rule or principle holds you back from providing value to business stakeholders, it also prevents you from fulfilling your most basic responsibility; therefore, it’s bad. Keep it clean by providing analysts and business with a clear and simple process to follow whenever a flat file needs adding to the data infrastructure.

And let me know your way of handling such requests.