The Essence of Metadata and why you will need it


During my search for an entry-level data analysis job, I found that some companies offer interesting solutions that include metadata repositories. I am convinced that everyone new to data should be interested in the topic.

In today’s world we are surrounded by masses of data. Companies that manage their databases efficiently are the ones that gain a competitive advantage. But what if I told you that ‘data’ alone is not enough, and that data about data is what makes it meaningful? I like the following view, expressed by Jeffrey Pomerantz in his book:

“I believed then, and still do, that the first course in any Information Science curriculum should be a course on metadata: almost everything else in the field depends on metadata, and the subject provides a hook into most of the issues in the field.”

That’s right, and metadata is nothing but data about data. But is this just a meaningless combination of words, or something more?

What is Metadata

Metadata provides descriptive information about data. There are various types of metadata:

| Business Metadata | Technical Metadata | Operational Metadata |
| --- | --- | --- |
| Describes the content of data | Access permissions | Data sharing rules |
| Informs about the condition of data | Data models and mapping documentation | Data sharing and audit rules |
| Includes security levels | Mapping documentation | Audit results |

Business Metadata vs. Technical Metadata

Business metadata matters to people on the business side, who rely on it in day-to-day enterprise activities. In this case it may describe items such as customers, account balances, and product identification. Business metadata can be found in many places, for example on screens, reports, bank statements, documents, and job application forms. Technical metadata is used mainly for development, design, and maintenance; examples include database table or index names and field constraints.
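To make the technical side more tangible, here is a minimal sketch that reads column names, constraints and index names back from a database using Python's built-in sqlite3 module. The `customers` table, its columns and the index name are hypothetical and chosen only for illustration; a real enterprise system would expose similar information through its own catalogue.

```python
import sqlite3

# In-memory database with a hypothetical "customers" table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        account_balance REAL DEFAULT 0
    )
    """
)
conn.execute("CREATE INDEX idx_customers_name ON customers (name)")

# One row of actual data, as opposed to the metadata queried below.
conn.execute(
    "INSERT INTO customers (name, account_balance) VALUES (?, ?)",
    ("Ada Example", 120.50),
)


def column_metadata(connection, table):
    """Return technical metadata (name, type, constraints) for each column."""
    rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is: cid, name, type, notnull, dflt_value, pk
    return [
        {
            "name": row[1],
            "type": row[2],
            "not_null": bool(row[3]),
            "primary_key": bool(row[5]),
        }
        for row in rows
    ]


def index_names(connection, table):
    """Return the names of the indexes defined on a table."""
    rows = connection.execute(f"PRAGMA index_list({table})").fetchall()
    # Each row is: seq, name, unique, origin, partial
    return [row[1] for row in rows]


columns = column_metadata(conn, "customers")
indexes = index_names(conn, "customers")
```

Note that none of the queries touch the customer's name or balance: the table name, column types and constraints describe the data without being the data.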

Importance of Metadata

  1. Supporting and improving data quality and, in turn, its accuracy.
  2. Recording which version of a dataset a model was trained on, which supports better model governance (machine learning).
  3. Making the data easier to find, use and re-use.
  4. Maintaining historical records of datasets.
  5. Enabling better control over data.
  6. Increasing web page visibility.
  7. Creating relationships between items and users.
  8. Organizing important business information.
  9. Describing the state of data.
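The second point in the list can be sketched in a few lines. This is only an assumption about how such a record might look: the function names, the field names and the `churn_model` label are hypothetical, and hashing the serialized rows is just one simple way to obtain a version identifier.

```python
import hashlib
import json
from datetime import datetime, timezone


def dataset_fingerprint(rows):
    """Hash the dataset content so that any change yields a new version id."""
    canonical = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def training_metadata(model_name, rows):
    """Build a metadata record tying a model to the exact data it was trained on."""
    return {
        "model_name": model_name,
        "dataset_sha256": dataset_fingerprint(rows),
        "row_count": len(rows),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Two hypothetical versions of the same training set.
rows_v1 = [{"age": 31, "label": 1}, {"age": 45, "label": 0}]
rows_v2 = rows_v1 + [{"age": 27, "label": 1}]

meta_v1 = training_metadata("churn_model", rows_v1)
meta_v2 = training_metadata("churn_model", rows_v2)
```

Because the two fingerprints differ, anyone auditing the model later can tell which version of the data it saw.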

For data scientists, the first three points will probably be of most interest. However, almost anyone will need metadata. Imagine how cumbersome it is to have thousands of digital images when some of them lose their annotations: searching for them and adding descriptions manually is very time-consuming. Perhaps surprisingly, even a book can be a source of metadata. The content of the book can be understood as data, whereas the book summary, title and table of contents are examples of metadata. Or suppose you own an online e-book catalogue and need to manage the metadata that describes its articles and book chapters. All these examples show that it makes sense to care about metadata and to consider how to manage it better.
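The e-book catalogue idea can be illustrated with a small sketch. The `BookMetadata` class, the catalogue entries and the `find_by_keyword` helper are all hypothetical; the point is only that a search can run over titles, summaries and chapter names without opening the books themselves.

```python
from dataclasses import dataclass, field


@dataclass
class BookMetadata:
    """Metadata about a book: everything here describes the content, not the content itself."""
    title: str
    author: str
    summary: str
    chapters: list = field(default_factory=list)


# A tiny, made-up catalogue.
catalogue = [
    BookMetadata(
        title="Hypothetical Data Handbook",
        author="A. Author",
        summary="An introduction to managing data.",
        chapters=["Basics", "Working with Metadata"],
    ),
    BookMetadata(
        title="Imaginary Cooking Guide",
        author="B. Writer",
        summary="Recipes for busy people.",
        chapters=["Soups", "Desserts"],
    ),
]


def find_by_keyword(records, keyword):
    """Return records whose title, summary or chapter names mention the keyword."""
    needle = keyword.lower()
    return [
        record
        for record in records
        if needle in record.title.lower()
        or needle in record.summary.lower()
        or any(needle in chapter.lower() for chapter in record.chapters)
    ]


matches = find_by_keyword(catalogue, "metadata")
```

If chapter names or summaries go missing, the search silently stops finding those books, which is the same problem as the images that lose their annotations.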

Metadata in use

The Digital Public Library of America (DPLA) is devoted to giving users access to materials, and it is metadata that makes those resources available. The materials can be reached by browsing, by searching, and through application programming interfaces (APIs). APIs represent a fascinating use of metadata on the web (popular web services that offer them include YouTube, Twitter, Reddit and Spotify). Metadata allows the web to be composed of small pieces loosely joined. It has also found a significant place in eScience, which is built on data-driven research methods.

Metadata Management in Business

Company managers have important decisions to make about metadata. The first is what kind of metadata to include and what the metadata repository should look like. A company with general needs may combine various types of metadata. There are multiple ways to approach and define an enterprise metadata repository. Development iterations can gather metadata sequentially by country, source of technology, relevance, type or function. A well-designed metadata repository can yield a significant productivity improvement, but it must be built iteratively. It is important to define the desired outcomes and how they can be achieved.

Why Metadata Management Fails

Companies fail at metadata management when the business and technical goals for their metadata environment are not clearly defined. It is crucial to understand why the metadata repository is being built and what its purpose is. The objectives can include reducing costs, increasing revenue, and meeting regulatory needs. Another issue is the choice among the metadata tools available on the market. It is essential to compare the various offerings and choose the one that best matches the company's needs and goals. The metadata repository should also be easy to access: too many problems with accessing and retrieving the data may lead to dissatisfaction and, eventually, to metadata management failure. It may be a good idea to hire a team responsible for metadata control. In big organizations the amounts of data are enormous, and the larger the metadata repository, the more important it is to use automation techniques.
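As a minimal sketch of what such automation could look like, the snippet below walks a folder and collects basic technical metadata for every file using only the Python standard library. The function name and the record fields are assumptions made for illustration; a commercial metadata tool would gather far more than this.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def harvest_file_metadata(root):
    """Collect path, extension, size and modification time for each file under root."""
    records = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        info = path.stat()
        records.append(
            {
                "path": str(path.relative_to(root)),
                "extension": path.suffix.lower(),
                "size_bytes": info.st_size,
                "modified_utc": datetime.fromtimestamp(
                    info.st_mtime, tz=timezone.utc
                ).isoformat(),
            }
        )
    return records


# Demonstration on a throwaway folder with two made-up files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "report.csv").write_text("id,balance")
    Path(tmp, "notes.txt").write_text("hello")
    harvested = harvest_file_metadata(tmp)
```

Run on a schedule, a script like this keeps the repository up to date without anyone typing descriptions by hand, which matters more as the number of files grows.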

Conclusion

Metadata describes data and is present almost everywhere. It helps to improve the use of data and its accuracy. Intentional use of metadata can lead to higher returns and faster business growth. However, as the amount of data increases, managing metadata correctly becomes challenging. It is crucial to understand how to use metadata in order to achieve your company’s objectives and build an easy-to-access repository.
