Day 16 apache parquet

Check out our live web application for this program - https://newdaynewlearning.netlify.app/

[!NOTE] There is a game waiting for you today, the best/first answer can win an exciting gift🎁

More about me:

I am just a Colleague of your’ s, Learning and exploring how Math, Business, and Technology can help us to make better decisions in the field of data science.

Topic : Apache Parquet

Article Source :

TL;DR :

Apache Parquet is an open source data file format that was designed to improve performance when handling column-oriented data in bulk. Apache Parquet is able to provide efficient compression and encoding schemes with enhanced performance due to its design. This makes it a common interchange format for both batch and interactive workloads, similar to other available columnar-storage file formats in Hadoop like RCFile and ORC.

Apache Parquet Advantages 

  • Reduces IO operations.
  • Column-based format makes it more efficient in terms of storage space but also speeds up analytics queries.
  • Highly efficient data compression and decompression.
  • Support type-specific encoding.
  • Supports several data types and nested data structures.

Apache Parquet Disadvantages 

  • Not human readable (binary).
  • More memory required to read data vs row-based format.
  • Can be slower to write than row-based file formats because of the metadata overhead.

Notes mentioning this note


Here are all the notes in this garden, along with their links, visualized as a graph.