Day 16 apache parquet
Check out our live web application for this program - https://newdaynewlearning.netlify.app/
[!NOTE] There is a game waiting for you today, the best/first answer can win an exciting giftđ
- Click on the link : https://www.nytimes.com/games/wordle/index.html
- You will get a 6 chance to guess the 5-letter word correctly
More about me:
I am just a Colleague of yourâ s, Learning and exploring how Math, Business, and Technology can help us to make better decisions in the field of data science.
- Check out my Second brain:Â https://medium.com/@ravikumar10593/
- Check out my Portfolio : Link
Topic : Apache Parquet
Article Source :
- Official Website - https://parquet.apache.org/
- Understanding Apache Parquet: A Detailed Guide
TL;DRÂ :
Apache Parquet is an open source data file format that was designed to improve performance when handling column-oriented data in bulk. Apache Parquet is able to provide efficient compression and encoding schemes with enhanced performance due to its design. This makes it a common interchange format for both batch and interactive workloads, similar to other available columnar-storage file formats in Hadoop like RCFile and ORC.
Apache Parquet AdvantagesÂ
- Reduces IO operations.
- Column-based format makes it more efficient in terms of storage space but also speeds up analytics queries.
- Highly efficient data compression and decompression.
- Support type-specific encoding.
- Supports several data types and nested data structures.
Apache Parquet DisadvantagesÂ
- Not human readable (binary).
- More memory required to read data vs row-based format.
- Can be slower to write than row-based file formats because of the metadata overhead.