This episode was delayed due to the ongoing situation in Ukraine. Thank you for understanding.
Hot updates
- Pulsar is updated
- Apache Kylin
“Extreme OLAP Engine for Big Data”
- Brings OLAP back to data
- Been around since 2015, brought to you by eBay
- Not a friend to HBase, but likes Parquet
- Web interface for all data steps
- Official Python client
- with pandas support
- Three main versions: 2.4, 3.0, and the most recent, 4.0.1 (v4 was released in autumn 2021)
- Ambari is killed (moved to the Apache Attic)
- Apache Hop 1.1
- https://www.leanwithdata.com/blog/2022/02/hop-1.1.0/
- Graduated from the Apache Incubator on January 18th
- Apache Hop Sucks!
- Apache DolphinScheduler
Lightning news
- Apache Arrow for Rust
- Apache Iceberg 0.13.0
- Hudi 0.10.1
- Apache HBase 2.4.9
- Apache SeaTunnel
An easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive data
- Apache ORC 1.6.13
- Apache Beam 2.35.0
- Apache Airflow 2.2.3
Discussion: DataSecOps
- OWASP
- Data debiasing
- Data anonymization
Dr. Igor Mosyagin
Data Engineer @ Klarna
Igor identifies himself as a pragmatic engineer with a strong academic background. A theoretical physicist by training, he eventually assumed he had enough PhDs and left academia to work with Data-* related things. As of 2022, Igor works as a Data Platform Engineer at Klarna. On top of that, he’s a huge fan of cephalopods, math rock, and quantum mechanics. He also hates baked carrots so much that he decided to mention it in this bio.
Pasha Finkelshteyn
Developer advocate @ JetBrains
With 14 years of experience in IT, Pasha has gone through fire and water, from technical support to developer, team lead, and data engineer. Now Pasha works as a developer advocate for Data Engineering at JetBrains. He helps develop the Big Data Tools plugin and gives talks on Kotlin, various aspects of data engineering, and working with data. He is also the author and maintainer of the Kotlin API for Apache Spark.