Introduction Pandas, whose name stands for “panel data”, is built around a remarkable object called the DataFrame. The first time I encountered the famous “DataFrame object” concept was when I was asked to set up an Apache Spark SQL environment for my current customer. As a SQL and Java developer, I…
Scheduling with Apache Airflow
Airflow at a glance How does it work? Quick demo Launching Airflow Although the official repository provides a Docker Compose file, I fine-tuned it to simplify getting started: Once ready, launch these commands: Open your favorite browser and go to the Airflow login page: http://airflow.localtest.me…
Let’s play with Dataiku DSS
Introduction I recently came across Dataiku Data Science Studio during my last assignment but unfortunately didn’t have the opportunity to get my hands on the tool. After a quick search, I found that Dataiku can be installed with a free license for demo purposes. The official GitHub repository hosts a Dockerfile we can…
BI tips: The sparse data trap
Introduction Have you ever asked yourself whether aggregate functions always behave the same way, no matter how much data you have for a given period? I would say “it depends on the need”, but in a perfect world it is always simplest to fill the gaps for missing…
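As a rough illustration of the question raised in this teaser, here is a minimal sketch of how a sum and an average can react differently to missing rows. The dates, the sales figures, and the choice of zero as the fill value are all assumptions made up for this example; they are not taken from the post itself, and whether zero is the right fill value depends on what a missing row means in your data.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical daily sales for one week: three days have no row at all.
sales = {
    date(2024, 1, 1): 100,
    date(2024, 1, 2): 80,
    date(2024, 1, 4): 120,
    date(2024, 1, 7): 100,
}

# The full reporting period, one entry per calendar day.
period = [date(2024, 1, 1) + timedelta(days=i) for i in range(7)]

# Assumption: a missing day means "no sales", so the gap is filled with 0.
filled = [sales.get(day, 0) for day in period]

# The sum is the same with or without the filled gaps.
total_sparse = sum(sales.values())
total_filled = sum(filled)

# The average is not: it divides by 4 present rows versus 7 calendar days.
avg_sparse = mean(sales.values())
avg_filled = mean(filled)

print(f"sum: sparse={total_sparse}, filled={total_filled}")
print(f"avg: sparse={avg_sparse:.2f}, filled={avg_filled:.2f}")
```

In this sketch the total is identical in both cases, while the average over the sparse rows is noticeably higher than the average over the whole period, which is one way such a trap can show up in a report.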