Worksheet on the ADEME dataset
In this worksheet, you work with the ADEME dataset, a collection of 10k energy performance diagnostics of buildings in France. Each building is labeled with a letter from A to G. Buildings graded A or B show top-level energy efficiency, while buildings labeled F or G are called "passoire énergétique" (energy guzzler).
ADEME is the French government agency for ecological transition, supporting sustainability, energy efficiency, and environmental innovation. The DPE (diagnostic de performance énergétique) dataset is available at https://data.ademe.fr/datasets/dpe-v2-tertiaire-2. This dataset only concerns the tertiary sector: services, administrations, etc.
The whole dataset includes over 600k energy audits, but we will only work on a subset of 10k samples.
This worksheet has multiple parts and covers most of the course content, including:
- normalization: 1NF, 2NF, 3NF
- index creation: B-tree and hash indexes
- optimizing queries with EXPLAIN
- window functions and CTEs
- SQL and PL/pgSQL functions
Note: you do not have to answer all the questions. If you don't know an answer, skip ahead.
As is often the case in real-world situations, the dataset is far from perfect.
Your mission is to understand the data, improve it, and prepare it for production use.