Worksheet on Paris Trees data in MongoDB

Trees are a huge asset in the path to resilience with regard to climate change.

For context, instructions, and datasets, please refer to:

https://github.com/SkatAI/epita-mongodb/blob/master/docs/S03.02.mongodb-trees-practice.md

Important: throughout the form below, paste the query, aggregation pipeline, or command, not the output or results.

Student Information


Email:

Name:

Section 1: Load the data

In this section, you import the trees dataset into MongoDB.

Q1.1 Write down your import command

Write down your mongoimport command.

Make sure to use a global (environment) variable to avoid disclosing your access credentials.

Section 2: Data Exploration

In this section, you write aggregation pipelines to explore the trees dataset.

Q2.1 Count trees per domain

Write the aggregation pipeline that counts trees per domain, with the most common domains first.

Write down the MongoDB query or aggregation pipeline, not the output or answer
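
Hint (a sketch, not the expected answer): most counting questions in this section follow the same "group, count, sort" pattern. The example below assumes the field is named domain, which is taken from the wording of the question; check the actual field name in your collection.

```javascript
// Sketch of the "group, count, sort" pattern.
// Assumption: the field is called "domain"; adjust to the real field name.
const countPerDomain = [
  // One bucket per distinct value; documents missing the field
  // end up in the bucket with _id: null.
  { $group: { _id: "$domain", count: { $sum: 1 } } },
  // Largest buckets first.
  { $sort: { count: -1 } },
];

// In mongosh, the pipeline would be run with:
//   db.trees.aggregate(countPerDomain)
```

The same shape can be reused for other "count per field" questions by changing the field referenced in _id.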

Q2.2 Count the number of trees per stage

Write the aggregation pipeline that counts the number of trees per stage for every stage value, including trees with a missing stage (null values), in descending order of count.

Q2.3 Dimensions of trees per stage

Calculate the count, min, max and average height and circumference of trees per stage.

Q2.4 Count `remarkable` trees

Write the query that counts how many trees are remarkable

Q2.5 What makes a tree remarkable?

Write an aggregation pipeline that compares the average height and circumference of Platane trees that are remarkable with ones that are not.

Q2.6 Top 5 names of trees

Write the aggregation pipeline that returns the 5 most common tree names (taxonomy.name) in Paris.

Q2.7 Calculate the age of the trees

Write the updateMany() statement that calculates the age of each tree and stores it in a new age field in the trees collection.

Q2.8 How many greenSpaces are gardens?

Write the aggregation pipeline that counts the number of green spaces per categorie in the greenSpaces collection.

Q2.9 Is _perimetre_ available for all categories and typologies of greenSpaces?

Write the aggregation pipeline that counts the number of greenSpaces that have a perimeter, per category and typology.

Q2.10 Sample 5 greenSpaces at random

Write the aggregation pipeline that returns 5 randomly selected green spaces.

Section 3: Schema Validation

At this point you have a better understanding of the datasets and you are comfortable writing aggregation pipelines.

The next step is to make sure the data in the database is clean.

For that we will create the collection before importing the data and specify what sort of data can be added to the collection by writing a schema validator.

Please familiarize yourself with schema validation in MongoDB with these documents:

Q3.1 Write a trees schema validator

Your task is to write a validator that requires

  • height < 50
  • circumference < 500
  • required taxonomy.name and geo.geo_point_2d
  • validationAction: "error"

and use it in the db.createCollection("trees", {<validator>}) function

Write down the validator in JSON format.
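
Hint (a sketch under assumptions, not the reference answer): the requirements listed above can be expressed with $jsonSchema. The accepted numeric types and the variable name treesCollectionOptions are choices made for this illustration; adapt them to how the values are actually stored in your import.

```javascript
// Options object for db.createCollection("trees", ...).
// Assumption: height and circumference are stored as numeric BSON types.
const numeric = ["int", "long", "double", "decimal"];

const treesCollectionOptions = {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      // Top-level objects that must exist so nested fields can be required.
      required: ["taxonomy", "geo"],
      properties: {
        // exclusiveMaximum: true turns "maximum" into a strict "<".
        height: { bsonType: numeric, maximum: 50, exclusiveMaximum: true },
        circumference: { bsonType: numeric, maximum: 500, exclusiveMaximum: true },
        taxonomy: { bsonType: "object", required: ["name"] },
        geo: { bsonType: "object", required: ["geo_point_2d"] },
      },
    },
  },
  // Reject documents that fail validation ("warn" only logs them).
  validationAction: "error",
};

// In mongosh:
//   db.createCollection("trees", treesCollectionOptions)
```

Note that in this sketch height and circumference are constrained but not required, so documents without them still pass validation.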

Q3.2 Compare validationAction: error and warn

  • Drop the trees collection.
  • Import the dataset once with validationAction: "error" and once with validationAction: "warn".
  • What do you observe?

Write down what you observe and your conclusion

Q3.3 Remove geolocation duplicates

Write the aggregation pipeline that finds all the trees that share the same geolocation.

Q3.4 Create a unique index

Write down the output of db.trees.getIndexes()

Section 4: GeoJSON

In this section, you will:

  • convert a string to GeoJSON
  • find the latitude and longitude of your address
  • count the trees nearest to a given location
  • list the trees in a garden

Q4.1 Convert the `geo_point_2d` string into an array of numbers

Write an aggregation pipeline that splits the geo_point_2d field into an array of 2 numbers while swapping the coordinates.

Q4.2 Convert the array of floats into a Point

Write the aggregation pipeline that will include the new field in the geo field as such:

{
  geo: {
    geo_point_2d: '48.837794432065046, 2.3777788022073083',
    geo_point: {
      type: 'Point',
      coordinates: [ 2.3777788022073083, 48.837794432065046 ]
    }
  }
}
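
Hint (a sketch, not the reference answer): GeoJSON expects longitude first, then latitude, which is why the two values are swapped. The helper toGeoPoint below is a hypothetical plain-JavaScript function for checking the expected result by hand; addGeoPoint is one possible pipeline stage that performs the same conversion on the server.

```javascript
// Plain JavaScript version of the conversion, useful for checking
// what the pipeline should produce for a single value.
function toGeoPoint(latLng) {
  // "lat, lng" string -> two numbers
  const [lat, lng] = latLng.split(",").map((part) => parseFloat(part.trim()));
  // GeoJSON order is [longitude, latitude]
  return { type: "Point", coordinates: [lng, lat] };
}

// One possible aggregation stage doing the same thing.
const addGeoPoint = [
  {
    $addFields: {
      "geo.geo_point": {
        type: "Point",
        coordinates: {
          // Reverse [lat, lng] into [lng, lat].
          $reverseArray: {
            $map: {
              input: { $split: ["$geo.geo_point_2d", ","] },
              as: "part",
              // Trim the space after the comma before converting.
              in: { $toDouble: { $trim: { input: "$$part" } } },
            },
          },
        },
      },
    },
  },
];

// In mongosh:
//   db.trees.aggregate(addGeoPoint)
```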

Q4.3 Make the new field permanent

Write the statement using $set and updateMany() that adds the new field to the collection.

Q4.4 Convert gardens geolocation

Write the updateMany() statement that transforms greenSpaces.geo.geo_point into a GeoJSON type.

Q4.5 Most common trees in an area

Write the aggregation pipeline that returns the 5 most common tree names in the vicinity of a particular garden. You need to write one query and one pipeline:

  • the query finds the greenSpace by its name
  • the pipeline uses $geoNear on the trees collection with the retrieved point
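
Hint (a sketch under assumptions, not the reference answer): $geoNear must be the first stage of the pipeline and needs a geospatial index on the queried field. The function name nearbyTopTreeNames, the search radius, and the field paths geo.geo_point and taxonomy.name are assumptions based on the earlier questions.

```javascript
// Builds a pipeline that ranks tree names around a GeoJSON point.
// Assumptions: trees store a GeoJSON Point in geo.geo_point, with a
// 2dsphere index, e.g. db.trees.createIndex({ "geo.geo_point": "2dsphere" })
function nearbyTopTreeNames(point, maxMeters) {
  return [
    {
      // Must be the first stage; distances are in meters for GeoJSON points.
      $geoNear: {
        near: point,
        key: "geo.geo_point",
        distanceField: "distance",
        maxDistance: maxMeters,
        spherical: true,
      },
    },
    // Count the trees found per name, most common first.
    { $group: { _id: "$taxonomy.name", count: { $sum: 1 } } },
    { $sort: { count: -1 } },
    { $limit: 5 },
  ];
}

// In mongosh (the "name" field and the 200 m radius are hypothetical):
//   const garden = db.greenSpaces.findOne({ name: "<garden name>" })
//   db.trees.aggregate(nearbyTopTreeNames(garden.geo.geo_point, 200))
```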

Q4.6 Find out if a tree belongs to a garden

Write the pipeline that finds how many trees are in the green space with the largest surface among those with:

  • categorie: Jardin
  • topologie: Jardins privatifs

Q4.7 Add a `tree_count` field for all green spaces

Now write the updateMany() statement that adds a new tree_count field to all green space documents, where tree_count is the number of trees from the trees collection that lie within the green space's geo shape.

Q4.8 Create a new collection

In the greenSpaces collection, add the list of trees in the garden as an array. Only list the height, circumference, and taxonomy (name, species, genre, variety) of the tree.