Passed DP-100 and the Azure Data Scientist Associate certification

I guess I'm a data scientist meow.

My first new (rather than renewal) certification of the year is the Azure Data Scientist Associate!

I found DP-100 to be one of the more straightforward exams I've taken in this series, although not necessarily 'easier'.

Because of the NDA, and of course as general good practice, I can't go into any detail on what the exam specifically entailed. Instead, here are some general pointers on how I approached my preparation for this exam specifically, and for certification exams in general.

  • Get familiar with the Azure Machine Learning SDK and the Machine Learning Studio interface. Create a mental 'mapping' from one to the other, such as how to create and configure a pipeline step in code, and how that relates to the options available in the UI.

  • Use the materials provided by Microsoft Learn on the exam page, which in my opinion give a very logical walkthrough of the features of Azure Machine Learning (SDK and Studio interface) and map closely to the stated exam objectives.

  • However, the Microsoft Learn courses only give a good overview of what the components are about. The complete detail lives in the documentation pages for those components, and there are of course more intricacies than the Learn material covers, so treat it as a jumping-off point to go and find out more.

  • Create your own Python script that goes through the "cycle" of authoring, testing and deploying a machine learning pipeline. You don't necessarily need to run it against an actual Machine Learning Workspace, as long as it's clear which steps you are performing and why. Of course, running it yourself against a real workspace is the best way (for example if you have access to Azure credits at work, or a trial subscription).

  • Where the various tools have options and parameters you can choose (in DP-100: compute type for a pipeline step, model to use for an explainer, etc.), go through the documentation for those components and learn what the parameters are and their use cases.

  • While learning about all the components, a key question I always ask myself is "how can I apply this?" Think about how each one can be put to use in an actual business scenario, rather than just memorising abstract details. I find it much easier to learn anything by applying it as I go. I pick a subject I'm interested in and build the use case around that, for example creating a dataset that relates to Super Mario characters.

  • I typically go through the exam in two passes (if there are multiple sections, I go through each section like this). On the first pass there are many questions I know the answer to instantly or can work out confidently; I answer those and move on. If I'm unsure about a question or have no idea (I hate it when I have absolutely no idea, as it makes me wonder what other black holes there are in my preparation!), I think about it for a bit, put down my best guess and flag it for review. At the end I go back through the flagged questions (I try to revisit each one only once).

    • In some cases it may be that my memory has been jogged by some other question and I now find I know the answer.
    • Typically in my case about 75% of the questions are readily answerable and 25% are marked for review. However, they seem to follow something like the Pareto Principle, in that the 25% marked for review end up taking about 75% of the overall time. I suppose that is logical, since studies have repeatedly shown people are generally good at knowing how 'certain' they are of their knowledge, so the easy-to-answer ones are answered quickly by definition.
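
To make the "write your own script" pointer from the list above a bit more concrete, here is a minimal sketch of what a dry-run version could look like in plain Python. It makes no Azure calls at all. The class names, step names and compute labels are all made up for illustration and are not part of the Azure Machine Learning SDK. The point is only to write down each step of the author, test and deploy cycle together with the reason for it.

```python
"""Dry-run outline of an ML pipeline life cycle. Nothing here talks to Azure."""
from dataclasses import dataclass, field


@dataclass
class Step:
    phase: str    # "author", "test" or "deploy"
    name: str     # hypothetical step name
    compute: str  # hypothetical compute label, not a real Azure target
    why: str      # the reason the step exists, the part worth writing down


@dataclass
class DryRunPipeline:
    steps: list = field(default_factory=list)

    def add(self, phase, name, compute, why):
        """Record a step and return self so calls can be chained."""
        self.steps.append(Step(phase, name, compute, why))
        return self

    def run(self):
        """Return one description line per step, in order. Nothing is executed."""
        return [
            f"{i}. [{s.phase}] {s.name} on {s.compute}: {s.why}"
            for i, s in enumerate(self.steps, start=1)
        ]


def build_pipeline():
    # Step names and compute labels below are made up for illustration.
    return (
        DryRunPipeline()
        .add("author", "prepare_data", "cpu-cluster", "clean and split the dataset")
        .add("author", "train_model", "cpu-cluster", "fit a model on the training split")
        .add("test", "evaluate_model", "cpu-cluster", "check metrics on the held-out split")
        .add("deploy", "register_and_deploy", "inference-endpoint", "expose the model for scoring")
    )


if __name__ == "__main__":
    for line in build_pipeline().run():
        print(line)
```

Running the script just prints the numbered steps in order. A natural next exercise is to replace each line with the real SDK call you would make at that point, keeping the `why` text alongside as revision notes.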

That's about all I can say on this subject, so onwards and upwards to DP-500: Designing and Implementing Enterprise-Scale Analytics Solutions using Azure and Power BI - for the Azure Enterprise Data Analyst Associate certification...