Want to get this certification? It is not an easy one, and you’ll need to do the homework. From what I have read online, people usually spend 2–3 months on preparation. It’s no secret that many of us won’t use every Google product every day, but we still need to know them. This article is for those who don’t have time to read all the manuals: I will describe what I did to get ready for this exam in 8 days. First of all, I have to say that I had no idea how serious the exam really is. The questions were far more complex than, and different from, those in any online course I know. So if you don’t have a developer background, please take your time, read the books and do the tutorials. It took me 1 hour and 35 minutes to pass the test. Every exam question was exactly the same, though. Thank you to the awslagi team for a great exam prep resource.
AWS: awslagi.com
GCP: gcp-examquestions.com
Recommended Reading
Official Google Cloud Certified Professional Data Engineer white papers and documents.
Preparation
Day 1
I started with practice tests. There are plenty of online courses for this, and below I give an overview of some that helped me get ready. Long story short, if you don’t want to waste your time, start with the tests. There is a practice exam from Google, which I took and failed, but afterwards I knew exactly what the questions looked like. It gives you the format, level, and scope of questions you may encounter on the certification exam.
Day 2
On day 2 I started to form an idea of how to deal with the case studies and of what the exam structure and questions look like. I started to pay attention to words like economically, cost-effective, as soon as possible, etc. Keywords like these very often determine the right answer, because on the exam you can find multiple answers that technically satisfy the requirements.
Day 3–5
I did 2 practice exams a day, occasionally reading the Google docs on topics I didn’t know. I did that during my morning cardio in the gym, while cycling. 30–40 minutes is more than enough for one practice exam.
Day 6–8
I did two practice exams a day, but now I had two browser tabs open with practice exams I had already passed. Every question I was uncertain about I checked straight away, and then I read the docs. I think this tactic helped to polish my knowledge. I also started to take some product-specific notes and tie them to the keywords I mentioned earlier.
How to pass the exam?
There is no generic answer to that question. During the real exam I felt that I knew nothing, and the questions seemed very difficult. However, the following strategy worked for me:
1. Do the practice tests to understand the type of questions and structure.
2. Learn product features.
3. Pay attention to question keywords, as very often they define the correct answer.
4. Read the manual. It’s optional but very useful.
Read official Google docs. At least an overview and case studies. These guides are great and have all the information you need to pass the exam.
After all, there are a lot of Bigtable questions.
Pay attention to: Development and Production instances, Disk Types (HDD vs. SSD).
Bigtable Performance Example: Your organization will be deploying a new fleet of IoT devices, and writes to your Bigtable instance are expected to peak at 50,000 queries per second. You have optimized your row key design and need to design a cluster that can meet this demand. What do you do?
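One way to reason about a sizing question like this is a back-of-the-envelope node count. The sketch below assumes roughly 10,000 writes per second per SSD node; that figure is my assumption for illustration, so check the current Bigtable performance documentation for your storage type. The function name `nodes_needed` and the `headroom_pct` parameter are hypothetical.

```python
# Assumed planning figure for one SSD node; not an official guarantee.
ASSUMED_WRITES_PER_SSD_NODE = 10_000


def nodes_needed(peak_qps, per_node_qps=ASSUMED_WRITES_PER_SSD_NODE, headroom_pct=0):
    """Return the smallest node count whose combined throughput covers
    peak_qps plus an optional integer percentage of headroom."""
    if peak_qps <= 0 or per_node_qps <= 0 or headroom_pct < 0:
        raise ValueError("peak_qps and per_node_qps must be positive, headroom_pct non-negative")
    # Integer arithmetic avoids floating-point rounding surprises.
    target = peak_qps * (100 + headroom_pct)
    capacity = per_node_qps * 100
    return -(-target // capacity)  # ceiling division


if __name__ == "__main__":
    print(nodes_needed(50_000))                   # 5 nodes under the assumed per-node rate
    print(nodes_needed(50_000, headroom_pct=20))  # 6 nodes with 20% headroom
```

Under that assumption, the 50,000 QPS peak from the question needs at least five SSD nodes, which is the kind of arithmetic the exam expects you to do in your head.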
Bigtable Performance Example: You are asked to investigate a Bigtable instance that is performing poorly. Each row in the table represents a record from an IoT device and contains 128 different metrics, each in its own column and each holding a 32-bit integer. How could you modify the design to improve performance?
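One direction often suggested for a question like this is to store fewer, larger cells instead of 128 tiny ones, since every cell carries its own overhead. The sketch below is plain Python with no Bigtable client involved; `pack_metrics` and `unpack_metrics` are hypothetical helpers that show how the 128 32-bit integers could be serialized into a single value that would then be written to one column.

```python
import struct

METRIC_COUNT = 128
# ">" = big-endian with no padding, "i" = signed 32-bit integer.
_FORMAT = f">{METRIC_COUNT}i"


def pack_metrics(values):
    """Serialize exactly 128 signed 32-bit integers into one bytes value."""
    if len(values) != METRIC_COUNT:
        raise ValueError(f"expected {METRIC_COUNT} metrics, got {len(values)}")
    return struct.pack(_FORMAT, *values)


def unpack_metrics(blob):
    """Inverse of pack_metrics: recover the list of 128 integers."""
    return list(struct.unpack(_FORMAT, blob))


if __name__ == "__main__":
    metrics = list(range(METRIC_COUNT))
    blob = pack_metrics(metrics)
    print(len(blob))                        # 512 bytes: 128 values * 4 bytes each
    print(unpack_metrics(blob) == metrics)  # True
```

The trade-off is that you can no longer read a single metric without fetching and decoding the whole value, so this only makes sense when the metrics are usually read together.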
Relational Database questions
Pay attention to: Replicas, availability and migration guides.
A lot of questions about Pub/Sub, Kafka and windowing.
Pay attention to: Kafka mirroring and the differences between Kafka and Pub/Sub. Pub/Sub handles the need to scale exponentially with traffic coming from around the globe; Apache Kafka will not handle exponential growth in users globally as well as Pub/Sub does. Cloud Pub/Sub guarantees to deliver messages at least once to every subscription. If multiple systems need to be notified of every event (for example, every order), you should create one topic with a separate subscription for each system. Order of delivery is not guaranteed by Pub/Sub by default, so attach a timestamp in the publishing system if possible.
Building and operationalizing data processing systems
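The Pub/Sub delivery semantics noted above (at-least-once delivery, no ordering guarantee) have a practical consequence on the subscriber side: duplicates must be dropped and events re-ordered by the timestamp the publisher attached. The following is a minimal sketch in plain Python that does not use the Pub/Sub client library; the `Message` class, its fields, and `dedupe_and_order` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    message_id: str    # unique ID, used to detect redelivery
    event_time: float  # timestamp attached by the publishing system
    payload: str


def dedupe_and_order(deliveries):
    """Drop redelivered messages, then sort the rest by publisher timestamp."""
    seen = set()
    unique = []
    for msg in deliveries:
        if msg.message_id in seen:
            continue  # at-least-once delivery: this one is a duplicate
        seen.add(msg.message_id)
        unique.append(msg)
    return sorted(unique, key=lambda m: m.event_time)


if __name__ == "__main__":
    received = [
        Message("b", 2.0, "order shipped"),
        Message("a", 1.0, "order created"),
        Message("b", 2.0, "order shipped"),  # redelivered duplicate
    ]
    for msg in dedupe_and_order(received):
        print(msg.event_time, msg.payload)
```

In a real streaming pipeline you would not hold all messages in memory like this; windowing in Dataflow plays the role of the final sort, which is why these topics tend to appear together in exam questions.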
Dataproc
Pay attention to:
HDFS vs. Google Cloud Storage for Dataproc workloads.
Best practice: Dataproc clusters should be job-specific. Use Cloud Storage if you need to scale, because HDFS won’t scale well and needs custom settings. Google also recommends using Cloud Storage instead of HDFS because it is much more cost-effective, especially when jobs aren’t running.
Dataflow
Pay attention to:
PCollection branching, Flatten and Joins, transformations and sliding windows.
Flatten: You can use the Flatten transform in the Beam SDKs to merge multiple PCollections of the same type.
Join: You can use the CoGroupByKey transform in the Beam SDK to perform a relational join between two PCollections. The PCollections must be keyed (i.e. they must be collections of key/value pairs) and they must use the same key type.
Operationalizing machine learning models
I started with the Google docs straight away, as I was already familiar with basic ML concepts, but I think the official Google ML crash course is really useful for exam purposes. I definitely should have started with it. It has a lot of reading content as well as videos.
ML products
This is the full list, and all the docs are very easy to find. Just skim through the Overview and use cases sections.
Natural Language API
Example question: You wish to build an AutoML Natural Language model for classifying some documents with user-defined labels. How can you ensure you are providing quality training data for the model?
Read: Preparing your training data | AutoML Natural Language | Google Cloud
Google Cloud AI Platform
Google Cloud TPUs
Google Glossary of ML terms
Online practice exams
Google Certified Professional Data Engineer from gcp-examquestions.com
The Guarantee Part, with actual exam questions, ensures you can pass in the fastest and easiest way. It covered my whole exam. Highly recommended. The site also offers free practice.
Source: Medium