New to Data Engineering? Don't Make These Mistakes

Himanshu Kathuria, 30th Jul 2021 (9 minutes read)

It's best to learn from the mistakes of others. This advice also works for data engineering. In this article, you'll find tips to help you advance your career and avoid common data engineering mistakes.

The data revolution has produced tremendous opportunities and created various high-paying jobs related to the collection, maintenance, and manipulation of data. Data engineering is one of the most lucrative and interesting jobs in this family. Did you know that the average salary for data engineers in the United States is around $116,000 and can be as high as $192,000 for experienced ones? Data engineering gives you an opportunity to make good money and also make a tremendous impact for your organization and clients. Interested? Read how much you can earn in this industry.

This is certainly attractive. But unfortunately, not everyone who aspires to a career in data engineering becomes successful. So, if you want a career as a data engineer and that fancy six-figure salary, you need to avoid some mistakes that can derail you and even stop your professional growth.

In this article, I will cover some of the mistakes you need to avoid and ways to help you add maximum value to your customers, employers, or clients.

In some organizations, the data engineer role is clearly defined. However, you may actually do the work of a data engineer without having a formal “data engineer” job title.
So, before I cover these mistakes, take a look at what data engineering actually means in the article “Who Is a Data Engineer?”.

What Does a Data Engineer Do?

A data engineer is responsible for defining, creating, and maintaining the infrastructure required for the collection, manipulation, and retrieval of data.

To understand this better, take Airbnb as an example. Millions of users around the world are searching for places to stay. On the other hand, thousands of property owners are looking to rent out through Airbnb to these prospective customers.

To meet the needs of these users, search results have to fetch relevant information from the database. This includes the details of the property, pricing information, availability, and more. Airbnb may also want to store small actions, like clicking a particular link or spending a longer time reading a particular page, since these may be valuable user actions. Several other pieces of data need to be collected once the booking is confirmed and at steps like payment and customer support, so that data can flow smoothly without any hiccups.

Since speed is critical, a data engineer needs to ensure that efficiency is maintained and that the systems perform as fast as possible. This requires a robust database structure and clearly defined algorithms for accessing it. This is the data engineer’s responsibility.

Many confuse data analysts and data engineers. While they both deal with data, there are clear differences.

A data analyst focuses on analyzing data and generating insights that are useful for driving business. As a data analyst, you may also build visualizations to aid in understanding the analysis and make sure stakeholders have a good view of key metrics.

In contrast, data engineering involves creating data pipelines and flows. Defining database structures, data relationships, and algorithms for data retrieval and collection are all done by data engineers.
The data stored in such structures is then used by data analysts to help the business make informed decisions.

Now you know what data engineering means. Without further ado, let me take you through some mistakes you should definitely avoid if you want to climb the ladder in the field.

Five Data Engineering Mistakes to Avoid

Building Systems That Are too Complex

As data needs become more complex and delivery timelines shrink, data engineers sometimes tend to build complex systems. Complex systems may have thousands of lines of code, some of it unstructured, making them difficult to maintain. In fact, when there is an issue, it often becomes almost impossible for anyone except the original developer to debug it.

As a data engineer, you need to create systems that simplify the problem and are easy for even a newbie to understand. It is important to maintain a good modular structure for your work, create functions that are easy for everyone to understand, and use proper naming conventions. Someone who neither wrote the code nor designed the system needs to be able to understand it easily.

Not Checking the Accuracy of the Data

The systems you design may use many different kinds of data, collected from various sources. With the number of information sources increasing by the day, accuracy becomes a very important factor for a data engineer to address.

Let’s say you are the data engineer for the sales and marketing systems of your company. You may be building data pipelines and dealing with various kinds of information:

- Social media data.
- Search engine data.
- Order information from ERPs and data warehouses.
- Salesperson or employee information from HRMS systems.
- Financial forecasts and information.
- And so on.

You can intuitively sense that social media and search engine data require a lot of cleaning. But you might assume upstream systems like ERPs and data warehouses would have clean data.
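One way to test that assumption rather than trust it is a small reconciliation check. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database, and the table names (orders, finance), the column names, the expected record count, and the sample rows are all hypothetical. It compares the number of order records against an expected count, and the order totals per source against the revenue recorded on the finance side.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
# The 'store' revenue is deliberately 500 cents short so the check fires.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (order_id INTEGER, source TEXT, amount_cents INTEGER);
    CREATE TABLE finance (source TEXT, revenue_cents INTEGER);

    INSERT INTO orders VALUES
        (1, 'web',   12000),
        (2, 'web',    8000),
        (3, 'store',  5000);

    INSERT INTO finance VALUES
        ('web',   20000),
        ('store',  4500);
""")


def find_total_mismatches(conn):
    """Return (source, order_total, revenue_total) rows where totals differ.

    Sources present in orders but missing from finance are also reported
    (with revenue_total as None). Sources that exist only in finance are
    not covered by this sketch.
    """
    query = """
        SELECT o.source, o.order_total, f.revenue_total
        FROM (SELECT source, SUM(amount_cents) AS order_total
              FROM orders GROUP BY source) AS o
        LEFT JOIN (SELECT source, SUM(revenue_cents) AS revenue_total
                   FROM finance GROUP BY source) AS f
          ON f.source = o.source
        WHERE f.revenue_total IS NULL
           OR f.revenue_total <> o.order_total
        ORDER BY o.source
    """
    return conn.execute(query).fetchall()


# Check 1: record count against the number of rows we expected to receive.
EXPECTED_ORDER_COUNT = 3  # assumed to come from the upstream system
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
count_ok = order_count == EXPECTED_ORDER_COUNT

# Check 2: totals per source on the order side versus the finance side.
mismatches = find_total_mismatches(conn)

print("record count ok:", count_ok)
print("total mismatches:", mismatches)  # [('store', 5000, 4500)]
```

Amounts are stored as integer cents here so that totals can be compared exactly. In a real pipeline, similar queries would run against the actual source tables, and a non-empty mismatch list would typically stop the load or raise an alert.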
While you might think the data you get would be accurate, it is entirely possible for errors or issues to creep in during the intermediate transformation steps before you receive the data. This may have a huge impact on the systems you build.

As a data engineer, remember the golden rule: never assume data is accurate unless you have done your checks. In fact, in my opinion, you should have standard checks built into your development process to ensure accuracy.

If you use SQL, you can build some queries at your end to highlight any discrepancies. It can mean the difference between success and failure of the project.

Basic checks include counts of the records and the totals you are expecting. For instance, if you are building a system that accepts the order data and the finance data, you can make sure the total amount of sales through orders equals the total revenue for that source on the finance end.

This is just a basic example. You may need to do record-wise checks as well. Fortunately, with SQL and Python, you can easily automate a lot of this.

Working Without Thinking and Performing Actions Mechanically Without Asking Questions

Depending on where you are in the data engineering hierarchy, your responsibilities may differ. For example, if you are just beginning your data engineering career, you may be given a small module in a big project to develop. With more experience, you may be designing an entire system architecture to serve some specific purpose.

No matter where you are on that ladder, always remember to have answers to some important questions:

- Why am I doing this project for my organization?
- What value does it add?
- Who are my customers (both immediate and ultimate)?
- How are they going to use the system I build?

This is important so that you don’t lose sight of the direction of the project. In fact, this clarity will also help you prioritize the tasks and functionalities better. You may be building a fancy system which can process terabytes of information every hour.
But if it does not meet the purpose of the project or the needs of your customers, it will be of no use at all.

Not Considering the Needs of the End Users

As a data engineer, you develop systems that are typically used by analysts, data scientists, programmers, business users, and of course, end customers. Some of them may be your direct customers, while others are indirect.

For instance, an analyst may access the database you design directly to create a visualization dashboard. The business users consume that dashboard to make decisions, which will ultimately benefit the customers.

It is important to know:

- Who are your end users and what kind of data do they need?
- How will your end users access information? Do they understand data models? SQL?
- Are your data structures good enough for all their needs?
- What tools are your end users skilled at?

Do involve them in your development process to make sure you are always aligned. This is important, as there could be a gap between what you are building and what your end users are expecting. Do not make the mistake of leaving them out of the picture.

Not Communicating Enough With the Business

Regular communication with the business is an absolute necessity for a data engineer. Think of the business both as a customer and a supporter.

Say you need additional resources, maybe a cloud subscription, a more powerful machine, or an additional engineer on your team. You will have to make your case to the business. Regular conversations with the business make getting that buy-in easier, since they will already be aware of the needs.

At the end of the day, your project has to add value to the business. Regular communication allows you to not only understand the requirements properly but also make sure that your project adds value.

Trust me. I have seen projects fail, not because they weren’t built properly but because they did not get the buy-in from the business, and that came from a lack of proper communication.
How to Become a Great Data Engineer

Avoiding these mistakes will take you a long way in your career. However, if you are just starting out and want to know what you need to do to become a good data engineer, here are the four basic things I think will help you the most.

First things first. Make sure you master the skills needed. This includes:

- SQL (with a focus on database structures).
- Some concepts of NoSQL.
- Python.
- Kafka.
- AWS.

This list is not exhaustive by any means, but these skills are very important. For SQL and Python, I recommend LearnSQL.com and LearnPython.com. They not only offer a comprehensive set of learning resources, but the way in which the courses are organized makes sure you learn things well. In fact, LearnSQL.com has an awesome data engineering path with a lot of resources.

You could also take up some other helpful online courses. Here is a great article with the top online courses for data engineers.

Second, hone your problem-solving skills. It is a key skill for data engineers.

Third, follow the best development practices:

- Keep your code nice and simple.
- Create well-defined, modular code with structured functions.
- Follow proper naming conventions.

And finally, learn and be curious.

Ready to Start Your Data Engineering Career?

Data engineering is a great field and is bound to grow in the future. With the demand for data ever increasing, and with the development of Big Data technologies to aid in responding to this demand, I think the opportunities will only increase.

However, as the jobs become more and more lucrative, more people become attracted to them, which increases the competition. So, it is important to differentiate yourself.

By now, you should have a good idea of what to do and what not to do to build a fantastic data engineering career. Work hard, learn new skills, stay updated with the latest technologies, and always be better than yesterday!