Building Scalable
Data Ecosystems
for 8+ Years.
With more than eight years of professional experience, I've been designing, developing, and delivering Data Engineering solutions, ranging from standalone ETL scripts to complex frameworks on cloud platforms such as AWS and Azure. Writing clean, reusable, and well-documented code one project at a time.
Experience
A timeline of engineering complex data architectures and leading technical transformations.
Managing Consultant
Systems Limited - Lahore, Pakistan
- PartnerLinq - Visibility - Data Engineering Lead
- Creator of the Data Framework (Python, Pandas, PySpark, Delta Lake), capable of acquiring and ingesting data and executing analytical processes.
- Responsible for end-to-end solution design, innovation, and optimization.
- Designed and modeled data for the metastore, data/delta lake, Canonical Models, and SQL Server datastore.
- Designed and developed REST and GraphQL APIs for Semantic Integration, allowing outside applications to interact with internal systems.
- Designed and orchestrated data pipelines using Databricks Workflows, Airflow, and Azure Data Factory.
- Developed a query builder to construct SQL queries for a variety of dialects based on a grid model.
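The last point, a query builder that targets multiple SQL dialects from a grid model, can be sketched roughly as follows. This is an illustrative simplification, not the production implementation; the grid shape, dialect names, and function names are all assumptions.

```python
# Hypothetical sketch: a "grid model" (table, columns, filters, sorts)
# is translated into SQL, with per-dialect identifier quoting.
QUOTES = {"tsql": ("[", "]"), "postgres": ('"', '"'), "spark": ("`", "`")}

def build_query(grid: dict, dialect: str = "tsql") -> str:
    open_q, close_q = QUOTES[dialect]

    def quote(name: str) -> str:
        return f"{open_q}{name}{close_q}"

    cols = ", ".join(quote(c) for c in grid["columns"])
    sql = f"SELECT {cols} FROM {quote(grid['table'])}"
    if grid.get("filters"):
        # Values are bound as parameters, never inlined into the SQL text.
        preds = " AND ".join(f"{quote(f['column'])} {f['op']} %s"
                             for f in grid["filters"])
        sql += f" WHERE {preds}"
    if grid.get("sort"):
        order = ", ".join(f"{quote(s['column'])} {s.get('dir', 'ASC')}"
                          for s in grid["sort"])
        sql += f" ORDER BY {order}"
    return sql

grid = {"table": "orders", "columns": ["id", "total"],
        "filters": [{"column": "total", "op": ">"}],
        "sort": [{"column": "id", "dir": "DESC"}]}
print(build_query(grid, "spark"))
# SELECT `id`, `total` FROM `orders` WHERE `total` > %s ORDER BY `id` DESC
```

Keeping dialect differences in a lookup table means adding a new dialect only requires a new quoting entry, not new generation logic.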
Senior Software Engineer
Northbay Solutions - Lahore, Pakistan
- Developed ETL scripts using PySpark and created a comprehensive Data Lake framework.
- Migrated SAS code to PySpark and developed dependency packages for AWS Glue ETL jobs.
- Developed a service that automatically triggers processes when files are placed on S3, using AWS Lambda and CloudWatch.
- Created dynamically parallel State Machines and frameworks to automate cloud migration and infrastructure deployments (CloudFormation).
- Created an API using API Gateway for Kinesis Data Streams and built tools for DynamoDB access.
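The S3-triggered service above follows a common Lambda pattern: the function receives an event describing the new object and decides what to start downstream. A minimal sketch, with illustrative prefixes and job names (the real service would invoke Glue, Step Functions, or similar):

```python
# Hedged sketch of an S3-triggered Lambda handler. ROUTES maps assumed
# key prefixes to the downstream job each prefix should start.
import posixpath

ROUTES = {"raw/": "ingest_job", "curated/": "publish_job"}

def handler(event, context=None):
    started = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        prefix = posixpath.dirname(key) + "/"
        job = ROUTES.get(prefix)
        if job:
            # In production this would start e.g. a Glue job or a
            # Step Functions execution; here we only record the decision.
            started.append((job, f"s3://{bucket}/{key}"))
    return started

# Shape of the S3 notification event (fields trimmed to what is used).
event = {"Records": [{"s3": {"bucket": {"name": "data"},
                             "object": {"key": "raw/orders.csv"}}}]}
print(handler(event))  # [('ingest_job', 's3://data/raw/orders.csv')]
```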
Intern
Mentor Graphics - Lahore, Pakistan
- Interned as QA for the Nucleus Real-Time Operating System utilizing C/C++, Python, and Bash.
Core Technical Stack
Validated Expertise
Featured Projects
Flagship projects that I've built from scratch.
PartnerLinq - Data Framework
Built a modular data framework using Python, Pandas, PySpark, and Delta Lake. The framework allows data pipelines to be created for both Data Engineering and Machine Learning use cases via simple orchestration; its modules are independent and can be used in any order. A Domain-Specific Language (DSL) drives both the curation of data into the Canonical Models and the construction of analytical queries through the Semantic Model, providing flexibility and scalability regardless of the number of inputs and outputs or the complexity of the transformations. Complex data pipelines can be created without any deployment.
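The "independent modules, any order" idea can be illustrated with a tiny registry-based orchestrator: each step is a named callable, and a pipeline is just an ordered list of step names and parameters. All step and function names here are hypothetical, not the framework's actual API.

```python
# Illustrative sketch of pipeline steps composed by configuration alone:
# changing the plan changes the pipeline, with no code deployment.
from typing import Callable

REGISTRY: dict[str, Callable] = {}

def step(name: str):
    """Register a callable as a named pipeline step."""
    def register(fn: Callable) -> Callable:
        REGISTRY[name] = fn
        return fn
    return register

@step("acquire")
def acquire(data, source):
    return list(source)  # stand-in for reading from a real source

@step("filter")
def keep_positive(data, **_):
    return [x for x in data if x > 0]

@step("aggregate")
def total(data, **_):
    return sum(data)

def run(pipeline, data=None):
    # Each step receives the previous step's output plus its parameters.
    for name, params in pipeline:
        data = REGISTRY[name](data, **params)
    return data

plan = [("acquire", {"source": [3, -1, 4]}), ("filter", {}), ("aggregate", {})]
print(run(plan))  # 7
```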
PartnerLinq - Visibility - Integration API
This API, based on Django Rest Framework, allows other systems to use the Data Framework for their own data engineering and machine learning needs. It tracks job statuses and can intelligently route jobs between multiple clusters depending on their requirements.
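One simple way to picture the routing: send each job to the least-loaded cluster whose capabilities satisfy the job's requirements. The cluster names, capability flags, and load metric below are all assumptions for illustration.

```python
# Hypothetical sketch of requirement-aware routing across clusters.
clusters = {
    "etl-small": {"gpu": False, "queued": 2},
    "etl-large": {"gpu": False, "queued": 0},
    "ml-gpu":    {"gpu": True,  "queued": 1},
}

def route(job: dict) -> str:
    # Keep only clusters that meet the job's requirements.
    eligible = {name: c for name, c in clusters.items()
                if c["gpu"] or not job["needs_gpu"]}
    # Pick the eligible cluster with the fewest queued jobs.
    target = min(eligible, key=lambda n: eligible[n]["queued"])
    clusters[target]["queued"] += 1  # track the dispatched job
    return target

print(route({"needs_gpu": False}))  # etl-large (least queued, no GPU needed)
print(route({"needs_gpu": True}))   # ml-gpu (only GPU-capable cluster)
```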
PartnerLinq - Visibility - Semantic API
This API, based on Django Rest Framework, serves metadata from the metastore. It exposes a GraphQL endpoint that the frontend consumes to build views dynamically from the metadata, as well as a REST endpoint for backward compatibility and some complex scenarios.
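The core idea, returning only the metadata fields a client selects, is what makes GraphQL a good fit here. A minimal pure-Python sketch of that field-selection behavior (the metastore shape and field names are illustrative, and this stands in for what a GraphQL resolver would do):

```python
# Illustrative metastore: per-entity metadata the frontend builds views from.
METASTORE = {
    "orders": {"columns": ["id", "total"], "owner": "sales", "refresh": "hourly"},
}

def resolve(entity: str, selection: list[str]) -> dict:
    """Return only the requested metadata fields, GraphQL-style."""
    record = METASTORE[entity]
    return {field: record[field] for field in selection}

print(resolve("orders", ["columns", "owner"]))
# {'columns': ['id', 'total'], 'owner': 'sales'}
```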
Let's build something scalable.
Currently open to consulting opportunities or senior engineering leadership roles focused on high-scale data infrastructure.