My Portfolio

Kaushal Grover, Age 23

Data Engineer

Available for work

+91 9318335768

New Delhi, India

grover.kaushal27@gmail.com

Connect on LinkedIn

GitHub Projects

About

I am a Data Engineer with a strong foundation in large-scale data infrastructure, ETL pipelines, backend systems, and web scraping.
In my current role at Remitz, I have designed and implemented scalable data pipelines that process over 70 million medical claims daily. I have replaced legacy systems, migrated the codebase to GitHub, and developed multiple Tableau dashboards with automated data source refreshes using Docker. I follow Agile workflows and collaborate across teams to deliver reliable, high-impact solutions.

With a Master's in Computational Biology, I have previously worked on survival analysis and machine learning applications in genomics, with publications highlighting my ability to conduct in-depth research, analyze complex datasets, and effectively document technical workflows.

Skills

Python (PySpark, pytest, Selenium, Scrapy, Multiprocessing, scikit-learn, Matplotlib, Flask), PostgreSQL, Tableau, Apache Superset, Docker, Linux, Bash Scripting, GitHub, R, Survival Analysis, Machine Learning, NGS Tools, NGS Pipelines, Genomics, Statistics, Jira, Confluence.

Experience

Data Engineer
Sep 2024 - Current · 7 months

Remitz | Puerto Rico, USA


A leading provider of automated revenue recovery services, offering smarter medical billing and claims collection.

Designed and implemented scalable pipelines using PySpark, PostgreSQL, AWS, and Python, processing 50+ million claims and remits daily. Applied object-oriented programming (OOP) principles to keep the code modular and reusable.

Engineered robust web scraping modules using Python, Selenium, and asynchronous requests to extract remits and billing data from third-party healthcare portals.

Helped migrate the company’s codebase from an on-premise server to GitHub, handling symlinks, test datasets, and environment variables, and establishing pull request reviews and regular version releases.

Developed 5+ Tableau Cloud dashboards for stakeholders to support business decision-making. Automated daily refreshes of their data sources from on-premise Linux PostgreSQL servers using Docker containers and Bash scripting.

Deployed Apache Superset in a production-like environment using Minikube (Kubernetes), configuring embedded dashboards and role-based access control (RBAC) to support the organization’s transition from Tableau.

Revamped legacy PySpark transformation logic with class-based designs to improve modularity, scalability, and performance, achieving 70% faster pipeline run time on Linux servers.

Developed comprehensive test datasets and integrated automated pipeline testing, cutting debugging time by 25%.

Authored detailed documentation in Confluence, covering workflow logic and environment configurations. Worked closely with cross-functional teams, ensuring Agile sprint deliverables were met consistently.


Master’s Dissertation
Jan 2024 - Jun 2024 · 6 months

JNU | Delhi, India


Dr. Arnab Bhattacharjee’s Lab, School of Computational and Integrative Sciences (SCIS)

Dissertation Title: Identification of Novel Biomarkers for Head and Neck Squamous Cell Carcinoma (HNSC) Using AMPK Pathway Gene Expression Data

Analyzed over 830 samples from the TCGA and GEO datasets, applying data normalization and differential expression techniques to shortlist potential biomarkers, focusing on genes with at least a 2-fold increase in expression.

Developed a 10-fold cross-validated Cox proportional hazards model using Python and scikit-learn with 3 regularization techniques, identifying 6 key genes as independent prognostic markers.

Validated results through statistical analysis (ROC curves, PCA, correlation heatmaps, network graphs, immune response analysis), improving the model’s accuracy to 70% and providing insights for cancer prognosis.


Research Intern
Jun 2023 - Jul 2023 · 2 months

IISc | Bangalore, India


Dr. Mohit Kumar Jolly’s Lab, Centre for BioSystems Science and Engineering (BSSE)

Research Focus: Investigated the stochastic dynamics of multipotent naïve CD4+ T cell differentiation, proposing a novel two-step differentiation process with implications for cancer biology.

Analyzed and compared 10 bulk RNA-seq expression datasets against 4 curated signature gene sets using statistical techniques such as Gene Set Enrichment Analysis (GSEA) and Principal Component Analysis (PCA), yielding results significant at the 95% confidence level.

Contributed to the analysis of the Pan-Cancer TCGA dataset, applying Cox survival models, Kaplan-Meier analysis, NetworkX, and cancer gene sets to uncover insights into patient survival and gene expression patterns.

Developed and optimized Python scripts using Matplotlib, Pandas, and other libraries to streamline the analysis of large-scale gene expression datasets, resulting in more efficient data processing and visualization.

Collaborated closely with 2 PhD candidates and 1 fellow research intern, leading weekly project updates and discussions. This collaboration led to the completion of 2 key projects during the internship, increasing project efficiency by 50%.

Research Project
Sep 2021 - Feb 2022 · 6 months

Shivaji College, University of Delhi | Delhi, India


Dr. Renu Baweja’s Lab, Department of Biochemistry

Title: Effect of milk and mustard oil consumption: A case study on youth in Delhi-NCR. Published in Nutrition and Health (SAGE Journals).

Collected and compiled survey data to analyze trends in lifestyle factors and correlated them with diseases and hematological parameters.

Utilized Microsoft Excel to identify key patterns and significant variables contributing to health deterioration.



Projects

Education

M.Sc. Computational Biology
Sep 2022 - Aug 2024 · 2 years

JNU, New Delhi


Achieved the highest overall marks in my batch, with top scores in dissertation evaluation.

Collaborated on research with two PhD students from different departments, resulting in a paper currently under review.

CGPA: 7.54/9

Concentrations: Bioinformatics, Data Structures, Python, R, Statistics I & II, Calculus I & II, Machine Learning, Omics Sciences, Molecular Simulation, Linux.

B.Sc. (Hons) Biochemistry
Jul 2019 - Jul 2022 · 3 years

Shivaji College, University of Delhi, New Delhi


Editorial board member for the 4th edition of the departmental magazine Biokemi. Edited research articles and the "Fun with Science" columns to ensure proper structure for publication.

Published an original article in Biokemi departmental magazine.

Concentrations: Genetics, Recombinant DNA, Immunology, Cell Culture, UV/Vis Spectroscopy, PCR, Blotting.

Publications

Duddu, A. S., Andreas, E., Harshavardhan, B., Grover, K., Singh, V. R., Hari, K., Jhunjhunwala, S., Cummins, B., Gedeon, T., & Jolly, M. K. (2024). Multistability and predominant double-positive states in a four node mutually repressive network: a case study of Th1/Th2/Th17/T-reg differentiation. bioRxiv (Cold Spring Harbor Laboratory). https://doi.org/10.1101/2024.01.30.575880

Acknowledged in:

Sharma, M., Vavilala, P., Singh, A., & Baweja, R. (2022). Effect of milk and mustard oil consumption: A case study on youth in Delhi-NCR. Nutrition and Health, 29(1), 25–29. https://doi.org/10.1177/02601060221116198

Honors and Awards

3rd Place | "Skills to Start-Up"

IISER Bhopal

Issued by the Innovation and Incubation Centre for Entrepreneurship (IICE) at the national-level science entrepreneurship competition (Scipreneur).

Developed and presented an innovative business model addressing a scientific problem.

Collaborated with a team to compete against participants including PhD candidates.

Received prize money and a certificate for achieving 3rd place.

Hackathon | “Diabetes Solution”

ISCB RSG India


Participated in a hackathon focused on identifying diabetes-causing genes and received prize money.

Collaborated with a team to analyze differentially expressed genes, construct a protein-protein interaction (PPI) network, and perform Gene Ontology (GO) enrichment and annotation for diabetes-related genes.

Volunteer Experience

Member
Apr 2023 - Apr 2024 · 1 yr 1 mo

ISCB RSG India

RSG India is a branch of the International Society for Computational Biology Student Council (ISCB-SC), comprising over 2,000 students, faculty, industry professionals, and guest speakers across India.

Managed the RSG India YouTube channel, including uploading webinar videos and designing channel banners and thumbnails, demonstrating strong communication and digital literacy skills.

Collaborated with a team to design over 10 promotional posters and flyers for various events, leading to a 30% increase in event attendance and showcasing creativity and teamwork.

Assisted in creating and distributing Google Forms and questionnaires to enhance audience engagement.

Anchor
19 - 20 Oct 2023 · 2 Days

JNU, New Delhi


Anchored "Scientegration 2023", the 4th edition of the department's 2-day symposium featuring PhD presentations and guest speakers.

Introduced speakers, faculty, and guests, and volunteered in prize and certificate distribution.

Anchor
18 - 20 Feb 2021 · 3 Days

Shivaji College, University of Delhi, New Delhi


Hosted interactive questionnaire sessions during a 3-day hands-on workshop on "Green Synthesis of Nanoparticles and their Biomedical Applications," organized by the Department of Biochemistry.

Facilitated discussions among 50+ participants, enhancing engagement and reinforcing key workshop concepts.

Test Scores

IELTS English Score:

GAT-B ’22:

GATE Biotechnology ’22:

IIT JAM ’22:

TIFR JGEEBILS ’22: