
Kaushal Grover, Age 23
Data Engineer
Available for work
About
I am a Data Engineer with a strong foundation in large-scale data infrastructure, ETL pipelines, backend systems, and web scraping.
In my current role at Remitz, I have designed and implemented scalable data pipelines, processing over 70 million medical claims daily. I have successfully replaced legacy systems, transitioned the codebase to GitHub, and developed multiple Tableau dashboards with automated data source refresh using Docker. I follow Agile workflows and collaborate across teams to deliver high-impact and reliable solutions.
With a Master's in Computational Biology, I have previously worked on survival analysis and machine learning applications in genomics, with publications highlighting my ability to conduct in-depth research, analyze complex datasets, and effectively document technical workflows.
Skills
Python (PySpark, pytest, Selenium, Scrapy, multiprocessing, scikit-learn, Matplotlib, Flask),
PostgreSQL, Tableau, Apache Superset, Docker, Linux, Bash Scripting, GitHub, R, Survival Analysis, Machine Learning, NGS tools, NGS Pipelines, Genomics, Statistics, Jira, Confluence.
Experience
Data Engineer
Sep 2024 - Current · 7 months
Remitz | Puerto Rico, USA
Leading automated revenue recovery services provider, offering smarter medical billing and claims collection.
Designed and implemented scalable pipelines using PySpark, PostgreSQL, AWS, and Python, processing 50+ million claims and remits daily. Utilized advanced Object-Oriented Programming (OOP) principles for modular and reusable code.
Engineered robust web scraping modules using Python, Selenium and asynchronous requests to extract remits and billing data from third-party healthcare portals.
Helped migrate the company’s codebase from an on-premise server to GitHub, including symlinks, test datasets, environment variables, pull request reviews, and regular version releases.
Developed 5+ Tableau Cloud dashboards for stakeholders and business decision-making. Automated daily data source refreshes with on-premise Linux PostgreSQL servers using Docker containers and Bash scripting.
Deployed Apache Superset in a production-like environment using Minikube (Kubernetes), configuring embedded dashboards and role-based access control (RBAC) to support the organization’s transition from Tableau.
Revamped legacy PySpark transformation logic with class-based designs to improve modularity, scalability, and performance, achieving 70% faster pipeline run time on Linux servers.
Developed comprehensive test datasets and integrated automated pipeline testing, cutting debugging time by 25%.
Authored detailed documentation in Confluence, covering workflow logic and environment configurations. Worked closely with cross-functional teams, ensuring Agile sprint deliverables were met consistently.
Master’s Dissertation
Jan 2024 - Jun 2024 · 6 months
JNU | Delhi, India
Dr. Arnab Bhattacharjee’s Lab, Department of Computational and Integrative Sciences (SCIS)
Dissertation Title: Identification of Novel Biomarkers for Head and Neck Squamous Cell Carcinoma (HNSC) Using AMPK Pathway Gene Expression Data
Analyzed over 830 samples from the TCGA and GEO datasets, applying data normalization and differential expression techniques to filter potential biomarkers, focusing on genes with at least a 2-fold increase.
Developed a 10-fold cross-validated Cox proportional hazards model using Python and scikit-learn, with 3 regularization techniques, leading to the identification of 6 key genes as independent prognostic markers.
Validated results through statistical analysis (ROC curves, PCA, correlation heatmaps, network graphs, immune response analysis), improving the model's accuracy to 70% and providing insights for cancer prognosis.
Research Intern
Jun 2023 - Jul 2023 · 2 months
IISc | Bangalore, India
Dr. Mohit Kumar Jolly’s Lab, Centre for BioSystems Science and Engineering (BSSE)
Research Focus: Investigated the stochastic dynamics of multipotent naïve CD4+ T cell differentiation, proposing a novel two-step differentiation process with insights into cancer biology.
Analyzed and compared 10 bulk RNA-seq expression datasets against 4 curated signature gene sets, using statistical techniques such as Gene Set Enrichment Analysis (GSEA) and Principal Component Analysis (PCA) to generate statistically significant results at a 95% confidence level.
Contributed to the analysis of the Pan-Cancer TCGA dataset, applying Cox survival models, Kaplan-Meier analysis, NetworkX, and cancer gene sets to uncover critical insights into patient survival and gene expression patterns.
Developed and optimized Python scripts using Matplotlib, Pandas, and other libraries to streamline the analysis of large-scale gene expression datasets, resulting in more efficient data processing and visualization.
Collaborated closely with 2 PhD candidates and 1 fellow research intern, leading weekly project updates and discussions. This teamwork approach led to the completion of 2 key projects within the internship, increasing project efficiency by 50%.
Research Project
Sep 2021 - Feb 2022 · 6 months
Shivaji College, University of Delhi | Delhi, India
Dr. Renu Baweja’s Lab, Department of Biochemistry
Title: Effect of milk and mustard oil consumption: A case study on youth in Delhi-NCR. Published in SAGE journals.
Collected and compiled survey data to analyze trends in lifestyle factors and correlated them with diseases and hematological parameters.
Utilized Microsoft Excel to identify key patterns and significant variables contributing to health deterioration.
Education
M.Sc. Computational Biology
Sep 2022 - Aug 2024 · 2 years
JNU, New Delhi
Achieved the highest overall marks in my batch, with top scores in dissertation evaluation.
Collaborated on research with two PhD students from different departments, resulting in a paper currently under review.
CGPA: 7.54/9
Concentrations: Bioinformatics, Data Structures, Python, R, Statistics I & II, Calculus I & II, Machine Learning, Omics Sciences, Molecular Simulation, Linux.
B.Sc. (Hons) Biochemistry
Jul 2019 - Jul 2022 · 3 years
Shivaji College, University of Delhi, New Delhi
Editorial member for the 4th edition of the departmental magazine Biokemi. Edited research articles and the "Fun with Science" columns to ensure proper structure for publication.
Published an original article in the departmental magazine Biokemi.
Concentrations: Genetics, Recombinant DNA, Immunology, Cell Culture, UV/Vis Spectroscopy, PCR, Blotting.
Publications
Duddu, A. S., Andreas, E., Harshavardhan, B., Grover, K., Singh, V. R., Hari, K., Jhunjhunwala, S., Cummins, B., Gedeon, T., & Jolly, M. K. (2024). Multistability and predominant double-positive states in a four node mutually repressive network: a case study of Th1/Th2/Th17/T-reg differentiation. bioRxiv (Cold Spring Harbor Laboratory). https://doi.org/10.1101/2024.01.30.575880
Acknowledged in:
Sharma, M., Vavilala, P., Singh, A., & Baweja, R. (2022). Effect of milk and mustard oil consumption: A case study on youth in Delhi-NCR. Nutrition and Health, 29(1), 25–29. https://doi.org/10.1177/02601060221116198
Honors and Awards
3rd Place | "Skills to Start-Up"
IISER Bhopal
Issued by the national-level science entrepreneurship competition (Scipreneur), Innovation and Incubation Centre for Entrepreneurship (IICE).
Developed and presented an innovative business model addressing a scientific problem.
Collaborated with a team to compete against participants including PhD candidates.
Received prize money and a certificate for achieving 3rd place.
Hackathon | “Diabetes Solution”
Jul 2019 - Jul 2022 · 3 years
ISCB RSG India
Participated in a hackathon focused on identifying diabetes-causing genes and received prize money.
Collaborated with a team to analyze differentially expressed genes, construct a protein-protein interaction (PPI) network, and perform Gene Ontology (GO) enrichment and annotation for diabetes-related genes.
Volunteer Experience
Member
Apr 2023 - Apr 2024 · 1 yr 1 mo
ISCB RSG India
IISER Bhopal
RSG India is a branch of the International Society for Computational Biology Student Council (ISCB-SC), comprising over 2,000 students, faculty, industry professionals, and guest speakers across India.
Managed the RSG India YouTube account, including uploading webinar videos and designing channel banners and thumbnails, demonstrating strong communication and digital literacy skills.
Collaborated with a team to design over 10 promotional posters and flyers for various events, leading to a 30% increase in event attendance and showcasing creativity and teamwork.
Assisted in creating and distributing Google Forms and questionnaires to enhance audience engagement.
Anchor
19 - 20 Oct 2023 · 2 Days
JNU, New Delhi
Anchored the 4th "Scientegration" symposium (2023), a 2-day departmental symposium featuring PhD presentations and guest speakers.
Introduced speakers, faculty, and guests. Volunteered in prize and certificate distribution.
Anchor
18 - 20 Feb 2021 · 3 Days
Shivaji College, University of Delhi, New Delhi
Hosted interactive questionnaire sessions during a 3-day hands-on workshop on "Green Synthesis of Nanoparticles and their Biomedical Applications," organized by the Department of Biochemistry.
Facilitated discussions among 50+ participants, enhancing engagement and reinforcing key workshop concepts.
Test Scores
IELTS English Score:
GAT B ’22:
GATE Biotechnology ’22:
IIT JAM ’22:
TIFR JGEEBILS ’22: