About Me:
I completed my PhD in Electrical Engineering, where I focused on predictive modelling of complex nonlinear systems from messy datasets. I also completed coursework in Machine Learning and Statistics for Data Science, and alongside it I conducted research involving Exploratory Data Analysis (EDA) and machine learning algorithms. This combination of academic training and hands-on research sparked my interest in transitioning from academia to the data science industry: I found the process of working with raw data, uncovering patterns, and turning numbers into meaningful insights and actionable business recommendations fascinating. I worked relentlessly on Python, SQL, Excel, and Tableau, and this journey naturally shifted my career trajectory toward data science, where I can apply my skills in data analysis, machine learning, and statistical modeling to solve real-world problems.
- Programming: Python, MySQL, PostgreSQL, Excel, MATLAB
- ML & Deep Learning Frameworks: Keras, Scikit-Learn, TensorFlow
- Data Analytics & Visualization: Pandas, NumPy, SciPy, Statsmodels, Matplotlib, Seaborn, Plotly
- Development Tools: VS Code, PyCharm, MATLAB & Simulink
- Version Control: Proficient in Git & GitHub
Recognitions & Certifications
- SQL Associate Certificate – Industry Recognized Certification Issued by DataCamp
- Honored Listee, Marquis Who’s Who 2025 – Recognized for professional achievement and distinction
Technical Expertise
- Writing SQL Queries for Real-World Problems: Subqueries, Joins, CTEs, Window Functions, Ranking Functions
- Data Cleaning & Visualization: handling missing values, data formatting, removing outliers, feature extraction, grouping data by features; outlier treatment with IQR (interquartile range) capping; correlation heatmaps; visualization using Matplotlib and Seaborn
- Descriptive and inferential statistical metrics (hypothesis testing & confidence intervals)
- Hands-on experience with large-scale datasets and machine learning algorithms for regression, classification, and clustering problems; feature engineering; and ML model deployment using Flask
- Strong expertise in time series forecasting, data analytics, anomaly detection, and deep learning architectures (TCN, CNN, RNN, LSTM)
- Developed hybrid deep-learning-based models for analyzing and predicting complex system dynamics; time series analysis using ARIMA, SARIMAX, and NARX methods
- Actively working as an independent contractor/subject-matter expert to train AI in Electrical Engineering at Outlier.ai
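As a minimal illustration of the IQR capping mentioned above, here is a short sketch on a toy Pandas Series (not the project data; the `iqr_cap` helper and its values are illustrative):

```python
import pandas as pd

def iqr_cap(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest bound."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

data = pd.Series([10, 12, 11, 13, 12, 95])  # 95 is an obvious outlier
capped = iqr_cap(data)                      # 95 is capped to the upper IQR fence
```

Capping (rather than dropping) keeps the row count intact, which matters when the outlier rows carry other usable features.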
Notable Research in Data Science (with high-impact-factor research publications):
Time Series Forecasting of a Real-Time pH Neutralization Process:
Collected around 10,705 real-time samples (pH, base flow rate, acid flow rate) from a complex nonlinear pH neutralization process. The nonlinear dynamics of the pH titration curve were predicted using deep learning algorithms: Temporal Convolutional Networks (TCN) and LSTM networks.
Key Highlights:
- Partitioned data into train/test sets for model development; identified trends, seasonality, and irregular values in the pH, acid, and base flow rate data for time series forecasting.
- Developed TCN- and LSTM-based pH prediction models; TCN outperformed LSTM (RMSE = 0.023, R² = 97.6%), ensuring 18% less downtime in process operations.
- Tuned hyperparameters (kernel size, dilation, dropout) using grid search, improving model accuracy by 8% and cutting validation loss by more than 30%, while preventing overfitting through dropout (0.3) and weight normalization.
- Reduced calibration/setup time by 40% through accurate modeling of nonlinear titration curves, increasing production throughput and reducing batch delays.
- Delivered $50K/year in savings through improved pH balance, reducing chemical wastage by 30% and extending equipment lifespan in pilot-scale systems.
- Designed models for cloud and edge deployment, supporting real-time pH control as well as smarter scheduling, procurement, and logistics planning.
- Predicted the nonlinear dynamics of the multislope pH titration curve using Nonlinear ARX, ARIMA, RNN, and CNN-LSTM networks.
- Executed sequence modeling: used dilated causal convolutions and a residual learning framework for parallel processing and to handle very long sequences.
- Implemented hyperparameter tuning (dropout, learning rate, filter size, hidden-state size) with grid search to enhance generalization.
- Validated prediction accuracy through standard statistical metrics (descriptive and inferential).
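The dilated causal convolution at the heart of a TCN can be sketched in a few lines of NumPy. This is a toy, single-filter version for intuition only (not the trained project model): the output at time t depends only on x[t], x[t-d], x[t-2d], ..., so no future samples leak into the prediction.

```python
import numpy as np

def dilated_causal_conv(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """Causal 1-D convolution: y[t] uses x[t], x[t-d], x[t-2d], ...
    The input is left-padded with zeros so the output is the same length."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        taps = xp[t + pad - np.arange(k) * dilation]  # x[t], x[t-d], ...
        y[t] = taps @ w
    return y

x = np.arange(6, dtype=float)                       # [0, 1, 2, 3, 4, 5]
y = dilated_causal_conv(x, np.array([1.0, 1.0]), dilation=2)
# y[t] = x[t] + x[t-2] (zero before the series starts)
```

Stacking such layers with exponentially growing dilations (1, 2, 4, ...) gives the long receptive field that lets a TCN handle very long sequences in parallel.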
Regression Analysis Using kSINDYc – A Machine Learning Approach:
- Conducted time series analysis on process control datasets, including Continuous Stirred Tank Reactors, a Heat Exchanger, and a Bioreactor, using a novel machine learning algorithm, key-term-based Sparse Regression of Nonlinear Dynamics and Control (kSINDYc), and validated training and testing accuracy using statistical metrics.
- Formulated nonlinear objective functions and applied regularization methods (Ridge, Lasso, Dropout) to overcome overfitting.
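The generic sparse-regression idea underlying SINDy-style methods can be shown on a toy system. This sketch uses sequentially thresholded least squares on the known system dx/dt = -2x with a small candidate library; it illustrates the family of techniques, not the kSINDYc algorithm itself:

```python
import numpy as np

def stlsq(theta: np.ndarray, dxdt: np.ndarray,
          threshold: float = 0.1, n_iters: int = 10) -> np.ndarray:
    """Sequentially thresholded least squares: fit, zero out small
    coefficients, refit on the remaining (active) library terms."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        active = ~small
        if active.any():
            xi[active] = np.linalg.lstsq(theta[:, active], dxdt, rcond=None)[0]
    return xi

# Toy data from dx/dt = -2x, with a library of candidate terms [x, x^2, x^3]
t = np.linspace(0, 2, 200)
x = np.exp(-2 * t)
dxdt = -2 * x                       # exact derivative for the sketch
theta = np.column_stack([x, x**2, x**3])

xi = stlsq(theta, dxdt)             # recovers coefficients close to [-2, 0, 0]
```

The sparsity constraint is what turns a regression into an interpretable governing equation: only the library terms that truly drive the dynamics survive.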
Featured Data Science Projects (GitHub):
a. ExpiryGenie – Smart Food Expiry Tracker (Deployed on AWS with HTTPS)
Households and small food businesses often face avoidable food waste due to forgotten expiry dates. There was a need for a smart, accessible solution to track food shelf life, provide expiry alerts, and help users reduce waste and save money.
Why This Project Matters
• Built an intuitive multi-tab dashboard to add food items via manual entry, text, voice, and image/receipt scanning using Gemini AI and OCR.
• Integrated AWS S3 to store food records and user credentials securely.
• Provisioned and configured AWS EC2 Ubuntu instance with Python 3.12.
• Used Elastic IP for stable access and Streamlit with nohup to run persistently in the background.
Business Insights
• Food waste reduction: Enables early expiry alerts based on AI-predicted shelf life or receipt scan input.
• Cost savings: Tracks items used on time, displaying money saved per user.
• User behavior: Data on frequently wasted categories can guide personalized reminders or donation prompts.
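The core expiry-alert logic can be sketched in a few lines of standard-library Python. This is an illustrative helper, not the deployed ExpiryGenie code; the `expiring_soon` name and the pantry data are made up:

```python
from datetime import date, timedelta

def expiring_soon(items, today, window_days=3):
    """Return item names whose expiry date falls within the alert window
    [today, today + window_days] (illustrative, not the production code)."""
    cutoff = today + timedelta(days=window_days)
    return sorted(name for name, exp in items.items() if today <= exp <= cutoff)

pantry = {
    "milk": date(2025, 1, 3),
    "rice": date(2025, 6, 1),
    "yogurt": date(2025, 1, 2),
}
alerts = expiring_soon(pantry, today=date(2025, 1, 1))  # milk and yogurt
```

In the real app this check would run on the S3-backed records and feed the dashboard's alert tab.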
b. Real-World Stock Forecasting Dashboard with Streamlit Cloud
An interactive data science web app for stock price analysis and prediction using real-time data from Yahoo Finance. Built with Python, Streamlit, and popular machine learning and deep learning libraries, this project helps users analyze market trends, explore financial statements, and forecast future prices using models like ARIMA, SARIMA, and LSTM. (GitHub)
Why This Project Matters
While many stock prediction projects focus on just one model, this project uniquely combines multiple forecasting models, classification algorithms, EDA, feature engineering, time series models and deep learning in a single interactive app.
c. Bitcoin Price Prediction Using Time Series Forecasting
Tools Used: NumPy, Pandas, datetime, Matplotlib, Seaborn, Statsmodels, and SciPy
The primary objective of this project is to compare the accuracy of Bitcoin price (USD) predictions from time series data using two different models: a Long Short-Term Memory (LSTM) network and an ARIMA model. I collected a recent dataset (Sep 2014 – March 2025) from Yahoo Finance. Here are the questions I was interested in answering:
- How exactly did detecting trends 12–24 hours earlier help the business?
- What smoothing techniques did you use, and why were they chosen over others?
- Were there any false positives in trend detection? How did you handle them?
- How reliable was the ADF test in real-world (live) data compared to historical data?
Key Highlights:
- Detected early price trends by transforming and stabilizing Bitcoin data using ADF tests and smoothing techniques, helping teams react 12–24 hours sooner during trend and seasonality shifts.
- Improved price prediction accuracy by comparing ARIMA and LSTM models, showing LSTM reduced errors by 18%, helping guide future model choices.
- Saved analyst monitoring time by automating steps like differencing and correlation checks (ACF, PACF) with statistical tools, reducing manual work by around 10 hours per week.
- Built trust in predictions by validating models with clear performance metrics (MAE, RMSE, and R²) and visuals, making insights easy to understand for non-technical teams.
- Recommended weekly model updates to keep forecasts accurate during market swings, helping reduce prediction delays and improve response times during high-noise events like news-driven price fluctuations.
d. Predictive Analytics for Employee Turnover Reduction with ML
- Uncovered Key Attrition Drivers Using Exploratory Data Analysis
Identified top predictors of employee churn — low satisfaction (<0.5) and extreme monthly hours (>250) — through correlation heatmaps, KDE plots, and project-based distribution analysis.
Key Highlights:
- Segmented Leavers into Actionable Clusters Using KMeans
Applied unsupervised clustering to identify 3 distinct exit profiles: disengaged, burnt-out high performers, and misaligned contributors — enabling HR to design targeted retention plans.
- Built Predictive Models with 96% AUC to Forecast Attrition
Trained and validated Logistic Regression, Random Forest, and Gradient Boosting classifiers (with SMOTE) using 5-fold CV; Gradient Boosting emerged best with 0.96 AUC and strong feature interpretability.
- Enabled Early Intervention Through Risk Scoring and Reporting
Designed a risk scoring system to flag at-risk employees monthly, empowering HR with real-time dashboards for proactive retention strategies.
- Delivered Business Impact with Data-Driven HR Strategy
Insights led to pilot programs on workload balance and internal promotions, projecting a 15–20% reduction in attrition and lowering rehiring costs across departments.
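The clustering step that segments leavers can be sketched with scikit-learn on toy data. The two synthetic profiles (low-satisfaction/low-hours vs. high-satisfaction/extreme-hours) are made up to mimic the "disengaged" and "burnt-out high performer" groups; they are not the HR dataset:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy leaver records: columns are [satisfaction, monthly_hours]
rng = np.random.default_rng(42)
disengaged = rng.normal([0.2, 140], [0.05, 10], (50, 2))
burnt_out = rng.normal([0.8, 280], [0.05, 10], (50, 2))
X = np.vstack([disengaged, burnt_out])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_          # each leaver assigned to one exit profile
```

On the real data, inspecting each cluster's centroid (mean satisfaction, hours, projects) is what turns an unlabeled blob into a nameable exit profile HR can act on.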
d. EDA and Hypothesis Testing on Marketing Campaign Dataset
- Performed exploratory data analysis and hypothesis testing on a marketing dataset integrating the five Ps (People, Product, Price, Place, and Promotion). Demonstrated the US market’s performance relative to other countries through comparative statistical analysis. Here are the questions I was interested in answering.
- How does the US market’s performance specifically differ from other countries? Are there particular products, prices, or promotions driving this?
- Which of the five Ps (People, Product, Price, Place, Promotion) had the biggest impact on campaign success?
- Based on your hypothesis testing, what immediate actions would you recommend for marketing strategy in the US versus globally?
Key Highlights:
- Applied ordinal encoding to the Education category (Basic: 0, 2nd Cycle: 1, Graduation: 2, Master: 3, PhD: 4).
- Encoded categorical variables using ordinal and one-hot encoding with Pandas, ensuring compatibility with statistical models and boosting hypothesis test accuracy.
- Conducted hypothesis testing with SciPy t-tests to validate age-channel preferences, child-based online behavior, channel cannibalization, and U.S. purchasing dominance - guiding campaign focus and regional resource allocation.
- Identified high- and low-performing products through Pandas groupby analysis, recommending promotion reallocation toward top-selling categories to maximize return on investment (ROI). Forecasted a $250K increase in quarterly revenue based on improved conversion rates and reduced churn.
- Recommended reallocating 30% more budget toward Segment B (ages 25–34) and shifting email content strategy to focus on personalized offers.
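The ordinal-encoding and t-test steps above can be sketched together. The spend figures are synthetic stand-ins for the two customer groups being compared; only the Education mapping follows the scheme described above:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Ordinal encoding of the Education category, per the mapping above
edu_order = {"Basic": 0, "2nd Cycle": 1, "Graduation": 2, "Master": 3, "PhD": 4}
df = pd.DataFrame({"Education": ["Basic", "PhD", "Master", "Graduation"]})
df["Education_enc"] = df["Education"].map(edu_order)

# Two-sample Welch t-test on synthetic per-group spend (illustrative data)
rng = np.random.default_rng(7)
us_spend = rng.normal(520, 40, 100)
other_spend = rng.normal(480, 40, 100)
t_stat, p_val = stats.ttest_ind(us_spend, other_spend, equal_var=False)
```

A p-value below the chosen significance level is what justified claims like "U.S. purchasing dominance" rather than attributing the gap to noise.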
e. SQL Project Using a Paintings & Museum Dataset:
Analyzed museum inventory data with SQL to identify unexhibited paintings and underutilized museum spaces. Discovered 15% of artworks not displayed, highlighting opportunities to improve visitor engagement and provided insights to optimize art rotations and enhance museum profitability.
Key Highlights:
- Optimized museum inventory insights by querying and identifying 15% of paintings not currently exhibited and uncovering underutilized museum spaces.
- Improved collection management by detecting museums with no associated paintings, enabling better artwork distribution planning and enhancing visitor experience.
- Enhanced data-driven decision-making by cleaning inconsistencies (e.g., mismatched artist IDs, missing gallery assignments) to support accurate reporting on artwork visibility and artist representation
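The two core queries (unexhibited paintings, museums with no paintings) can be demonstrated with Python's built-in sqlite3 on a simplified, hypothetical schema; the table and column names here are illustrative, not the project dataset's exact schema:

```python
import sqlite3

# Tiny in-memory stand-in for the paintings/museum dataset
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE museum (museum_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE work (work_id INTEGER PRIMARY KEY, name TEXT, museum_id INTEGER);
INSERT INTO museum VALUES (1, 'Louvre'), (2, 'Prado');
INSERT INTO work VALUES (10, 'A', 1), (11, 'B', NULL), (12, 'C', 1);
""")

# Paintings not currently exhibited (no museum assigned)
unexhibited = conn.execute(
    "SELECT name FROM work WHERE museum_id IS NULL"
).fetchall()

# Museums with no associated paintings (LEFT JOIN anti-join pattern)
empty_museums = conn.execute("""
    SELECT m.name FROM museum m
    LEFT JOIN work w ON w.museum_id = m.museum_id
    WHERE w.work_id IS NULL
""").fetchall()
```

The LEFT JOIN anti-join is the standard way to find parent rows with no children without resorting to a correlated subquery.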
Ph.D. in Electrical Engineering | ML & Deep Learning | Data Science Research
In my PhD research at Anna University (2018–2023), I worked on a challenging multidisciplinary topic, ‘Nonlinear System Identification, Nonlinearity Quantification, and Control of Nonlinear Systems’, where I integrated concepts of nonlinear system identification from Electrical Engineering with time series forecasting (ARIMA models) and analysis using machine learning/deep learning algorithms, and applied them to control engineering problems. This experience shaped my skills in analysing large raw datasets, applying rigorous optimization solvers, tuning hyperparameters, and developing new approaches to complex real-world problems. It also stimulated my deep interest in machine learning/deep learning algorithms and in exploring data science, motivating my transition into this field.
Research Publications
I have published around 10 peer-reviewed articles in Science Citation Indexed journals. Among these, three first-author research articles focused on machine-learning-based system identification in journals with high impact factors. My publications have over 60 total citations and an h-index of 5.
You can view my publications on my Google Scholar page.
Collaborative Projects and Scientific Writing
As a full-time researcher at Anna University, I designed and executed multiple projects, collaborated with faculty at Central Research Labs, and mentored Master’s students on advanced topics in control systems, machine learning, and deep learning, resulting in publications in high-impact-factor journals.
GitHub Projects:
Data Science Projects
ML Projects
Python Projects