Researcher | Data Analyst | Business Analyst
View the Project on GitHub Kathirrodri17/Kathir-J-M.github.io
Hello and welcome to my portfolio! I’m Kathir, a Master’s graduate in Economics and a data analyst with a talent for turning raw data into compelling stories and meaningful insights. This page showcases a collection of projects that highlight my expertise in data manipulation, statistical analysis, and visualization.
In this portfolio, you’ll find projects covering machine learning prediction, exploratory data analysis in SQL, and interactive dashboards built in Tableau and Power BI.
Each project on this page includes a detailed description, a link to the code, and, where applicable, interactive visualizations or dashboards. You can explore the code, analysis, and visualizations to see the methods and techniques used to tackle real-world data challenges.
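I use a variety of tools and technologies in my data analysis work, including SQL, Excel, Tableau, Power BI with Power Query, and Python libraries such as Pandas, Beautiful Soup, Scikit-Learn, and XGBoost.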
Developed a machine learning model to predict sales for various products across Big Mart outlets, using an XGBoost regressor, to address retail challenges such as inventory management and supply chain optimization.
Utilized Kaggle’s Big Mart Sales dataset, which includes product and outlet details such as item weight, visibility, type, MRP, outlet size, location, and establishment year.
Handled missing values by imputing Item_Weight and using the mode for Outlet_Size. Applied label encoding to convert categorical variables to numerical values.
The model provided reasonable sales predictions with some room for improvement, demonstrating its potential for improving inventory management and customer satisfaction.
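The preprocessing steps above can be sketched in plain Python. This is a minimal, dependency-free illustration, not the project’s code. The rows and values are made up, and mean imputation for Item_Weight is an assumption because the text does not say which statistic was used. The helpers `impute` and `label_encode` are hypothetical. Codes are assigned in sorted order, which is also how scikit-learn’s `LabelEncoder` orders its classes.

```python
from statistics import mean, mode

# Hypothetical rows using column names from the Big Mart dataset; values are made up.
rows = [
    {"Item_Weight": 9.3, "Outlet_Size": "Medium", "Item_Fat_Content": "Low Fat"},
    {"Item_Weight": None, "Outlet_Size": "Small", "Item_Fat_Content": "Regular"},
    {"Item_Weight": 17.5, "Outlet_Size": None, "Item_Fat_Content": "Low Fat"},
    {"Item_Weight": 8.9, "Outlet_Size": "Medium", "Item_Fat_Content": "Regular"},
]

def impute(rows, column, strategy):
    """Fill missing (None) values in `column` with strategy(observed values)."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = strategy(observed)
    for r in rows:
        if r[column] is None:
            r[column] = fill
    return fill

def label_encode(rows, column):
    """Replace each category with an integer code, assigned in sorted order."""
    codes = {label: i for i, label in enumerate(sorted({r[column] for r in rows}))}
    for r in rows:
        r[column] = codes[r[column]]
    return codes

impute(rows, "Item_Weight", mean)   # assumption: mean imputation for the numeric column
impute(rows, "Outlet_Size", mode)   # mode imputation for the categorical column
for col in ("Outlet_Size", "Item_Fat_Content"):
    label_encode(rows, col)
```

Imputation runs before encoding so that the mode is computed on the original category labels. With pandas, the same steps are usually written with `fillna` on each column.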
This project involves an in-depth exploratory data analysis (EDA) of COVID-19 data using SQL, Excel, and Tableau. The goal was to analyze extensive datasets from multiple countries, clean and prepare the data, and create a comprehensive dashboard to summarize key insights.
Performed exploratory data analysis of global COVID-19 data to uncover key insights and trends. The analysis focused on infection rates, mortality, and vaccination coverage across different countries and regions, culminating in the creation of an interactive Tableau dashboard.
Data Exploration and Findings: Imported and explored data from the CovidDeaths table, including metrics such as total cases, new cases, total deaths, and population by location and date.
Analysis of Total Deaths vs. Total Cases in India: Calculated the death percentage in India to assess the severity of COVID-19 in terms of mortality, comparing it with other regions.
Analysis of Total Cases vs. Population: Examined the percentage of the population infected by COVID-19 in India to evaluate the extent of the spread and the effectiveness of public health measures.
Countries with Highest Infection Rates: Identified countries with the highest infection rates relative to their populations, highlighting the most affected regions.
Countries with Highest Death Counts: Found the countries with the highest total death counts to understand the absolute impact of COVID-19.
Death Counts by Continent: Analyzed continents with the highest total death counts to identify regional disparities in COVID-19’s impact.
Global COVID-19 Metrics: Summarized global metrics, including total cases, total deaths, and death percentage, providing a macro perspective on the pandemic’s impact.
Population vs. Vaccination Coverage: Compared vaccination numbers against total population to assess progress toward herd immunity.
Rolling Vaccination Coverage: Calculated rolling vaccination coverage over time using a Common Table Expression (CTE) to track the pace of vaccination campaigns.
Temporary Tables for Vaccination Analysis: Created temporary tables to efficiently analyze and store vaccination coverage data, enabling deeper insights into trends.
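The core SQL steps listed above can be reproduced on a toy dataset with Python’s built-in sqlite3 module. This is a sketch under several assumptions. The tables hold made-up numbers. The CovidVaccinations table, its new_vaccinations column, and the PercentPopulationVaccinated temporary table are hypothetical names. SQLite syntax stands in for whichever SQL dialect the project used, and window functions require SQLite 3.25 or later.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical miniature versions of the deaths and vaccinations tables.
cur.executescript("""
CREATE TABLE CovidDeaths (location TEXT, date TEXT, population INTEGER,
                          total_cases INTEGER, total_deaths INTEGER);
CREATE TABLE CovidVaccinations (location TEXT, date TEXT, new_vaccinations INTEGER);
INSERT INTO CovidDeaths VALUES
  ('India', '2021-01-01', 1000, 100, 2),
  ('India', '2021-01-02', 1000, 200, 5);
INSERT INTO CovidVaccinations VALUES
  ('India', '2021-01-01', 50),
  ('India', '2021-01-02', 70);
""")

# Daily death percentage and share of the population infected, for India.
# Multiplying by 100.0 keeps the division in floating point (no integer truncation).
daily = cur.execute("""
SELECT date,
       100.0 * total_deaths / total_cases AS death_pct,
       100.0 * total_cases / population   AS infected_pct
FROM CovidDeaths
WHERE location = 'India'
ORDER BY date
""").fetchall()

# Rolling vaccinations: a CTE with a window function, materialised as a temporary table.
cur.executescript("""
CREATE TEMP TABLE PercentPopulationVaccinated AS
WITH PopVsVac AS (
  SELECT d.location, d.date, d.population,
         SUM(v.new_vaccinations) OVER (PARTITION BY d.location ORDER BY d.date)
           AS rolling_vaccinations
  FROM CovidDeaths d
  JOIN CovidVaccinations v ON d.location = v.location AND d.date = v.date
)
SELECT *, 100.0 * rolling_vaccinations / population AS pct_vaccinated FROM PopVsVac;
""")
rolling = cur.execute(
    "SELECT date, rolling_vaccinations, pct_vaccinated "
    "FROM PercentPopulationVaccinated ORDER BY date"
).fetchall()
```

The window function restarts the running total for each location (`PARTITION BY`) and accumulates it in date order. Storing the result in a temporary table lets later queries reuse it without recomputing the CTE.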
This project provides a comprehensive exploratory data analysis of COVID-19, offering valuable insights into global and regional impacts, including infection rates, mortality, and vaccination coverage. The interactive Tableau dashboard visualizes these findings, serving as a tool for understanding the pandemic’s effects and supporting informed public health decisions.
For an interactive exploration of the data, visit the COVID-19 Dashboard. To see the code and the full process, check the GitHub repository.
This project uses historical data from the 2022-2024 seasons to build a machine learning model that predicts the winners of English Premier League (EPL) matches in the 2024-25 season. The model determines match winners from match-related features such as team performance, match statistics, and match conditions.
Developed a prediction model to forecast EPL match outcomes, particularly the winners, using historical data from the 2022-2024 seasons. The model assists fans, analysts, and stakeholders in making informed decisions about upcoming matches.
EPL match data was scraped from FBref by extracting tables from the site’s HTML. The data collection process was carried out in two phases:
Tools used: Beautiful Soup and Pandas for data extraction and transformation into a Pandas DataFrame.
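The extraction step comes down to walking an HTML table row by row and collecting the cell text. The project used Beautiful Soup for this. To keep the sketch dependency-free, it swaps in the standard library’s `html.parser` instead. The `TableParser` class, the HTML snippet, and the team names are hypothetical and do not reproduce FBref’s actual markup.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Hypothetical snippet shaped like a match-log table.
html = """
<table id="matchlogs">
  <tr><th>Date</th><th>Opponent</th><th>Result</th><th>GF</th><th>GA</th></tr>
  <tr><td>2023-08-12</td><td>Team A</td><td>W</td><td>2</td><td>1</td></tr>
  <tr><td>2023-08-19</td><td>Team B</td><td>D</td><td>0</td><td>0</td></tr>
</table>
"""

parser = TableParser()
parser.feed(html)
header, *body = parser.rows
matches = [dict(zip(header, row)) for row in body]  # one dict per match
```

A list of dicts like `matches` can be passed directly to `pd.DataFrame(matches)` to get the Pandas DataFrame mentioned above.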
The prediction model was built using Scikit-Learn with the following classifiers:
The dataset was split into:
Initial Results:
To improve prediction accuracy, the following steps were taken:
Final Results:
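For match forecasting, a common way to split data is chronological: train on earlier matches and test on later ones, so that no future results leak into training. The sketch below shows only that idea. It is an assumption for illustration, not necessarily the split this project used. The records, the cutoff date, and the `chronological_split` helper are hypothetical.

```python
from datetime import date

# Hypothetical match records: (match date, home team, away team, home_win flag).
match_records = [
    (date(2023, 9, 2), "Team B", "Team C", 1),
    (date(2022, 8, 6), "Team A", "Team B", 1),
    (date(2024, 2, 10), "Team A", "Team C", 0),
    (date(2023, 3, 4), "Team C", "Team A", 0),
]

def chronological_split(records, cutoff):
    """Sort by date, then train on matches before `cutoff` and test on the rest."""
    ordered = sorted(records, key=lambda record: record[0])
    train = [r for r in ordered if r[0] < cutoff]
    test = [r for r in ordered if r[0] >= cutoff]
    return train, test

# Hypothetical cutoff: matches from August 2023 onward are held out for testing.
train, test = chronological_split(match_records, cutoff=date(2023, 8, 1))
```

A random shuffle would mix later matches into the training set, which overstates accuracy for a model meant to predict an upcoming season. Scikit-Learn’s `TimeSeriesSplit` offers a related approach for cross-validation over several rolling splits.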
This project focused on uncovering critical sales insights to support a computer hardware manufacturer’s decision-making process. The sales team struggled to access actionable data, so the goal was to provide a clear, data-driven view of their sales performance.
Conducted exploratory data analysis (EDA) in SQL to extract insights from the sales data. Example SQL queries:
-- Show all customer records
SELECT * FROM customers;
-- Show total revenue in Chennai (market code Mark001) in 2020
SELECT SUM(transactions.sales_amount)
FROM transactions
INNER JOIN date
ON transactions.order_date = date.date
WHERE date.year = 2020 AND transactions.market_code = 'Mark001';
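To check what the revenue query returns, it can be run unchanged against small in-memory tables using Python’s sqlite3 module. The table layouts and rows below are hypothetical. Only the column names used by the query, plus `currency` from the Power Query step, come from this page. Because the query sums sales_amount directly, USD rows are added as raw dollar amounts. The second query sketches a currency-aware variant that uses the fixed rate of 75 from the Power Query formula.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tables and rows; only the column names follow the queries on this page.
con.executescript("""
CREATE TABLE transactions (order_date TEXT, market_code TEXT, sales_amount REAL, currency TEXT);
CREATE TABLE date (date TEXT, year INTEGER);
INSERT INTO date VALUES ('2019-12-31', 2019), ('2020-01-01', 2020), ('2020-06-15', 2020);
INSERT INTO transactions VALUES
  ('2019-12-31', 'Mark001', 500, 'INR'),
  ('2020-01-01', 'Mark001', 1000, 'INR'),
  ('2020-06-15', 'Mark001', 10, 'USD'),
  ('2020-06-15', 'Mark002', 700, 'INR');
""")

# The query from the text, unchanged: raw amounts are summed regardless of currency.
raw_total = con.execute("""
SELECT SUM(transactions.sales_amount)
FROM transactions
INNER JOIN date
ON transactions.order_date = date.date
WHERE date.year = 2020 AND transactions.market_code = 'Mark001';
""").fetchone()[0]

# Currency-aware variant: convert USD to INR at the fixed rate of 75 before summing.
normalised_total = con.execute("""
SELECT SUM(CASE WHEN t.currency = 'USD' THEN t.sales_amount * 75 ELSE t.sales_amount END)
FROM transactions AS t
INNER JOIN date AS d ON t.order_date = d.date
WHERE d.year = 2020 AND t.market_code = 'Mark001';
""").fetchone()[0]
```

The join on the date table is what restricts the sum to 2020. The 2019 Chennai row and the other market’s row are excluded, and only the USD row changes between the two totals.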
Transformed and normalized the data in Power Query to ensure consistency and enable accurate analysis. A custom column converts all sales amounts to a single currency (Indian rupees), multiplying USD amounts by a fixed rate of 75.
Custom Formula:
= Table.AddColumn(#"Filtered Rows", "normalised_amount", each if [currency] = "USD" or [currency] = "USD#(cr)" then [sales_amount] * 75 else [sales_amount])
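The custom column is a simple conditional. Below is a Python sketch of the same rule for readers less familiar with M. In Power Query, `#(cr)` is the escape sequence for a carriage return, so "USD#(cr)" matches currency values that carry a stray trailing carriage return. In Python that value is written "USD\r". The function name and the constant are hypothetical. The rate of 75 comes from the formula.

```python
USD_TO_INR = 75  # fixed conversion rate taken from the Power Query formula

def normalised_amount(sales_amount, currency):
    """Convert USD amounts (including 'USD' with a trailing carriage return) to INR."""
    if currency in ("USD", "USD\r"):
        return sales_amount * USD_TO_INR
    return sales_amount  # already in INR; leave unchanged

# Usage: normalise a few (amount, currency) pairs.
amounts = [normalised_amount(a, c) for a, c in [(500, "INR"), (10, "USD"), (2, "USD\r")]]
```

Matching both spellings explicitly mirrors the M formula. Cleaning the currency column once, for example by trimming the carriage return, would be an alternative that avoids listing variants.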
Established relationships between tables in Power BI to enable seamless analysis across multiple data sources. Developed an interactive dashboard that visualizes sales performance across different dimensions, empowering the sales team to make informed decisions.
Performance Insights: Identified sales trends and key performance indicators (KPIs) across different markets and currencies.
Profit Analysis: Analyzed profitability across different product lines and regions.
Conclusion:
This project addressed the challenge of unlocking hidden sales insights for a computer hardware manufacturer. By using SQL for in-depth data analysis, Power Query for data transformation, and Power BI for data modeling and visualization, the sales team gained access to a comprehensive dashboard that supports informed, data-driven decisions. Integrating these tools streamlined the analysis process and surfaced insights that were previously inaccessible, supporting better strategic planning and business outcomes.
If you have any questions or feedback, don’t hesitate to reach out. You can connect with me on LinkedIn for more about my professional journey and other work.
Thank you for visiting my portfolio. I hope you find the projects interesting and insightful!
J.M.Kathir
LinkedIn | kathirrodriguez@gmail.com | Github | CV