big-data-processing

Here are 25 public repositories matching this topic...

souvik-databricks / dlt-with-debug

A lightweight helper utility that lets developers build pipelines interactively by sharing a single source code between DLT runs and non-DLT interactive notebook runs.

big-data spark etl python3 databricks dlt etl-pipeline big-data-processing delta-live-tables

Updated Dec 7, 2022
Python

pyajs / veronica

Big data processing and machine learning platform that works just like using SQL

sql python3 pyspark machine-learning-platform big-data-processing xql

Updated Oct 15, 2024
Python

chandnii7 / Big-Data-Processing-Pipeline

A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics using the following technologies: Twitter API, Kafka, MongoDB, and Tableau.

kafka big-data mongodb twitter-api data-visualization zookeeper data-analytics kafka-consumer kafka-producer tableau nosql-database kafka-streaming big-data-processing data-processing-pipelines

Updated Aug 2, 2021
Python

kochlisGit / Big-Data-Algorithms

Implementations of algorithms for big data using Python, NumPy, and pandas.

python bloom-filter lsh streams frequent-itemset-mining pcy frequent-itemsets stream-mining shingling big-data-processing lsh-algorithm min-hasing similar-items a-priori multistage-pcy multihash-pcy

Updated Apr 27, 2020
Python

devarshpatel1506 / smart_traffic_routing

From traffic sensors to smarter cities: real-time congestion prediction with Kafka, Spark, LSTM, XGBoost, and dynamic routing powered by graph algorithms.

visualization machine-learning kafka big-data spark algorithms data-engineering optimization-algorithms streaming-data big-data-processing streamlit

Updated Sep 25, 2025
Python

JKA098 / Pokemon-Feistiness-Apache-Spark-Job

This README assumes that, before running the Spark analytics job, you have already installed the correct versions of **Java**, **Hadoop**, and **Spark**, and that you are working inside **Ubuntu**.

java ubuntu distributed-computing open-data batch-processing data-pipeline hadoop-mapreduce cluster-computing linux-environment big-data-processing apache-sparksql data-analytics-project

Updated May 8, 2025
Python

IncredibleProgress / sweetheart.py

rock-solid pillars for enterprise-grade solutions

python vue jupyter ubuntu rethinkdb rhel rust-lang nginx-unit tailwindcss big-data-processing py-script

Updated Feb 5, 2024
Python

JamesHanZhang / table-data-format-transform-app

Batch or individual format conversion between Excel, Markdown, CSV, and SQL data sources

easy-to-use data-preprocessing etl-framework big-data-processing csv-to-excel csv-to-sql multifileupload data-cleaning-pipeline excel-to-md

Updated Nov 23, 2023
Python

levindoneto / pandas-simple-csv-parser

Simple CSV parser for huge volumes of data that uses the Pandas library for Python to extract specific columns from a CSV file and quickly write them to one or more files (each column in a separate file, or all of them in the same output).

parser csv data-manipulation pandas-dataframes conda-environment pandas-datareader big-data-processing

Updated Jan 7, 2019
Python

louiecai / Sentiment-Analysis-API

Sentiment-Analysis-API

nlp machine-learning deep-learning sentiment-analysis neural-network lstm-neural-networks rnn-pytorch big-data-processing

Updated Jul 18, 2022
Python

Faisal-AlDhuwayhi / Data-Lake

Building Data Lake and ETL pipelines using Amazon EMR, S3, and Apache Spark

aws sql big-data spark amazon-emr pyspark data-engineering data-lake cloud-computing amazon-s3 etl-pipeline big-data-processing

Updated Dec 23, 2022
Python

vishu-tyagi / BigQuery-ELT

BigQuery data pipeline with dbt, Spark, Docker, Airflow, Terraform, GCP

python docker bigquery airflow spark terraform pyspark dbt elt batch-processing big-data-analytics etl-pipeline big-data-processing elt-pipeline

Updated Feb 6, 2023
Python

Turnipdo / Docker-Spark-Setup

Setting up a Spark cluster in a Docker environment for improved repeatability and reliability. This project includes a simple transformation on a dataset containing approximately 31 million rows.

setup spark docker-container big-data-processing

Updated Jun 21, 2024
Python

mixaisealx / DevOps-n-DataOps

Hands-on project demos covering infrastructure automation (Ansible, Docker), big-data processing & streaming (Hive, Spark, Kafka), and network experiments (MitM, TCP-over-UDP).

Updated Feb 20, 2026
Python

OuchenOussama / hespressence

Kappa Architecture Based Sentiment Analysis System for User Comments

nlp big-data sentiment-analysis big-data-analytics big-data-processing

Updated Feb 10, 2025
Python

DrFarouk / big-data-analytics

A practical coursework-style project from my Master's studies in Big Data Analytics at the University of East London, showcasing hands-on use of big data tools and techniques on a real-world cyber-security dataset.

Updated Nov 18, 2025
Python

amgunawan / big-data-project-class-b_team3

End-to-end big data pipeline for delivery operations analytics using distributed storage, batch & stream processing, and live dashboards to support operational monitoring and decision-making.

big-data big-data-processing food-delivery-analytics