A lightweight helper utility that lets developers build pipelines interactively by sharing a single source code base between DLT runs and non-DLT interactive notebook runs.
A big data processing and machine learning platform that works just like using SQL.
A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics using the following technologies: Twitter API, Kafka, MongoDB, and Tableau.
Implementation of algorithms for big data using Python, NumPy, and pandas.
From traffic sensors to smarter cities: real-time congestion prediction with Kafka, Spark, LSTM, XGBoost, and dynamic routing powered by graph algorithms.
This README assumes that, before running the Spark analytics job, you have already installed the correct versions of **Java**, **Hadoop**, and **Spark**, and that you are working on **Ubuntu**.
rock-solid pillars for enterprise-grade solutions
Batch or one-at-a-time format conversion between Excel, Markdown, CSV, and SQL data sources.
A simple CSV parser for huge volumes of data, built on the Pandas library for Python. It extracts specific columns from a CSV file and quickly writes them to one or more files (each column in a separate file, or all of them in the same output).
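The column extraction described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the function name `extract_columns`, its parameters, and the `selected.csv` output name are all hypothetical, and it assumes the input has a header row. It reads the file in chunks with `pandas.read_csv` so that a large CSV never has to fit in memory at once.

```python
from pathlib import Path

import pandas as pd


def extract_columns(source, columns, out_dir, separate=True, chunksize=100_000):
    """Copy the named columns of a CSV into one file per column or one combined file.

    Hypothetical helper for illustration; `source` is a path or file-like object.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = set()

    # usecols limits parsing to the requested columns; chunksize streams the file.
    reader = pd.read_csv(source, usecols=columns, chunksize=chunksize)
    for chunk in reader:
        chunk = chunk[columns]  # restore the caller's column order
        if separate:
            targets = {name: chunk[[name]] for name in columns}
        else:
            targets = {"selected": chunk}

        for name, frame in targets.items():
            path = out_dir / f"{name}.csv"
            first = path not in written
            # Write the header once, then append the remaining chunks.
            frame.to_csv(path, mode="w" if first else "a", header=first, index=False)
            written.add(path)

    return sorted(written)
```

Calling `extract_columns("data.csv", ["id", "score"], "out")` would write `out/id.csv` and `out/score.csv`; passing `separate=False` would write both columns to `out/selected.csv` instead.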
Sentiment-Analysis-API
Building Data Lake and ETL pipelines using Amazon EMR, S3, and Apache Spark
BigQuery data pipeline with dbt, Spark, Docker, Airflow, Terraform, GCP
Setting up a Spark cluster in a Docker environment for improved repeatability and reliability. This project includes a simple transformation on a dataset containing approximately 31 million rows.
Hands-on project demos covering infrastructure automation (Ansible, Docker), big-data processing & streaming (Hive, Spark, Kafka), and network experiments (MitM, TCP-over-UDP).
Kappa Architecture Based Sentiment Analysis System for User Comments
A practical coursework-style project from my Master's studies in Big Data Analytics at the University of East London, showcasing hands-on use of big data tools and techniques on a real-world cyber-security dataset.
End-to-end big data pipeline for delivery operations analytics using distributed storage, batch & stream processing, and live dashboards to support operational monitoring and decision-making.
Exploring and Implementing Scalable Data Processing Techniques
This work is from my master's thesis: Condition Monitoring with Machine Learning: A Data-Driven Framework for Quantifying Wind Turbine Energy Loss.
Solved tasks from the master's degree courses in the speciality "Algorithms and Systems for Big Data Processing".