XAI SPACE

Data Stream


Last updated 5 months ago

Architecture Description

This architecture describes a large-scale data stream processing platform that supports both real-time and batch processing.

Kafka

• Kafka is the message queue that receives and buffers real-time data from the platform's data sources.

• Data sources include user operation logs, sensor data, and business system events.
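As an illustration of how events from such sources might be shaped before they are published, the following is a minimal sketch using only the Python standard library. The `Event` fields, the source labels, and the choice of `entity_id` as the message key are assumptions made for this example, not part of the platform's documented schema. No Kafka client is used here; the functions only build and parse the key/value bytes a producer would send.

```python
import json
from dataclasses import dataclass, asdict
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical envelope shared by all data sources."""
    source: str              # assumed labels, e.g. "user_log", "sensor", "business"
    entity_id: str           # user, device, or order identifier; used as the message key
    timestamp_ms: int        # event time in epoch milliseconds
    payload: Dict[str, Any]  # source-specific fields


def to_kafka_record(event: Event) -> Tuple[bytes, bytes]:
    """Return (key, value) bytes in the shape a Kafka producer sends."""
    key = event.entity_id.encode("utf-8")
    value = json.dumps(asdict(event), sort_keys=True).encode("utf-8")
    return key, value


def from_kafka_value(value: bytes) -> Event:
    """Rebuild the envelope on the consumer side."""
    return Event(**json.loads(value.decode("utf-8")))
```

Keying by entity is a design choice rather than a requirement: with Kafka's default partitioner, records that share a key land in the same partition, which keeps events for one user or device in order.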

Flink

• Flink processes the data in Kafka in real time, performing filtering, aggregation, and complex event processing (CEP).

• Flink jobs continuously consume data from Kafka topics and write the processed results to downstream storage.
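The filtering and aggregation steps above can be sketched in plain Python. This is not Flink code: it runs over a finite list, whereas a Flink job keeps state over an unbounded stream and handles late or out-of-order events. The one-minute tumbling window, the `type`, `user`, and `ts` field names, and the "click" filter are all assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_MS = 60_000  # assumed one-minute tumbling window


def window_start(timestamp_ms: int, size_ms: int = WINDOW_MS) -> int:
    """Align a timestamp to the start of its tumbling window."""
    return timestamp_ms - (timestamp_ms % size_ms)


def aggregate_clicks(
    events: Iterable[dict], size_ms: int = WINDOW_MS
) -> Dict[Tuple[str, int], int]:
    """Count click events per (user, window start)."""
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for ev in events:
        if ev.get("type") != "click":  # filtering step: drop everything else
            continue
        key = (ev["user"], window_start(ev["ts"], size_ms))
        counts[key] += 1               # aggregation step
    return dict(counts)
```

Each `(user, window start)` pair with its count corresponds to one row that a job of this kind could write to downstream storage.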

Spark

• Spark handles batch analysis of historical data and other large-scale offline data.

• Spark reads historical data from data stores (such as HDFS or S3) and runs big data computing and machine learning tasks on it.
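To make the batch model concrete, here is a small standard-library sketch of the map-and-merge pattern that a distributed batch job follows: each partition of historical data is summarized independently, then the partial results are combined. It does not use the Spark API, and the `user` field and the in-memory partitions are assumptions standing in for files read from a store such as HDFS or S3.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, List


def count_partition(records: Iterable[dict]) -> Counter:
    """Map step: count events per user within one partition."""
    return Counter(r["user"] for r in records)


def merge_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: combine two partial results into one."""
    return left + right


def batch_count(partitions: List[List[dict]]) -> Counter:
    """Summarize every partition, then merge the partial counts."""
    return reduce(merge_counts, map(count_partition, partitions), Counter())
```

Because each partition is counted on its own, the map step can run in parallel across workers, which is the property a batch engine relies on to scale.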

ClickHouse

• ClickHouse is a high-performance columnar store that holds the processed aggregates and historical analysis data and serves queries over them.

• ClickHouse supports real-time data insertion and complex queries, which makes it suitable for building BI reports and data analysis systems.
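ClickHouse generally performs better when rows arrive in sizeable batches rather than one at a time, so a writer in front of it often buffers rows first. The sketch below shows that buffering pattern only. The `BatchWriter` class, the row layout, and the `sink` callable are hypothetical; in a real deployment the sink would wrap whichever ClickHouse client the project uses.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical row layout: (user, window_start_ms, clicks)
Row = Tuple[str, int, int]


class BatchWriter:
    """Buffer rows and hand them to a sink in batches."""

    def __init__(self, sink: Callable[[Sequence[Row]], None], batch_size: int = 1000):
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self._sink = sink
        self._batch_size = batch_size
        self._buffer: List[Row] = []

    def add(self, row: Row) -> None:
        """Queue one row; flush automatically when the batch is full."""
        self._buffer.append(row)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send any buffered rows to the sink and clear the buffer."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._sink(batch)
```

A caller would invoke `flush()` on shutdown, and typically on a timer as well, so that a partially filled batch does not wait indefinitely.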

Other components

• ZooKeeper: Coordinates Kafka partitions and manages distributed tasks.

• Connector/ETL: Imports data from different sources into Kafka, Flink, or Spark.

• Dashboard or BI tool: The user interface for visualizing and analyzing ClickHouse data.
