Data processing pipeline
Created: January 27, 2016 / Updated: July 24, 2025 / Status: in progress / 2 min read (~234 words)
Processing data is one of the core activities of a program. There are many ways to express how a given set of data should be processed, but the concept of pipes and streams has been a popular one for many years.
In this article, we look into a potential implementation that would allow us to deal with data processing in a generic fashion. A few requirements and ideas for such an implementation:
- Plug-and-play addition of new processing units
- The ability to replay already processed data on newer processing units only
- Processing units have an identifier, a list of processing units they depend on (their dependencies), and a processing function (see the sketch after this list)
- Processing units, like pipes, can be connected to one another
The following concepts recur across dataflow systems and give us a vocabulary for such a pipeline (a small sketch of how they fit together follows the list):
- Graph: a directed graph of operations to be executed
- Feed/Placeholder: indicates where data can be fed into the graph
- Operation: an operation executed on the data it receives as input
- Fetch/Output: the result of an operation
- Session: a context within which a set of computations is executed
- Stream: a set of data
- Kernel: the functions (operations) applied to the elements of a stream
References:
- http://c2.com/cgi/wiki?DataflowProgramming
- https://en.wikipedia.org/wiki/Dataflow_programming
- https://en.wikipedia.org/wiki/Flow-based_programming
- https://en.wikipedia.org/wiki/Stream_processing
- https://en.wikipedia.org/wiki/Pipeline_(software)
- https://www.tensorflow.org/versions/0.6.0/get_started/basic_usage.html#the-computation-graph