Data processing pipeline

Created: January 27, 2016 / Updated: July 24, 2025 / Status: in progress / 2 min read (~234 words)

Processing data is one of the core activities of a program. There are many ways to express how a given set of data should be processed; the concept of pipes and streams, however, has been a popular one for many years.

In this article, we look into a potential implementation that would allow us to deal with data processing in a generic fashion.

  • Plug-and-play addition of new processing units
  • The ability to replay already processed data on newer processing units only
  • Processing units have an identifier, a list of processing units they depend on (dependencies), and a processing function
  • Processing units, like pipes, can be connected to one another
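The requirements above can be sketched in a few lines of Python. This is a minimal illustration, not a complete implementation; the names `Unit`, `Pipeline`, and `replay_only_new` are my own, chosen to mirror the bullets: each unit has an identifier, dependencies, and a processing function, and already-processed data can be replayed on newer units only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Unit:
    # Identifier, dependencies, and a processing function, as described above.
    identifier: str
    dependencies: List[str]
    process: Callable

class Pipeline:
    def __init__(self):
        self.units: Dict[str, Unit] = {}
        self.results: Dict[str, object] = {}

    def add(self, unit: Unit):
        # Plug-and-play addition of a new processing unit.
        self.units[unit.identifier] = unit

    def run(self, data, replay_only_new=False):
        # With replay_only_new, cached results are kept and only
        # units that have not produced a result yet are executed.
        if not replay_only_new:
            self.results = {}
        pending = [u for u in self.units.values()
                   if u.identifier not in self.results]
        while pending:
            for unit in list(pending):
                # A unit runs once all of its dependencies have results;
                # units with no dependencies receive the raw input data.
                if all(d in self.results for d in unit.dependencies):
                    inputs = [self.results[d] for d in unit.dependencies] or [data]
                    self.results[unit.identifier] = unit.process(*inputs)
                    pending.remove(unit)
        return self.results
```

Connecting units is then a matter of naming another unit in the dependency list, much like fitting pipes together.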

  • Graph: directed graph of operations to be executed
  • Feed/Placeholder: indicates where data can be fed into the graph
  • Operation: operation executed on data provided as input
  • Fetch/Output: the result of an operation
  • Session: a context within which a set of computations is executed
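These concepts can be sketched as a toy graph-and-session pair in Python. This is only an illustrative assumption of how the pieces fit together (the class and method names are mine): placeholders mark where data is fed, operations consume the outputs of their inputs, and a session evaluates a fetched node.

```python
class Graph:
    def __init__(self):
        # name -> ("placeholder",) or ("op", fn, input_names)
        self.nodes = {}

    def placeholder(self, name):
        # Feed/Placeholder: a point where data enters the graph.
        self.nodes[name] = ("placeholder",)
        return name

    def operation(self, name, fn, inputs):
        # Operation: executed on the outputs of its input nodes.
        self.nodes[name] = ("op", fn, inputs)
        return name

class Session:
    # Session: the context within which computations are executed.
    def __init__(self, graph):
        self.graph = graph

    def run(self, fetch, feeds):
        # Fetch/Output: recursively evaluate the requested node,
        # reading fed values at placeholders.
        node = self.graph.nodes[fetch]
        if node[0] == "placeholder":
            return feeds[fetch]
        _, fn, inputs = node
        return fn(*[self.run(i, feeds) for i in inputs])
```

A session call such as `Session(g).run("square", {"x": 2, "y": 3})` walks the directed graph from the fetched node back to the placeholders.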

  • Stream: set of data
  • Kernel functions: operations
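In this alternative terminology, a pipeline is simply kernel functions applied in sequence to a stream. A minimal illustration (the variable names are mine):

```python
from functools import reduce

# A stream is just a sequence of data.
stream = [1, 2, 3, 4]

# Kernel functions are the operations applied to the stream.
kernels = [
    lambda xs: [x * 2 for x in xs],       # double each element
    lambda xs: [x for x in xs if x > 4],  # keep values above 4
]

# Thread the stream through each kernel in turn.
result = reduce(lambda data, kernel: kernel(data), kernels, stream)
```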