Streams - NodeConf 2012

Marco Rogers, Yammer

Node Streams

I'm Marco Rogers
I work at Yammer.
I'm a javascript guy.

What are they?

Callbacks

fs.readFile('/path/to/file.txt', function(err, data) {
  // do something with your data
});

What are they?

Event Emitter

fileHandle.on('data', function(data) {
  // do something with your data
});

What are they?

Streams are special event emitters.

Streams are the way node tells you about incremental data.

request.on('data', function(chunk) {
  console.log('Got some data', chunk);
});
request.on('end', function() {
  console.log('Done');
  response.end();
});

What are they?

Streams are the way node encourages you to stop procrastinating.

// uploading
var data = '';
req.on('data', function(chunk) {
  data += chunk;
});
req.on('end', function() {
  fs.writeFile('./files/test.json', data, function(err) {
    if (err) throw err;
    res.end();
  });
});

Why not?

Buffering into memory is inefficient. That's how the other guys do it.

Streams are meant to be composed together into a "pipe" of continuous data.

Streams API

// uploading
var file = fs.createWriteStream('./files/test.json');
req.pipe(file);

Why?

  • Keep things moving.
  • Lower memory usage.
  • Higher throughput.
  • Handle backpressure.

Why?

Example

uploading a file

How?

Write your own streams!

How?

ReadStream: Emits data as output

req.on('data', function(data) { ... });
file.on('data', function(data) { ... });

How?

WriteStream: Accepts data written as input

res.write(data);
file.write(data);

How?

ReadWriteStream: Kicks ass

Create a ReadWriteStream to process and filter data while you stream it to its destination.

ReadWriteStream

Writable

  1. this.writable = true
  2. Implement write(), process some data
  3. Implement end(), stop when you're done
  4. Expect a last write on end()

ReadWriteStream

Readable

  1. this.readable = true
  2. Emit "data" at some point
  3. Emit "end" at some point

ReadWriteStream

Compatibility

  • Implement pause() and resume()
  • Watch for "drain" event
  • Flip readable/writable when appropriate
  • Don't emit data after "end"

How?

Example

gzipping

Why?

Streams are how you bring modularity to streaming data.

Separate data processing from request handling.

Write more streams. Support streams in your libraries.

Help us make streams better.

Sneak Peek - New Readable Streams

Slated for node 0.10.0 stable release.

Improve the story around parsing non-trivial streams of data.

Make stream support in node more consistent.

var fs = require('fs');
var readstream = fs.createReadStream('./data.json');

// hook to "readable" so you can restart your processing
readstream.on('readable', function process() {
  var data;
  while ((data = readstream.read()) !== null) {
    // do something with your data chunk
  }
});

New base classes

Duplex - easier ReadWriteStream

Transform - process streaming data

Try them out

https://github.com/isaacs/readable-stream

"streams2" branch in node

Questions?

  • Strings vs. Buffers?
  • Better error handling?
  • When not to use streams?
  • Doing async work before piping?
  • Why doesn't pause() actually pause?

Marco Rogers

Thanks. Let me know what other topics you want to hear about.