How to read from standard input

Standard input, standard output, and pipes are such fundamental
Unix concepts that you surely think you know them well.
Nevertheless, I bet you can learn something from this blog post.

The task we pick today is to write a program that reads from
standard input and processes that input, with two requirements:
  1) When the input comes in big chunks, the processing
     is fast: High throughput.
  2) When the input is typed by a user directly, the processing
     reacts quickly: When the user has terminated typing a line,
     the input is processed immediately.

For simplicity, "processing" the input will mean echoing it on
standard output, like the 'cat' program does. Also for simplicity,
we will treat read errors like EOF, and ignore write errors on
standard output.

Easy, you think? ISO C has everything you need, because standard
I/O in ISO C was created for precisely this purpose, right?
Let's see.

Here's the first try:
================================== mycat1.c ==================================
#include <stdio.h>

int
main ()
{
  for (;;)
    {
      int c = fgetc (stdin);
      if (c == EOF)
        break;
      fputc (c, stdout);
    }
  return 0;
}
==============================================================================

Let's compile this program. We use the option '-O' to enable the usual
compiler optimizations, and '-Wall' so that gcc reports any dumb programming
blunders that we made.

$ gcc -O -Wall mycat1.c -o mycat1

Let's check the interactive behaviour, by typing two lines, "Hello" and
"World", and terminating the input by pressing Ctrl-D.

$ ./mycat1
Hello
Hello
World
World
[Ctrl-D]

The input was echoed immediately after each line, which is fine.

Now let's check the throughput. We want to benchmark the stdin processing,
so let's minimize the output processing by redirecting it to /dev/null.

$ dd if=/dev/zero bs=1M count=1000 | time ./mycat1 > /dev/null
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 30.1262 s, 34.8 MB/s
 Clock:     30.13
  User:     27.33
System:     1.17

This is on a 1 GHz CPU. 35 MB per second means ca. 30 ns of processing
time for each byte, or ca. 20-50 instructions. This does not sound optimal,
really.
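
The estimate above can be redone for any of the measurements in this
article. The following is a minimal sketch; the helper names ns_per_byte
and cycles_per_byte are made up for illustration and are not part of any
library. Note that it yields clock cycles, not instructions: how many
instructions complete per cycle depends on the CPU, hence the wide range
quoted above.

```c
/* Hypothetical helper: average processing time per byte, in nanoseconds,
   given the elapsed time in seconds and the number of bytes copied.  */
double
ns_per_byte (double seconds, double bytes)
{
  return seconds * 1e9 / bytes;
}

/* Hypothetical helper: clock cycles spent per byte on a CPU whose clock
   rate is GHZ gigahertz (1 GHz means 1 cycle per nanosecond).  */
double
cycles_per_byte (double seconds, double bytes, double ghz)
{
  return ns_per_byte (seconds, bytes) * ghz;
}
```

For the run above, ns_per_byte (30.1262, 1048576000.0) is about 28.7,
which on a 1 GHz CPU is also about 28.7 cycles per byte.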

Next try. We've heard that fgetc and fputc are functions, and that there
are C macros, getc and putc, that do the same thing and should be faster.
Let's try these instead:

================================== mycat2.c ==================================
#include <stdio.h>

int
main ()
{
  for (;;)
    {
      int c = getc (stdin);
      if (c == EOF)
        break;
      putc (c, stdout);
    }
  return 0;
}
==============================================================================

The interactive behaviour is the same:

$ ./mycat2
Hello
Hello
World
World
[Ctrl-D]

And the throughput?

$ dd if=/dev/zero bs=1M count=1000 | time ./mycat2 > /dev/null
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 30.4019 s, 34.5 MB/s
 Clock:     30.39
  User:     27.56
System:     1.19

It's the same. No significant change.

So, next try. We heard that throughput can always be increased by
introducing buffers. Every operation that used to be performed on
a single byte will now be performed on an entire buffer at once.
This reduces the function call overhead, and increases the locality
of references during the processing. Here's what it looks like:

================================== mycat3.c ==================================
#include <stdio.h>

int
main ()
{
  for (;;)
    {
      char buf[4096];
      size_t count = fread (buf, 1, sizeof (buf), stdin);
      if (count == 0)
        break;
      fwrite (buf, 1, count, stdout);
    }
  return 0;
}
==============================================================================

Indeed, the throughput is increased:

$ dd if=/dev/zero bs=1M count=1000 | time ./mycat3 > /dev/null
 Clock:     3.11
  User:     0.15
System:     0.82
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 3.11303 s, 337 MB/s

3 seconds, instead of 30 seconds, that was well worth the effort!
And 3 ns, or 2 to 5 instructions, of processing time for each byte,
that looks reasonable.

What about the interactive behaviour?

$ ./mycat3
Hello
World
Hello
World
[Ctrl-D][Ctrl-D]

Two things went wrong: the lines are no longer processed immediately, and
we had to type Ctrl-D twice to terminate the input.

The latter problem is a programming error. When fread() returns the
12 bytes "Hello<newline>World<newline>", it has already consumed the
first Ctrl-D. Then our loop calls fread() again, and it doesn't return
until the user presses Ctrl-D once more. So, the fix is to recognize
that EOF was reached not only when fread()'s return value is 0, but
whenever it is less than sizeof (buf).

================================== mycat4.c ==================================
#include <stdio.h>

int
main ()
{
  for (;;)
    {
      char buf[4096];
      size_t count = fread (buf, 1, sizeof (buf), stdin);
      if (count > 0)
        fwrite (buf, 1, count, stdout);
      if (count < sizeof (buf))
        break;
    }
  return 0;
}
==============================================================================

The throughput of this program is unchanged:

$ dd if=/dev/zero bs=1M count=1000 | time ./mycat4 > /dev/null
 Clock:     3.15
  User:     0.15
System:     0.81
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 3.15642 s, 332 MB/s

and the interactive behaviour is improved: It does not require two Ctrl-D
keystrokes any more.

$ ./mycat4
Hello
World
Hello
World
[Ctrl-D]

But it is not interactive: The first line is processed only after the second
line and the Ctrl-D were entered.

If we didn't have the second requirement about interactive input, for example
if our program expects well-formed XML documents that users never type by hand,
we could stop here.
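
For that throughput-only case, here is a sketch of the same loop packaged
as a reusable function. The name copy_stream is hypothetical. Unlike
mycat4, it uses ferror() to tell a read error apart from a plain end of
file, a distinction that the programs in this article deliberately skip.

```c
#include <stdio.h>

/* Hypothetical helper: copies everything from IN to OUT in chunks of
   4096 bytes.  Returns 0 at end of file, -1 after a read error.
   Write errors are ignored, as in the rest of the article.  */
int
copy_stream (FILE *in, FILE *out)
{
  for (;;)
    {
      char buf[4096];
      size_t count = fread (buf, 1, sizeof (buf), in);
      if (count > 0)
        fwrite (buf, 1, count, out);
      if (count < sizeof (buf))
        /* A short count means end of file or a read error.  */
        return ferror (in) ? -1 : 0;
    }
}
```

A main() function would simply call copy_stream (stdin, stdout).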

Next try. We heard that standard I/O streams can also be set to an
unbuffered mode or a line-buffered mode. Let's try the unbuffered mode first:

================================== mycat5.c ==================================
#include <stdio.h>

int
main ()
{
  setvbuf (stdin, NULL, _IONBF, 0);
  for (;;)
    {
      char buf[4096];
      size_t count = fread (buf, 1, sizeof (buf), stdin);
      if (count > 0)
        fwrite (buf, 1, count, stdout);
      if (count < sizeof (buf))
        break;
    }
  return 0;
}
==============================================================================

$ ./mycat5
Hello
World
Hello
World
[Ctrl-D]

The interactive behaviour is the same, and the throughput as well.

And the line-buffered mode:

================================== mycat6.c ==================================
#include <stdio.h>

int
main ()
{
  setvbuf (stdin, NULL, _IOLBF, 0);
  for (;;)
    {
      char buf[4096];
      size_t count = fread (buf, 1, sizeof (buf), stdin);
      if (count > 0)
        fwrite (buf, 1, count, stdout);
      if (count < sizeof (buf))
        break;
    }
  return 0;
}
==============================================================================

$ ./mycat6
Hello
World
Hello
World
[Ctrl-D]

It also has no effect. Why? Because fread() waits until it has collected the
requested number of bytes, whatever the buffering mode of the stream. The
setvbuf call would have made a difference for the programs that read bytes
one by one: it reduces the throughput of 'mycat2' from 35 MB/sec to
2.0 MB/sec. setvbuf(...,_IONBF,...) has the effect that stdio fetches
single bytes from the operating system, rather than trying to fill a
buffer with as many bytes as are immediately available. But setvbuf cannot
improve the interactive behaviour.

So, what else can we do? We can switch between the fgetc() approach and the
fread() approach, depending on a command-line option. fgetc() is essentially
the same as fread() with a 1-byte buffer. So the easiest way to unify both
approaches is like this:

================================== mycat7.c ==================================
#include <stdio.h>
#include <string.h>

int
main (int argc, char *argv[])
{
  int unbuffered = (argc > 1 && strcmp (argv[1], "--unbuffered") == 0);
  for (;;)
    {
      char buf[4096];
      size_t bufsize = (unbuffered ? 1 : sizeof (buf));
      size_t count = fread (buf, 1, bufsize, stdin);
      if (count > 0)
        fwrite (buf, 1, count, stdout);
      if (count < bufsize)
        break;
    }
  return 0;
}
==============================================================================

The program now exhibits good interactive behaviour

$ ./mycat7 --unbuffered
Hello
Hello
World
World
[Ctrl-D]

and good throughput

$ dd if=/dev/zero bs=1M count=1000 | time ./mycat7 > /dev/null
 Clock:     2.70
  User:     0.16
System:     0.75
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.70661 s, 387 MB/s

But the complexity has only been pushed from the program to the user.
This is bad: Users shouldn't have to learn about all command-line options
in order to run a program. If there's a best choice, that choice should be
built-in.

So let's try to make the program guess the best choice. If stdin is a tty
(a terminal or terminal emulator), we prefer interactive input; otherwise
we prefer high throughput.

================================== mycat8.c ==================================
#include <stdio.h>
#include <unistd.h>

int
main ()
{
  int unbuffered = isatty (STDIN_FILENO);
  for (;;)
    {
      char buf[4096];
      size_t bufsize = (unbuffered ? 1 : sizeof (buf));
      size_t count = fread (buf, 1, bufsize, stdin);
      if (count > 0)
        fwrite (buf, 1, count, stdout);
      if (count < bufsize)
        break;
    }
  return 0;
}
==============================================================================

$ ./mycat8
Hello
Hello
World
World
[Ctrl-D]

The interactive behaviour is good again, and so is the throughput:

$ dd if=/dev/zero bs=1M count=1000 | time ./mycat8 > /dev/null
 Clock:     2.91
  User:     0.19
System:     0.77
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.91414 s, 360 MB/s

So, is that the solution now?

No. The detection of whether unbuffered processing is desired was only a
guess. A heuristic. A buggy guess, in other words. In this case:

$ cat | ./mycat8
Hello
World
Hello
World
[Ctrl-D]

the input was processed too late. Whereas this was right:

$ cat | ./mycat7 --unbuffered
Hello
Hello
World
World

How could the program guess that in the case of
  $ cat | ./mycat8
unbuffered input is better, whereas in
  $ dd if=/dev/zero bs=1M count=1000 | ./mycat8
or even
  $ dd if=/dev/zero bs=1M count=1000 | cat | ./mycat8
high throughput is better? Can the program get information about
the origin of the data? No, there are no such APIs in Unix. The
program can determine that stdin comes from a pipe; this is possible
via fstat(). But there is no info available beyond that.
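
Here is a sketch of what fstat() can tell about a file descriptor. The
helper name fd_kind and its one-letter return codes are invented for this
example; only fstat() and the S_IS* macros are real POSIX facilities.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: classifies the file descriptor FD.
   Returns 'p' for a pipe or FIFO, 'c' for a character device (such as
   a tty), 'f' for a regular file, and '?' for anything else or if
   fstat() fails.  */
char
fd_kind (int fd)
{
  struct stat st;

  if (fstat (fd, &st) < 0)
    return '?';
  if (S_ISFIFO (st.st_mode))
    return 'p';
  if (S_ISCHR (st.st_mode))
    return 'c';
  if (S_ISREG (st.st_mode))
    return 'f';
  return '?';
}
```

With 'cat | ./prog' and with 'dd ... | ./prog' alike, a call such as
fd_kind (STDIN_FILENO) yields 'p'. The two cases look identical from
inside the program, which is exactly the limitation described above.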

So this heuristic was a dead end.

All the trouble is caused by fread(), which insists on pulling the
specified number of bytes, even if that means waiting. In Unix speak,
the fread() call "blocks". But there is a Unix system call for reading
data that returns just what is available, if something is available. It's
the read() system call.
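
The difference shows up on a pipe whose writer is still alive. The
following sketch assumes a POSIX system; the function name
read_from_open_pipe is made up for this demonstration. It puts 6 bytes
into a pipe, keeps the write end open, and then asks read() for up to
4096 bytes.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: writes the 6 bytes "Hello\n" into a fresh pipe,
   leaves the write end open, and requests up to 4096 bytes with read().
   Returns what read() returned, or -1 if the setup failed.  */
ssize_t
read_from_open_pipe (void)
{
  int fds[2];
  char buf[4096];
  ssize_t count;

  if (pipe (fds) < 0)
    return -1;
  if (write (fds[1], "Hello\n", 6) != 6)
    count = -1;
  else
    /* More data could still arrive, but read() does not wait for it.  */
    count = read (fds[0], buf, sizeof (buf));
  close (fds[0]);
  close (fds[1]);
  return count;
}
```

The call returns 6 right away. An fread (buf, 1, sizeof (buf), ...) on
the same pipe would wait at this point until 4096 bytes had arrived or
the write end had been closed.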

But this means that we drop the ISO C standard I/O and use the Unix
system calls instead. Here is the program:

================================== mycat9.c ==================================
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int
main ()
{
  for (;;)
    {
      char buf[4096];
      ssize_t count = read (STDIN_FILENO, buf, sizeof (buf));
      if (count <= 0)
        break;
      fwrite (buf, 1, count, stdout);
    }
  return 0;
}
==============================================================================

This program now features good interactive behaviour:

$ ./mycat9
Hello
Hello
World
World
[Ctrl-D]

and good throughput as well:

$ dd if=/dev/zero bs=1M count=1000 | time ./mycat9 > /dev/null
 Clock:     2.73
  User:     0.17
System:     0.74
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.74331 s, 382 MB/s

That's the lesson learned. If you want good interactive behaviour and high
throughput on standard input, use Unix read() instead of <stdio.h>.

We are still not done, though. While Unix read(), compared to ISO C fread(),
adds the ability to read just as many bytes as are readily available, it also
reports errors for events that would have gone unnoticed with fread(). Namely,
it returns -1, setting errno to EINTR ("Interrupted system call"), when a
signal had to be handled. On most platforms this happens only when the program
had a signal handler installed in a particular way; refer to the manual pages
for sigaction(), signal(), and siginterrupt() for details. But on some
platforms it occurs even if the program has not installed any signal handler.
To observe this, stop the program through Ctrl-Z and restart it. On MacOS X,
you get this:

$ ./mycat9
Hello
Hello
[Ctrl-Z]^Z
[1]+  Stopped                 ./mycat9
$ fg
./mycat9
$

The fix is to check for the errno value EINTR explicitly. Of course, we do this
only when the read() call failed, that is, when it returned -1. And we need
an #ifdef because EINTR exists only on Unix platforms. Native Windows platforms
don't have it. The program thus looks like this:

================================== mycat10.c ==================================
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int
main ()
{
  for (;;)
    {
      char buf[4096];
      ssize_t count = read (STDIN_FILENO, buf, sizeof (buf));
      if (count == 0)
        break;
      if (count < 0)
        {
#ifdef EINTR
          if (errno != EINTR)
#endif
            break;
        }
      if (count > 0)
        fwrite (buf, 1, count, stdout);
    }
  return 0;
}
===============================================================================

And it continues to read from standard input after being suspended and
restarted:

$ ./mycat10
Hello
Hello
[Ctrl-Z]^Z
[1]+  Stopped                 ./mycat10
$ fg
./mycat10
World
World
[Ctrl-D]

Now, this code is a bit ugly, and it is easy to forget to handle EINTR each
time you call read(). For this reason, Gnulib has a module 'safe-read' that
provides a safe_read() function, similar to read(), but that handles EINTR
by restarting the call.
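
The idea behind such a wrapper fits in a few lines. The following is only
a sketch of the retry loop, under the hypothetical name read_retrying. It
is not Gnulib's implementation, and unlike safe_read() it keeps read()'s
convention of returning -1 on failure.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in, not Gnulib's code: like read(), but calls
   read() again whenever it fails with EINTR.  */
ssize_t
read_retrying (int fd, void *buf, size_t size)
{
  for (;;)
    {
      ssize_t count = read (fd, buf, size);
      if (count >= 0 || errno != EINTR)
        return count;
      /* Interrupted by a signal before any data was read: try again.  */
    }
}
```

All other errors, and the return value 0 at end of input, are passed
through to the caller unchanged.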

================================== mycat11.c ==================================
#include <config.h>

#include <stdio.h>
#include <unistd.h>

#include "safe-read.h"

int
main ()
{
  for (;;)
    {
      char buf[4096];
      size_t count = safe_read (STDIN_FILENO, buf, sizeof (buf));
      if (count == 0 || count == SAFE_READ_ERROR)
        break;
      fwrite (buf, 1, count, stdout);
    }
  return 0;
}
===============================================================================

$ gcc -O -Wall mycat11.c libgnu.a -o mycat11

So that's finally the way to read from standard input with high throughput,
with good interactive behaviour, without bugs on MacOS X, and without #ifdefs
in the middle of the code.


I'll end the lesson with a few remarks about what you can do if you don't
want the user to have to press Return/Enter first. That is, if processing
should begin as soon as the user presses a key.

Normally, you are using the basic line editing behaviour (echoing of
characters, erase behaviour of the Backspace key) built into the tty device.
(Don't confuse this with the advanced line editing, which supports arrow
keys for movement, forward-erase behaviour of the Delete key, and so on.
This line editing comes from the GNU readline library. Either the program
is linked against GNU readline, or a wrapper such as 'rlwrap' or 'rlfe'
(earlier called 'fep') is used.)

This line editing logic sits in the kernel, but can be turned off via
<termios.h> system calls.

But when you turn it off, you also turn off keystrokes that users are
accustomed to relying on: Ctrl-D for terminating the input, Ctrl-C for
interrupting and terminating the program, Ctrl-Z to regain control,
and so on. Users won't like losing these features.
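
How much is lost depends on which flags are cleared. As a sketch of a
less drastic setting, the following helper (the name make_char_at_a_time
is made up) derives new terminal modes in which line editing and echo are
off but the ISIG flag and the control character table are left alone. It
only computes the modes; applying them is up to the caller.

```c
#include <termios.h>

/* Hypothetical helper: fills *NEWMODE with a copy of *OLDMODE in which
   line editing (ICANON) and echo are turned off, while signal-generating
   keys stay enabled.  Does not touch the terminal itself.  */
void
make_char_at_a_time (const struct termios *oldmode, struct termios *newmode)
{
  *newmode = *oldmode;
  newmode->c_lflag &= ~(ICANON | ECHO);  /* no line editing, no echo */
  newmode->c_lflag |= ISIG;              /* keep Ctrl-C, Ctrl-Z, ... */
  newmode->c_cc[VMIN] = 1;               /* read() returns after 1 byte */
  newmode->c_cc[VTIME] = 0;              /* ... without a timeout */
}
```

The caller would fetch the old modes with tcgetattr(), pass the result to
tcsetattr(), and restore the old modes before exiting. With these modes,
Ctrl-C and Ctrl-Z keep generating their signals, but Ctrl-D no longer ends
the input: end-of-file recognition is part of the line editing mode, so
the program receives Ctrl-D as an ordinary byte and has to handle it
itself.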

================================== mycat12.c ==================================
#include <errno.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

static struct termios oldtermio; /* original tty mode */
static int oldtermio_initialized;

/* Sets the terminal in cbreak, noecho mode.  */
static int
term_raw (void)
{
  if (!oldtermio_initialized)
    {
      if (tcgetattr (STDOUT_FILENO, &oldtermio) && errno != ENOTTY)
        return -1;
      oldtermio_initialized = 1;
    }
  {
    struct termios newtermio;
    size_t i;

    newtermio = oldtermio;
    newtermio.c_iflag &= ISTRIP | IGNBRK;
    newtermio.c_lflag &= ISIG;
    for (i = 0; i < NCCS; i++)
      newtermio.c_cc[i] = 0;
    newtermio.c_cc[VMIN] = 1;
    newtermio.c_cc[VTIME] = 0;
    if (tcsetattr (STDOUT_FILENO, TCSAFLUSH, &newtermio) && errno != ENOTTY)
      return -1;
  }
  return 0;
}

/* Sets the terminal in nocbreak, echo mode.  */
static int
term_unraw (void)
{
  if (oldtermio_initialized)
    {
      if (tcsetattr (STDOUT_FILENO, TCSAFLUSH, &oldtermio) && errno != ENOTTY)
        return -1;
    }
  return 0;
}

int
main ()
{
  term_raw ();
  for (;;)
    {
      int c = fgetc (stdin);
      if (c == EOF)
        break;
      fputc (c, stdout);
    }
  term_unraw ();
  return 0;
}
===============================================================================

With this program, user keystrokes are processed immediately:

$ ./mycat12
World

But now the user is caught: No control characters are interpreted. Newlines
are echoed as Ctrl-M (= Carriage Return), not Carriage Return + Line Feed,
and so on. Let's kill the process:

$ kill `ps aux | fgrep ./mycat12 | awk '{ print $2 }'`

You may also have to restore the tty into the normal modes:

$ stty sane

In summary, this facility exists but is better avoided if you don't want to
program actions for various control characters and if you don't want hate
mail from your users.