Eclipse 2017

Last week we were in Oregon to witness the total solar eclipse.

I had always wanted to see one;  this might be my last chance.
I took several consumer cameras, including my smart phone, and took pictures with several of them, but none of those pictures really came out.  (Part of my goal was not to get too engrossed in taking pictures since totality only lasted a little over two minutes.)
Here is one picture at or near totality.  Hard to tell which from the picture;  hard to tell that an eclipse is even occurring:

Here is an accidental selfie of me and the sun shortly after totality.  In the picture it looks like 1-2% but it’s really more like 95-99%.

The one thing I shot that did work out the way I wanted is a video I shot looking at the ground during the eclipse.  I wanted to capture what it looks like on the ground during totality and in this I succeeded.

And if you want to see a decent picture of the eclipse (not taken by me), there’s always this one:

My Experience with Start-ups

Recently a friend of mine sent me a link to this CNBC article:  http://www.cnbc.com/2016/08/30/i-got-scammed-by-a-silicon-valley-start-up-commentary.html that tells the sad tale of the messy collapse of a Silicon Valley start-up.  This New York Times article strips away the pseudonyms and gives some additional details:  http://www.nytimes.com/2016/09/01/technology/a-silicon-valley-dream-collapses-in-allegations-of-fraud.html?_r=0

Aside from the allegations of fraud, I think this story is all too common here in Silicon Valley.  After all, I’ve heard that 90% of start-ups fail.  And I’m sure many of them involve non-payment of salary toward the end.  I would like to think that most entrepreneurs are more honest than to engage in the kinds of fraudulent behavior alleged by Ms. Kim in her post.  But as with any activity involving humans, there are bound to be some bad players.

Anyway I thought I would relate my own  experience with start-ups.  There have been three:

  1.  My first encounter was doing some moonlighting, creating a design for a specific interface needed by some start-up.  (I don’t even remember their name.)  I provided the requested design, they paid me and that was the end of our relationship.  (They offered me a longer-term position, but I was headed for a two-year assignment in Japan, which I considered a much better offer.)  They later failed, but I wasn’t involved by then and don’t know the details.
  2. MS2.  (I don’t mind giving company names since my experiences were positive.)  This company provided planning software that was probably intended for other start-ups but ended up being mainly of interest to larger, more mature companies.  They were doing pretty well and had some customers until the “dot-com crash” happened in 2000.  Our potential customers stopped buying software and we stopped getting new revenue.  Then 9/11 happened and that was the final blow.  One of our largest customers was headquartered in the World Trade Center towers, and the company, along with most of its employees, was lost in the attack.  I survived two rounds of layoffs and was hit by the third.  I was paid until the end of my employment, and even got two weeks’ severance pay.  The management went out of their way to assure me that it wasn’t me, it was the circumstances that led to my separation.  During the months of the company’s collapse, the management was always open and (to my knowledge) honest about our situation.  The company struggled for another year with a skeleton staff; the fourth round of layoffs was the last and took everyone remaining.  I’m sure the investors lost their money (but that’s a risk they understood);  I don’t believe anyone else did.
  3. Roku:  I still work there, as of this writing, and so obviously the company is still viable.  In fact, it’s well known and has a large customer base.  (www.roku.com).  Roku is still funded by venture capital and therefore is still “in start-up mode” but we’re all hoping they will go public one of these days and we’ll all make some money from our options.   They’ve never missed a paycheck, and I believe they’re honest with us.

I would guess my experience with start-ups is more positive than the average, and was certainly better than Ms. Kim’s.  I think she was naive if she didn’t understand the risks of working for a start-up.  I think filing a wage claim against a failing start-up, as she did, is a waste of effort.  The fact that she apparently ended up working for allegedly dishonest management is (I hope) not common and not to be expected, and for that she has my sympathy.

I know people who go from start-up to start-up.  Some of them have been in the situation where their employer runs out of money and can’t pay them.  Most of them probably hope they’ll pick a winner and get rich.  But it is my experience that most of them do it for the satisfaction of building a product and the sheer joy of writing software.  They understand the risks, and they know that when one start-up fails, there will be another one waiting for their talents.

I feel lucky to live and work in Silicon Valley, where so many innovations have been born.  I’m sorry Ms. Kim’s experience wasn’t as positive.

Who Invented the Computer?

“We’ve learned the hard way that between the word ‘first’ and the word ‘computer’ there are about 19 adjectives”
— John Hollar, CEO of the Computer History Museum

The title of this essay is a simple question.  One would think it would have a simple answer, but that doesn’t seem to be the case.  There are some who would say quite strongly that it is this person or that, but there is little agreement as to which person to give the credit.  (The spreadsheet listed at the end of this article includes claims from several contenders’ Wikipedia articles.  These claims are remarkably similar, although each is slightly different, and if read carefully, each is probably true.)

Why is it that there is no consensus on such a recent event?  After all, if I ask, “Who invented the telephone?” or “Who invented the telegraph?”  almost everyone knows the answers:  Alexander Graham Bell and Samuel F. B. Morse.  (As we shall see, even these answers aren’t as clearly correct as we might suppose.)

The computer was arguably invented within the last 100 years.  It shouldn’t be that hard to know the answer.  I’m not asking “who invented the wheel?” after all.

I propose to analyze some of the reasons why this question is hard and take a stab  at my own answer to the question posed in the title.

The First Contender

ENIAC

I grew up believing that the first computer was the ENIAC built in the 1940’s and  invented by John W. Mauchly and J. Presper Eckert, Jr.  [1]  They even filed a patent on their invention.  There are two problems here.  First, their patent was eventually overthrown in court because (according to the court findings) they had taken ideas from an earlier design.  [2]  I would argue that I am asking a historical question not a legal one.  So court decisions, while possibly relevant, are not definitive.

The second, more serious complaint (at least in my own mind) about the ENIAC as first computer is that the ENIAC is not stored-program.  That is to say, the ENIAC was not “programmed” by storing instructions in memory.  Rather, the ENIAC was “programmed” by setting a series of dials and switches and plugging in wires, somewhat like a telephone switchboard.  Although it later adopted the idea of encoding instructions in numeric form, they were not stored in a read/write memory.  [11]

All modern computers from your Apple Watch to the giant racks of servers at Amazon, Google, Facebook, etc. are stored-program computers.  They execute a series of instructions that are stored in memory, generally the same memory that stores the data  being processed.

So I say, the ENIAC, while a precursor to the modern computer, is not itself a full-fledged computer, in the modern sense of the word.

The Oldest Stored-Program Computing Device

The author posing with a recently built difference engine at the Computer History Museum

To find the first stored-program device we actually must go back much earlier to the Analytical Engine, designed by Charles Babbage ca. 1840.  [3]  This device’s program was contained in a set of punched cards, which can be considered a form of storage.

The three problems with the Analytical Engine’s claim to primacy are: (1) It is not electronic or even electro-mechanical; it is entirely mechanical.  (2) It was never actually built.  And (3) the program is stored in punched cards and not in the engine’s “store”.  (True, early commercial computers used punched cards for programs as well as data.  But they were input media and were read into memory before they were executed.  The memory itself was reusable; the punched cards, whether Babbage’s or IBM’s, were not.)

The von Neumann Architecture

John von Neumann wrote a paper in 1945, entitled “First Draft of a Report on the EDVAC” [4]  that included the idea of storing computer instructions in memory.  It is largely due to that paper and its concept of storing the instructions in memory, that modern computers are often referred to as “von Neumann machines”.  But that, too, is controversial because others, including Mauchly and Eckert had raised that possibility and were collaborating with von Neumann on the “First Draft” mentioned above.

I must add that the “First Draft” is a remarkable paper that described and argued for many aspects of computer design, not only the use of memory to store the program.  In this paper he also argued for the use of binary arithmetic, and the necessity of using an all-electronic (as opposed to electro-mechanical) design in order to achieve reasonable speed of computation.

Define Your Terms

As can be seen from the above examples,  in order to determine who created the first computer, we have to have a common definition of “computer”.

I would argue that the modern computer has all of the following characteristics:

  1. General purpose:  It can solve a variety of types of problems by being “programmed” i.e. given a set of instructions that represent the procedure for solving the problem at hand.  Moreover, the user of the computer can choose which programs to install, and which programs to run.  This is to distinguish from an embedded device, which has the other characteristics described below, but is used for a specific purpose.
  2. Stored program:  The set of instructions is encoded in a numeric form and stored in memory along with the data that is being used to solve the problem.
  3. Electronic:  It is made from electronic elements:  integrated circuits, transistors, or even vacuum tubes.  It is not electro-mechanical, i.e. using relays, nor mechanical (such as Babbage’s analytical engine, or even the abacus).

Of course a computer has other characteristics, but I propose to use the above three as the defining characteristics of a “computer”.

I recognize that some historians and computer scientists would disagree with one or more of the above criteria, but they are the ones I’m using.  I submit that all modern usage of the term “computer” at least implies all three of these qualities.

By applying the above three criteria we can eliminate the candidates already mentioned:  the Analytical Engine because it was mechanical, and the ENIAC because it was not stored program.

But does this definition resolve the question?  Unfortunately, not to everyone’s satisfaction.

Who Came First?

In the 1940s and 1950s many researchers were working on the problems of automatic computation.  These researchers include some of the giants of computer science including not only John von Neumann (mentioned above) but also Alan Turing, and Claude Shannon.  With so many people working on the problem (and generally collaborating with one another) it’s not surprising that the same ideas (such as stored program) arose in the writings of several people at more or less the same time.

It is sometimes maddening to try to determine which one deserves the “original” credit.  I recently read a book that says the EDSAC computer was the first computer to use the stored-program idea.  But another article says it was the second, without naming the first.

Perhaps this whole question is an exercise in historical nit-picking.  Many ideas arose at about the same time from multiple individuals.  Does it really matter whether the idea of stored-program computers came from von Neumann, or Eckert and Mauchly, or Turing, all of whom published papers about stored-program computers in the mid 1940s, or even Konrad Zuse, who filed two patents on stored-program computers in 1936, ten years earlier than the “First Draft” paper that has caused von Neumann’s name to be associated so strongly with stored-program computers?

Of course it can matter financially, as patent holders can become rich and famous, while also-rans to the patent office are soon forgotten. I argued above that such things are legal rather than historical arguments.

Yet, the outcome of such battles can determine the narrative that becomes history.  At the beginning of this essay I posed the questions of who invented the telephone and the telegraph.  The reason we say that Bell invented the telephone is that he made it to the patent office first.  But there is significant evidence that his ideas were preceded by those of Antonio Meucci. [5]  And Morse’s telegraph was preceded by several versions invented in England.  [6]

Reduction to Practice

The phrase, “reduction to practice” is a legal term used in patent law.  It means turning an idea into a working machine (or process, etc.)  Under U.S. patent law it is not necessary to reduce an idea to practice in order to be granted a patent.  In history, I think it fair to give credit to both the originator(s) of an idea as well as those who implement it.

As an example, credit for the idea of storing the program in memory can go to von Neumann for his “First Draft” paper.  Even if it is disputed that von Neumann deserves full credit for the stored-program idea, it is indisputable that his paper led a number of groups to start working on implementing ideas found in that paper.  The EDVAC computer itself was the specific subject of von Neumann’s paper.  But its implementation was preceded by the EDSAC.  And that was preceded by the Manchester Small-Scale Experimental Machine, nicknamed “Baby”.  [7]  Both the Baby and the EDSAC were directly inspired by von Neumann’s paper, and they both include the stored-program concept.

And the winner is…

So to determine who invented the computer we must deal with primacy, ideas vs. implementation, and the precise definition of “computer.”

Of all the components of my proposed definition of “computer”, I argue that the “stored program” idea (the concept of storing the computer’s instructions in the computer’s memory) is the most important.  It is a unique feature among modern inventions, and makes the computer uniquely powerful.  Just look at the number and variety of “apps” on your smart phone to get a sense of the flexibility that the stored-program idea provides.  You can switch your computer-in-your-pocket from making a phone call, to showing you the weather, to playing a game, to reading a book, to watching a movie, by simply replacing the program (“app”) that you have stored into memory.

So who came up with the idea of storing computer instructions in memory?  The general idea was first mentioned not by John von Neumann, but by Konrad Zuse, ten years earlier, or even Charles Babbage, a hundred years earlier.  But both Zuse’s and Babbage’s devices stored the program in non-erasable media separate from the data “store” (to use Babbage’s term).  In terms of storing instructions in the same kind of memory as the data, the credit does deserve to go to John von Neumann and his unnamed collaborators.

Of course, the stored-program concept is not the only component of a computer design.  It is not possible to single out one inventor of this broader set of concepts because the ideas came from many people.  I would give credit to at least the following set of individuals for contributing to the development of the first computer:

  • Gottfried Leibniz – Gave structure to the idea of binary arithmetic [8]
  • Charles Babbage – The first to design a programmable computing machine 
  • Ada Lovelace – The first to realize that a computing device (specifically Babbage’s Analytical Engine) could be used to solve any symbolic problem, not just numeric computations.  [10]
  • George Boole – Inventor of the logical algebra that now bears his name and is a fundamental component of all computer hardware and software [9]
  • Alan Turing – Formalized the concepts of an ideal computing machine.
  • John von Neumann – Wrote the “First Draft” describing the design of a computer. [4] 
  • Claude Shannon – Proposed in his master’s thesis that computers can be designed from components that follow Boolean Algebra.
  • Konrad Zuse – Designed and built a number of early electro-mechanical computers
  • John Atanasoff – Created early electronic computing devices.  Now credited with many of the ideas that went into the ENIAC.

As for implementation, there is actually little dispute that the Manchester Small-Scale Experimental Machine, the “Baby” computer, is the first stored-program general-purpose electronic computer.  Thus I give the credit for implementing the first computer to Frederic C. Williams, Tom Kilburn and Geoff Tootill, the men who built the “Baby”.

Manchester “Baby” Computer

See this spreadsheet for a list of the devices mentioned in this article and how they measure up to the definition I have given here.

 

References

[1] ENIAC: Celebrating Penn Engineering History,  https://www.seas.upenn.edu/about-seas/eniac/mauchly-eckert.php

[2]  Judge declares the ENIAC patent invalid, October 19, 1973,  http://www.edn.com/electronics-blogs/edn-moments/4398948/Judge-declares-the-ENIAC-patent-invalid–October-19–1973-

[3] The Babbage Engine,  http://www.computerhistory.org/babbage/engines/

[4] First Draft of a Report on the EDVAC by John von Neumann  https://web.archive.org/web/20130314123032/http://qss.stanford.edu/~godfrey/vonNeumann/vnedvac.pdf

[5] Antonio Meucci  http://www.famousscientists.org/antonio-meucci/

[6] The History of the Electric Telegraph and Telegraphy,  http://inventors.about.com/od/tstartinventions/a/telegraph.htm

[7] “Manchester Small-Scale Experimental Machine”, Wikipedia,  https://en.wikipedia.org/wiki/Manchester_Small-Scale_Experimental_Machine#cite_note-1

[8] “Explanation of Binary Arithmetic” in Memoires de l’Academie Royale des Sciences by Gottfried Leibniz,  http://www.leibniz-translations.com/binary.htm

[9] “The Calculus of Logic” in Cambridge and Dublin Mathematical Journal, Vol. III (1848), pp. 183-98 by George Boole,  http://www.maths.tcd.ie/pub/HistMath/People/Boole/CalcLogic/CalcLogic.html

[10] “A Selection and Adaptation From Ada’s Notes found in “Ada, The Enchantress of Numbers,” by Betty Alexandra Toole Ed.D. (Strawberry Press, Mill Valley, CA)”, http://www.agnesscott.edu/lriddle/women/ada-love.htm

[11]  ENIAC in Action: Making and Remaking the Modern Computer by Thomas Haigh, Mark Priestley, and Crispin Rope.  (MIT Press, Cambridge, Massachusetts, and London, England, 2016)  (https://smile.amazon.com/ENIAC-Action-Remaking-Computer-Computing-ebook/dp/B01LZCBO85/ref=sr_1_1?s=digital-text&ie=UTF8&qid=1479519175&sr=1-1&keywords=eniac+in+action )

Note:  References [4], [8], [9], and [10] are all original writings by the creators of the ideas they describe.  I particularly recommend von Neumann’s “First Draft” [4].

 

Four Years Later

I have been having my PSA tested every six months and am pleased to say it is completely negative, meaning I have no signs of recurrence of the cancer. Since prostate cancer is slow-growing, it will be another six years before I can officially be pronounced “cured”.

Life Cycle of a Linux Program

Introduction

This is an investigation of the life cycle of a program in a Linux system.

Actually, there are two (at least) meanings of “program life cycle”:

  1. The development life cycle (requirements, design, code, test, deploy…)
  2. The execution life cycle of a program when it is run.

We will discuss the latter.  How is a new program begun?  How is it ended?  (What happens in between is largely up to the program itself.)

The discussion that follows assumes you have some familiarity with C programming, since our sample program was written in C and we will be examining some C code.  It also assumes some familiarity with the Linux application programming interface and with building and running programs in a Linux environment.

All source file references are relative to the root of the run-time library source tree and are specific to Fedora 17, the system from which I got this information.

This article describes what happens on an x86 (aka Intel IA-32) processor.  For other processor types, the machine instructions will be different but the concepts are the same.

Sample Program

Here is the program we will be investigating. It is the familiar “hello world” program:

#include <stdio.h>

int main()
{
    printf ("Hello, World!\n");
}

As you can see this program displays a simple message to standard output.

As you can also see, this program uses one C library call, printf.  Typically, such functions are not linked into the executable file during the compilation process.  Instead they are linked (i.e. added to the program) at run time, once the program is started.  The library code comes from a separate file, “libc.so”.  (The actual name on my system includes a version number:  libc-2.19.so.)  More complex programs may have many more such shared libraries, each coming from a separate “.so” file.  Adding these libraries to your program is known as “late binding” or “run-time binding”.  We’ll see how this is done momentarily.

Compiling the Program

We used the following gcc command to build the program:

$ gcc -o hello hello.c

This command creates the executable file, “hello”. This file contains the machine-language code for our program which we can examine with the objdump command:

$ objdump -d hello

hello:     file format elf32-i386

...

Disassembly of section .text:

 08048350 <_start>:
 8048350:   31 ed                   xor    %ebp,%ebp
 8048352:   5e                      pop    %esi
 8048353:   89 e1                   mov    %esp,%ecx
 8048355:   83 e4 f0                and    $0xfffffff0,%esp
 8048358:   50                      push   %eax
 8048359:   54                      push   %esp
 804835a:   52                      push   %edx
 804835b:   68 e0 84 04 08          push   $0x80484e0
 8048360:   68 70 84 04 08          push   $0x8048470
 8048365:   51                      push   %ecx
 8048366:   56                      push   %esi
 8048367:   68 4d 84 04 08          push   $0x804844d
 804836c:   e8 cf ff ff ff          call   8048340 <__libc_start_main@plt>
 8048371:   f4                      hlt    

...

 0804844d <main>:
 804844d:   55                      push   %ebp
 804844e:   89 e5                   mov    %esp,%ebp
 8048450:   83 e4 f0                and    $0xfffffff0,%esp
 8048453:   83 ec 10                sub    $0x10,%esp
 8048456:   c7 04 24 00 85 04 08    movl   $0x8048500,(%esp)
 804845d:   e8 be fe ff ff          call   8048320 <puts@plt>
 8048462:   c9                      leave  
 8048463:   c3                      ret

(<main> is the main function seen in our source file, hello.c, shown at the beginning of this article. <_start> is the program startup code that we’ll see again shortly.) There is other code as well, which I have deleted from the above output for clarity.
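The address of _start is not just a label in the disassembly; the executable's ELF header normally records it as the program's entry point.  Here is a minimal sketch that reads that field back out of a file.  The function name elf_entry_point is my own invention for illustration, and the code assumes the file uses the same byte order as the machine it runs on.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the entry-point address recorded in the ELF
   header of `path`, or 0 if the file cannot be read or is not an ELF file.
   Assumption: the file's byte order matches the host's. */
uint64_t elf_entry_point(const char *path)
{
    assert(path != NULL);

    unsigned char buf[sizeof(Elf64_Ehdr)];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);

    /* Every ELF file starts with the four magic bytes "\177ELF". */
    if (n < sizeof(Elf32_Ehdr) || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return 0;

    if (buf[EI_CLASS] == ELFCLASS32) {          /* e.g. elf32-i386 */
        Elf32_Ehdr h;
        memcpy(&h, buf, sizeof h);
        return h.e_entry;
    }
    if (buf[EI_CLASS] == ELFCLASS64 && n == sizeof(Elf64_Ehdr)) {
        Elf64_Ehdr h;
        memcpy(&h, buf, sizeof h);
        return h.e_entry;
    }
    return 0;
}
```

Called on the hello file built above, this should report the address of _start (0x8048350 in the listing).  For a position-independent executable the value is an offset rather than an absolute address.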

Note that the compiler substituted “puts” for “printf” as an optimization.

In addition to the machine-language instructions for the program, the executable file includes information about the functions to be loaded at run time from the .so libraries:

$ objdump -T  hello

hello:     file format elf32-i386

DYNAMIC SYMBOL TABLE:
00000000      DF *UND*  00000000  GLIBC_2.0   puts
00000000  w   D  *UND*  00000000              __gmon_start__
00000000      DF *UND*  00000000  GLIBC_2.0   __libc_start_main
080484fc g    DO .rodata    00000004  Base        _IO_stdin_used

We see puts as well as __libc_start_main, which we will encounter again shortly, and some other functions used internally by the C run-time library.

Running the Program — the shell

Normally the program would be started from a shell:

$ ./hello
Hello, World!

Every program needs its own process to run in.  Therefore the shell will fork a child process:

while ((pid = fork ()) < 0 && errno == EAGAIN && forksleep < FORKSLEEP_MAX)
{
     ...handle EAGAIN error
}

The shell, running in the child process,  will then call the execve system function to start the program:


execve (command, args, env);

Of course, not every program is started from a shell.  But whatever program starts ours will very probably call fork and will certainly call one of the exec family of system calls.
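To make the fork-then-exec pattern concrete, here is a minimal sketch of what any launching program has to do.  It is not the shell's actual code; the helper name run_program is hypothetical, and the error handling is reduced to the bare minimum.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* the caller's environment, passed on unchanged */

/* Hypothetical helper: run `path` in a child process and return its exit
   status, or -1 if the child could not be created or did not exit normally. */
int run_program(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;               /* fork failed: no child was created */

    if (pid == 0) {
        /* Child: replace this process image with the new program. */
        execve(path, argv, environ);
        _exit(127);              /* reached only if execve itself failed */
    }

    /* Parent: wait for the child, as a shell does for a foreground command. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling run_program("/bin/sh", args) with args set to { "sh", "-c", "exit 3", NULL } should return 3, the exit status of the child.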

execve() System Call — the Kernel

In the kernel, the execve system call will create a new memory space for the new program and map the program file into memory.

What do we mean by “map the program file into memory”?  In the early days of computers, before the use of virtual memory, programs were actually “loaded” into memory, meaning the entire program file was copied from some storage device such as disk, tape, or cards, into memory.  For a large program this could take considerable time.

With the use of virtual memory it is only necessary for the kernel to construct data structures that specify where the various parts of the program should go in memory and where they should come from on disk.  With this mechanism, only the portions of the program that are needed are copied into memory and only once they are needed.  Some portions (such as error recovery routines) may never be needed and are thus never loaded into memory.

In order to map the .so library files, mentioned above, into memory, the kernel maps one .so file, often called “ld.so” and referred to as “the dynamic loader”, into the process’s memory.

(For more details about how the kernel handles the execve system call, see Understanding the Linux Kernel, 3rd Edition, by Daniel P. Bovet and Marco Cesati, Chapter 20.)

The kernel then begins running the new program, starting with code in ld.so.  This allows the loading of the additional .so files to be done from within ld.so in the user space instead of by the kernel.

How does the kernel actually transfer control to ld.so?  Normally when the kernel finishes a system call it goes through a return sequence which concludes with an iret (interrupt return) instruction or a sysexit instruction.  That instruction restores the process’s next instruction address to the IP (Instruction Pointer) register so that the next time the CPU fetches an instruction it is from the instruction following the system call.

In this case however, the program that executed the execve is no longer in the process’s memory:  it has been replaced by the new program.  So the kernel “diddles” with the stack where the return address was stored such that when the iret or sysexit instruction is executed, control “returns” to the first instruction of the new program which in this case is the instruction labeled _start within ld.so.

The Dynamic Loader — the Start of User-Mode Execution

ld.so begins with the following assembly language code (defined as the RTLD_START macro in sysdeps/i386/dl-machine.h):

#define RTLD_START asm ("\n\

_start:\n\
# Note that _dl_start gets the parameter in %eax.\n\
movl %esp, %eax\n\
call _dl_start\n\
_dl_start_user:\n\
# Save the user entry point address in %edi.\n\
movl %eax, %edi\n\
# Point %ebx at the GOT.\n\
call 0b\n\
addl $_GLOBAL_OFFSET_TABLE_, %ebx\n\
# See if we were run as a command with the executable file\n\
# name as an extra leading argument.\n\
movl _dl_skip_args@GOTOFF(%ebx), %eax\n\
# Pop the original argument count.\n\
popl %edx\n\
# Adjust the stack pointer to skip _dl_skip_args words.\n\
leal (%esp,%eax,4), %esp\n\
# Subtract _dl_skip_args from argc.\n\
subl %eax, %edx\n\
# Push argc back on the stack.\n\
push %edx\n\
# The special initializer gets called with the stack just\n\
# as the application's entry point will see it; it can\n\
# switch stacks if it moves these contents over.\n\
" RTLD_START_SPECIAL_INIT "\n\
# Load the parameters again.\n\
# (eax, edx, ecx, *--esp) = (_dl_loaded, argc, argv, envp)\n\
movl _rtld_local@GOTOFF(%ebx), %eax\n\
leal 8(%esp,%edx,4), %esi\n\
leal 4(%esp), %ecx\n\
movl %esp, %ebp\n\
# Make sure _dl_init is run with 16 byte aligned stack.\n\
andl $-16, %esp\n\
pushl %eax\n\
pushl %eax\n\
pushl %ebp\n\
pushl %esi\n\
# Clear %ebp, so that even constructors have terminated backchain.\n\
xorl %ebp, %ebp\n\
# Call the function to run the initializers.\n\
call _dl_init_internal@PLT\n\
# Pass our finalizer function to the user in %edx, as per ELF ABI.\n\
leal _dl_fini@GOTOFF(%ebx), %edx\n\
# Restore %esp _start expects.\n\
movl (%esp), %esp\n\
# Jump to the user's entry point.\n\
jmp *%edi\n\

This code begins by calling _dl_start which is written in C and is in debug/glibc-2.15-a316c1f/elf/rtld.c.

We can use strace (which traces system calls) to follow this startup code.  Here is the output from that program:

$ strace ./hello
...
brk(0)                                  = 0x85b0000

The call to brk(0) is a “trick” to determine the location of the program’s heap.  (The heap is the memory area used for dynamic memory by the program.)
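From C, the usual way to ask the same question is sbrk(0), which returns the current program break without moving it.  The sketch below is only an illustration; current_break is a hypothetical helper name, and the feature-test macro is there on the assumption that glibc may not declare sbrk in strict standards mode.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* asks glibc to declare sbrk() */
#endif
#include <unistd.h>

/* Hypothetical helper: return the current program break, i.e. the address
   just past the end of the heap, or (void *)-1 on failure. */
void *current_break(void)
{
    return sbrk(0);          /* an increment of 0 moves nothing */
}
```

Printing the result with printf("%p\n", current_break()) should show an address in the same region as the value strace reports for brk(0).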

access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)

This call to access looks for a file called “/etc/ld.so.nohwcap” but the call returns the error, ENOENT, meaning the file does not exist.
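The same probe is easy to reproduce from C.  This is a small sketch, with probe_file being a name I made up; it returns the errno value so that the ENOENT seen in the strace output is visible to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: return 0 if `path` exists, otherwise the errno
   value set by access() (ENOENT when the file does not exist). */
int probe_file(const char *path)
{
    assert(path != NULL);

    /* F_OK tests only for existence, not for read or write permission. */
    if (access(path, F_OK) == 0)
        return 0;
    return errno;
}
```

On a system like mine, where the file does not exist, probe_file("/etc/ld.so.nohwcap") would return ENOENT.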

mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7704000

This call to mmap2 requests 8K of additional memory from the kernel which maps it into address 0xb7704000.

Note: There is a Linux security feature, called “address space layout randomization”, which is designed to deter certain forms of hacking by making it difficult to predict where code will reside. The result is that memory is allocated in different locations each time the program is run. For example, the above memory area was allocated by the kernel at 0xb7704000. However, on a previous run of the same program on the same system, this memory had been allocated at 0xb77b7000.
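Whether this randomization is active can be checked by reading a kernel setting under /proc.  The following is a minimal sketch; aslr_setting is a hypothetical helper name, and the code assumes /proc is mounted in the usual place.

```c
#include <stdio.h>

/* Hypothetical helper: read the kernel's address-space randomization
   setting.  Returns 0 (off), 1 (stack, mmap areas and VDSO randomized),
   2 (the heap start is randomized as well), or -1 if the setting
   cannot be read. */
int aslr_setting(void)
{
    int value = -1;
    FILE *f = fopen("/proc/sys/kernel/randomize_va_space", "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

A result of 1 or 2 would be consistent with the addresses changing from run to run, as described above.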

access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)

This call looks for another file, “/etc/ld.so.preload”, which also does not exist.

open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=97882, ...}) = 0
mmap2(NULL, 97882, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb76ec000
close(3)                                = 0

The above four system calls result in mapping the file /etc/ld.so.cache into memory at 0xb76ec000.

  • open() opens the file.
  • fstat64() returns, among other things, the size of the file (which will be used by the mmap2 call).
  • mmap2() maps the file into memory.
  • and close() closes the file.

ld.so.cache is a file that contains information about the location of system libraries within the file system.  This file, which was created by the ldconfig utility program, is used to speed up the locating of standard shared libraries.
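The open, fstat, mmap, close sequence seen above is the standard way to map any file into memory.  Here is a minimal sketch of it in C.  The helper name first_byte_of_mapped_file is hypothetical, and I have left out the O_CLOEXEC flag that ld.so passes, to keep the example portable.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only and return its first byte
   (0..255), or -1 on any failure. */
int first_byte_of_mapped_file(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;                  /* fstat supplies the size for mmap */
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping survives the close */
    if (p == MAP_FAILED)
        return -1;

    int byte = p[0];                 /* the first access pages the data in */
    munmap(p, (size_t)st.st_size);
    return byte;
}
```

Applied to any ELF file, such as the hello executable, this should return 0x7f (octal 177), the first byte of the ELF magic number.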

open("/lib/i386-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\233\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1754876, ...}) = 0
mmap2(NULL, 1759868, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb753e000
mmap2(0xb76e6000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a8000) = 0xb76e6000
mmap2(0xb76e9000, 10876, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb76e9000
close(3)                                = 0

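The above steps are where the C run-time library, libc.so, is actually mapped into memory. There are three calls to mmap2: the first maps the file as read-only executable code, the second overlays the library’s writable data (taken from file offset 0x1a8000) at its fixed place within that region, and the third is an anonymous mapping, not backed by the file, for the library’s zero-initialized variables.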

mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb753d000

This system call allocates one more page of memory, which the kernel places immediately below the area where libc.so was just mapped. This area will be used for various housekeeping information about the program.

set_thread_area({entry_number:-1 -> 6, base_addr:0xb753d940, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0

This call to set_thread_area tells the kernel to set up a TLS (Thread Local Storage) data area. Note that the address is within the memory area most recently allocated.

mprotect(0xb76e6000, 8192, PROT_READ)   = 0
mprotect(0x8049000, 4096, PROT_READ)    = 0
mprotect(0xb7727000, 4096, PROT_READ)   = 0

These mprotect calls change the memory protection to read-only for libc.so’s constants, the program’s constants, and ld.so’s own constants, respectively.

munmap(0xb76ec000, 97882)               = 0

The memory area previously mapped from ld.so.cache is now removed from memory by munmap.

At this point the shared libraries are mapped into memory.  (In our case, there is only one, libc.so.)

Control returns from _dl_start to _start, which falls through to _dl_start_user, which continues with this code previously shown from the RTLD_START macro:

# Pass our finalizer function to the user in %edx, as per ELF ABI.\n\
leal _dl_fini@GOTOFF(%ebx), %edx\n\
# Restore %esp _start expects.\n\
movl (%esp), %esp\n\
# Jump to the user’s entry point.\n\
jmp *%edi\n\

The jmp (jump) instruction at the end of the above code transfers to the entry point of our program.  Our program’s entry point was passed from the kernel in the %eax register.  It was previously moved to the %edi register by this code:

# Save the user entry point address in %edi.\n\
movl %eax, %edi\n\

With the jmp instruction, our program now begins at a routine called _start.  (Note this is not the same as the _start function in ld.so; each instance of _start is defined locally within its own module.)

Beginning our Program Code — the C Run-time Library

Upon entry to the program (at _start) the following information has been provided to the program:

  • The command-line arguments and environment variables are loaded into the top end of the stack memory area.
  • The stack pointer is set just below the above data.
  • argc and argv are then pushed onto the stack.   These are the count and address of the command line arguments, respectively.

(The above three steps were done by the kernel as part of the execve processing.)

Our program begins with:


0x8048ba8 <_start>      xor    %ebp,%ebp

The xor instruction shown above (the first instruction of the program) sets the %ebp register to zero. This register is used to keep track of stack frames used by C functions, and setting this value to zero means this is the end of the set of stack frames.

0x8048baa <_start+2>    pop    %esi 
0x8048bab <_start+3>    mov    %esp,%ecx

The above instructions get argc and argv from the stack to the %esi and %ecx registers respectively.

0x8048bad <_start+5>    and    $0xfffffff0,%esp

This makes sure the stack pointer is on a 16-byte boundary, i.e. on an address divisible by 16, by clearing its low four bits.

0x8048bb0 <_start+8>    push   %eax 
0x8048bb1 <_start+9>    push   %esp <stack end>
0x8048bb2 <_start+10>   push   %edx  <_dl_fini> [from ld.so]
0x8048bb3 <_start+11>   push   $0x8049340 <__libc_csu_fini>
0x8048bb8 <_start+16>   push   $0x80492a0 <__libc_csu_init>
0x8048bbd <_start+21>   push   %ecx  <argv> [saved above]
0x8048bbe <_start+22>   push   %esi  <argc> [saved above]
0x8048bbf <_start+23>   push   $0x8048ce0 <main>
0x8048bc4 <_start+28>   call   0x8048d00 <__libc_start_main>

The above instructions push the arguments for the subsequent function call onto the stack, and then call __libc_start_main, the first C-language code in the program.

0x8048bc9 <_start+33>   hlt

This instruction would be executed if __libc_start_main returned to its caller, but that should never happen. If it did, hlt (halt) is a privileged instruction and will cause the program to fail.

The following code is from debug/glibc-2.15-a316c1f/csu/libc-start.c.
It shows the entry into __libc_start_main.

/* Note: the fini parameter is ignored here for shared library.  It
   is registered with __cxa_atexit.  This had the disadvantage that
   finalizers were called in more than one place.  */
STATIC int
LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
         int argc, char *__unbounded *__unbounded ubp_av,
#ifdef LIBC_START_MAIN_AUXVEC_ARG
         ElfW(auxv_t) *__unbounded auxvec,
#endif
         __typeof (main) init,
         void (*fini) (void),
         void (*rtld_fini) (void), void *__unbounded stack_end)

__libc_start_main (main=0x8048430 <main>, argc=1, ubp_av=0xbfffefa4, 
    init=0x8048450 <__libc_csu_init>, fini=0x80484c0 <__libc_csu_fini>, 
    rtld_fini=0x42bfaa90 <_dl_fini>, stack_end=0xbfffef9c) at libc-start.c:96

At this point a number of functions are called (not shown) that initialize the C run-time environment.

Beginning the main() Function

Next we see the following code:

 /* Nothing fancy, just call the function. */
 result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
#endif

 exit (result);

The above call to main is where the user’s code is started. (All C programs begin with a function named “main”.)

At this point our program’s code, beginning at main(), starts to execute at last.  Our program does whatever it was programmed to do (in our case print a message as we saw at the top of this article).

Program exit — Back to the C Run-time Library

The C program can terminate either by calling “exit” or by returning from the main function. Our example program (shown at the top of this article) does the latter. In that case the following code is executed from the run-time library after the return from main:

  exit (result);

So either our program calls exit or the run-time library calls it for us after main returns.

Many people think exit is a system call, but it is actually a C library function in debug/glibc-2.15-a316c1f/stdlib/exit.c:

exit (int status)
{
  __run_exit_handlers (status, &__exit_funcs, true);
}

As you can see, exit is very simple; it just calls __run_exit_handlers which looks like this:

/* Call all functions registered with `atexit' and `on_exit',
   in the reverse of the order in which they were registered
   perform stdio cleanup, and terminate program execution with STATUS.  */
void
attribute_hidden
__run_exit_handlers (int status, struct exit_function_list **listp,
             bool run_list_atexit)
{
  /* We do it this way to handle recursive calls to exit () made by
     the functions registered with `atexit' and `on_exit'. We call
     everyone on the list and use the status value in the last
     exit (). */
...
  _exit (status);
}

(The omitted code simply loops through calling a list of functions to handle process cleanup before it exits.)

The _exit() System Call — Back to the Kernel

As you can see above, the last thing this function does is call _exit, which really is a system call, defined in debug/glibc-2.15-a316c1f/sysdeps/unix/sysv/linux/i386/_exit.S:

    .text
    .type   _exit,@function
    .global _exit
_exit:
    movl    4(%esp), %ebx

    /* Try the new syscall first.  */
#ifdef __NR_exit_group
    movl    $__NR_exit_group, %eax
    ENTER_KERNEL
#endif

    /* Not available.  Now the old one.  */
    movl    $__NR_exit, %eax
    /* Don't bother using ENTER_KERNEL here.  If the exit_group
       syscall is not available AT_SYSINFO isn't either.  */
    int     $0x80

    /* This must not fail.  Be sure we don't return.  */
    hlt
    .size   _exit,.-_exit

This code first loads the exit status (the function’s argument) into the %ebx register, then places the intended system call number (__NR_exit_group) into the %eax register to pass it to the kernel.  It then invokes ENTER_KERNEL, which is an assembler macro that expands to this machine-language code:

0x42cc57ad <_exit+9>            call   *%gs:0x10

which calls this machine-language code, the typical sequence for calling a system service in the kernel:

0xb7fff414 <__kernel_vsyscall>          push   %ecx 
0xb7fff415 <__kernel_vsyscall+1>        push   %edx
0xb7fff416 <__kernel_vsyscall+2>        push   %ebp
0xb7fff417 <__kernel_vsyscall+3>        mov    %esp,%ebp
0xb7fff419 <__kernel_vsyscall+5>        sysenter

The sysenter instruction will cause a transition to kernel mode.

Usually the kernel returns to the user’s program when it is finished, but in the case of _exit the kernel does not return. Instead, it deletes the memory space occupied by our program. The process it was running in will be marked “defunct”.

waitpid() — Back to the Shell

The process will be completely deleted once its parent (usually the shell) gathers the defunct process’s completion status:

  pid = waitpid (-1, &status, waitpid_flags);

Once waitpid is called by the process’s parent, both the child process and the program it was running are gone from memory.

Thus the program’s execution life cycle is ended.

Other Ways a Program May Terminate

In addition to calling exit to terminate the process, as was described above, there are alternative ways in which a program may terminate:

  • The process can be abnormally terminated due to an error or action by another program or by the user.  (In that case the C library cleanup code run by exit() will not be performed.)
  • The program can issue another execve call which will start a new program in place of the current program.

Conclusion

Here is a summary of the steps described in this article.

  1. Frequently, but not always, some process (often a shell) forks a new process for the new program.
  2. In the process that is to run the new program, the existing program calls execve.
  3. The kernel releases the old program’s address space and begins building a new address space.
  4. The kernel maps the program into the new address space.
  5. If the program uses dynamic libraries, then the kernel maps ld.so into the new address space.
  6. If the program uses dynamic libraries, the kernel gives control to ld.so within the process context of the new program.  ld.so then causes any shared libraries to be mapped into memory.
  7. ld.so then transfers control to the new program for the first time at the label, _start.
  8. _start saves some input parameters from the system and then calls __libc_start_main which initializes the C run-time library.
  9. __libc_start_main calls main, the beginning of the application program code.
  10. The program runs until:
    1. it terminates by calling exit or returning from the main function.  (Continue to step 11.)
    2. it calls execve to begin a new program.  (Go back to step 2.)
    3. the process is abnormally terminated.  (Skip to step 13.)
  11. The C run-time library cleans up.
  12. The run-time library calls _exit, the system call that terminates the process.
  13. The kernel releases the memory and other resources of the just-terminated process.
  14. The parent process issues a wait call for the process that just terminated.
  15. The kernel releases the terminated process’s task structure, the last remnant of the program.

 

My Public Key


-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v1


mQINBFV+I60BEACYIhFTnfIQYdv7+t+e9qljOh7gnA3uMp9rkYt8mK+mIPtPUyDk
19hYXX8Ub5mcFEvu+Mp3cEDtYBqRSpIQ3sOBz+KXZzx0+bcBiyslW5Ir1S3NDgVm
rDATks1OZLdWJQGgd598OdQRUkFUVcUrbAtqICjAqaGWTVJc+5Awxf1+TxxRmFJ3
ouLGd6HC6ev2/iin8dqlp3IT2N1jGyzU+PAimahY0dzlJ77Ak9mbRtQ0YdAXW9U7
ihA6QAD0arn70DrWcCamC3AmUcjHuMjq+4Dve/wJ4wwW+oWGHQkbbArTwazHJXV9
MS/MH5k6g5M/kWutWnueGfuGo2KZIhf/NByOPVrY6+CB1PdKZrZrPVcWnoVN7aBI
8BosiOwbY+Rv2D4KkuJhmME8Gcewz4MNDsXuAYLkCaMuxZALPF8+0a46oyxKRWbD
dRFKaeiwar+6uXmkF5mYUuMe9zFjBEnJyxS6+Wxj0XEx1ZUSSoYaVYA3daJ0NUT+
0B3CAQJzw7C64FuCBdcnq1T1ZIJFcqKlcvIVYLlI+RnuLo0L5LUFn3/AEt8Y2I/D
HbhvDh+TidvN2qLKjjqcHXVxLu6zg2POQdiiDI8x/25QfVXN9eB9ZRs8jATLgx5i
foEgZ4jmzPzmEjDibwjp7X33CJf8kc1O36RAH7Q99Nz2Ae3Uky9DpUtLAQARAQAB
tCNHbGVubiBTdG9yeSA8Z2xlbm4uc3RvcnlAZ21haWwuY29tPokCOAQTAQIAIgUC
VX4jrQIbAwYLCQgHAwIGFQgCCQoLBBYCAwECHgECF4AACgkQ64t7wglUDvRn5A//
ZGiYqNE9QYCUKCUZx6atfbZ2FxcxGqIbV8GwaP6F6cFvpwTLOcWfc72PZMYiUh5C
cmX+rtTAswz7RhxndzX7qQqI5rnLz8PPggZ0wiLsEb+LMyQ0wB5vNKaC0ouSqDUI
MGSMGsqRa69KZLyxUxSu6HK0oqPdviwntM8wH1zXORVvDjgcuWuXhiMpDXoKr9NC
LX8KQgA9uzj1qIeosLnYJxwtAGOHd21jjsTi/pN/rBvUyHO8WMf4YYwoCZvnI0+w
P6ySIN09wyQCq4gxZgme7FJw+DN+doPtFihQ3gX3kgZ0S2Q0mL/e7UWBAHJUGrll
w61FZd1EtzAr5/uzratGtHTxdGzVmOj0WLTSHdgWdiDHVstq4SwA+wE1VHJNVwui
/ixT1T7qh1LIypW9M7xWV9mqxsKsfKIr+3mFTP4vPQ1FvwtsLbvSXsJWnKoOKxKg
kxcD9+SvZ9snri+sn7MUU6vokEkoCwSyj7PDVahvuh2Q9kXxvaQL/ESGsXeva3W+
XAjW/WE9BWQRsoU5sB9JO9u4YTQoCJvIUzGHdp9XNbbkY3TFumY7PyCDzijXERVE
ukKcJ/uabZfRhwEVkcRrNw+dUgoH8xr4HTIbg8P3mFMCeYJDOKOZAmcYENA8R7Fm
2BCW3Lw7Us/tqyQgKDAw8+dgYLXM2FhX9Vf+TxSSJRS5Ag0EVX4jrQEQAMnJaSNb
JrEGV+pQ2rBIRquVVPJg3aihKV0JJ0CvjBX94hqG+oaevmI/vyIKCMWKjn/REFR3
jBi6AIqktcrDXrHzAk0eA7NU12MCP0h41mFi38mm8onConZBLtThmKuKgaqTHWtB
zdMsgrnmbBlnt05yjhP0CfgO0SF4BTrZ7KLGAzhYwXVRGzJxttH39Ntr1xDERgfK
5R3CM717vbzZv33Z4skwvZ5qTPrjB9eODBZ0GoadF54hzhIvPRA9UlA51AQZMgEl
3n88fJqpfX9SdZL4+vJCKPI71w+BVAGw152HSQh5gK8RS7N98mMmZrKRu0tOuEKo
dYITddfgO2PbWQhRGCys6YOaXZIvjIbJj4OghFIlDDFRvP6bQfriNJYITFa31TLu
oVkDX+fRgE/lqRD8WX4XcDg5Tr8uDclDmQYUy9O/uWsfeEp40iH2IESdvl9AS34q
AtSqq89NXgQVq4hGhvMnExTrPDDulsadZSsHfYSLViYWhWimLVimLklJyDM1eQW/
OvoxH/LB+baEVgi0nd+64Qef7ZjFIH75rRqxOJA/6SyAlVcDbnstS6zuTm3xhl+K
DKmhcAtdLxCZTvTr8+tpJRcNKSjl4MJiNRPncNTcOlJJ397uf6W05AFNJtGtc0yB
Ajk+J3/l+yX6DSqrrdtDWjudxgi9DYGaU2T7ABEBAAGJAh8EGAECAAkFAlV+I60C
GwwACgkQ64t7wglUDvTeug/+NJypsvMGKVSQjqtcplZqwITBoko/PGRGM3yne3q4
GMvPtSggwOHpUob7/5KLD3eoRmVbUACUJ1PXlK1nWdK3meFQ/rEjixlrjaxnkYek
gjyh/eH5PQ8RHVaH2HLfWa8Ht20fsL5ILSqrLnhFSMZo37gBlJOh1H71eiu/FkVL
4Dmq+qYexU+KII+x2v3h3dFEU0k/qBILr/tH6W6Q3ERqk2MAnVxE3Xb7/Apyqzeg
AfbZfz9YriXwUyMkIkPXF++AFCJNyarm8XR9sg4dtrC5zIQV1Elxb1SKfRhdnmxK
7/j8rf2+ngTewfBHA11LVpuyZ9tE7eoojE6Nol4JHM6eZcaPf3xiarNzs1jNb5lL
4mfHPCfiqWdC0Jz7WqlF93Offr4Dbti2SJ0+W74Zy6BtnPYbYBFRgSdykCorbnfV
MTTWX526Kn8smLL5lnylXCXSqBj4lwUaHy7PUcAMRqC/8ot1HCYospGm0tGXMrgd
Hmh/L7yWuJRdUNQsKNvAJWNh8lISz2rx5Y7nVAB44GlSPd1vjgg4f4WH0zFoNO+u
hWi9sPdyTKg+0KZEaTfcbzERLJPAxFJGxXWPAcHjByQMPKr+ybWLLmykvtVKh6tj
g604vgxKxFKdyVrfogVcZcfnDDolbLM1lKejVQ0LA97WW1XuyHpT2irD/kk5B+hq
V5Q=
=JzXU
-----END PGP PUBLIC KEY BLOCK-----

IBM 1401 Resurrected


The IBM 1401 Computer was one of the first computer systems I worked on.

Among other things, it was the computer used at that time by the U.S. Army computer-programming school which I attended in the fall of 1968.  We were told that whoever finished first in the class would have their choice of next duty stations.  I was the one who came in first and chose the Presidio of San Francisco as the closest option to my pre-army home (Los Angeles).

It was in San Francisco that I met and married my wife.  I told this story recently to my younger daughter explaining to her that if it weren’t for the IBM 1401 she wouldn’t be here.

The Computer History Museum just recently unveiled a reconstructed IBM 1401.  I attended that unveiling, and shot some video which you can see here.

More details about the 1401 and my army experience with it can be found in my computer memoirs.

Shakespeare on Depression

Several years ago (2006, it turns out) I wrote an essay on “Shakespeare on Depression” which quoted some lines from Hamlet.

I have long had the idea of making an audio recording of that essay with a real Shakespearean actor reading the lines from Hamlet.
The stars have aligned.  I have a new MacBook which is better at capturing and editing audio.  I’m in the process of watching Kenneth Branagh’s version of Hamlet (4 hours–too much for one sitting).  This allowed me to identify where in the movie the quoted lines are.
Then all I needed was the original essay.  I couldn’t find it. Eventually I found it in an archive of my old web site.  So here it is:  the original essay; the audio form and even a YouTube link to the main quoted section.

Kindle “Enhanced” Books – Availability by Devices


As an avid Kindle user (disclosure:  I used to work for Amazon on Kindle development) I was intrigued recently to discover a book that includes full-motion video content.  As an avid collector of tech toys I was curious about what devices this video content would work on.

The book is Dirty Wars.  Its author, Jeremy Scahill, both wrote the book and produced a documentary of the same name about American covert operations.  The “enhancements” to this book consist of the trailer and selected scenes from the documentary.

This is not a review of that book.  I’ve barely started reading it; I have yet to form an opinion.  Rather, this post describes which devices the video is available on.

I first downloaded the book to my Kindle Fire, since obviously the book’s “enhancements,” i.e. the video, would work there.  Once downloaded, I started reading until I hit the first video, which turns out to be the movie trailer.  It’s located in the book at the end of the introduction.   Once I hit the video I then downloaded the book to a number of other devices to see if it would work there.  Here’s the result.

Kindle Fire:    Works (as expected).

iPad running iOS 7:  Works, although the video is in a box, not full screen.

iPhone running iOS 6:  Works (full screen).

Kindle Paperwhite:  Extended content doesn’t work.  That’s expected.  eInk technology is not capable of displaying full-motion video.  The basic text of the book is available. What I also noticed is there’s no indication that full-motion video is or would be present.  It is completely invisible to the reader on this type of device.

Android tablet (Jelly Bean engineering build):  Same as the Paperwhite.  This is a surprise, as both the device and the software environment are capable of supporting video.  (After all, the Kindle Fire is itself based on Android.)  As on the Kindle eInk device, there is no indication that the video is even present.

Cloud Reader (running on Chrome browser on Ubuntu 10.04):  Error:  “This title is not available on Kindle Cloud Reader.”

Cloud Reader (running on Safari browser on OS X Mountain Lion):  Same as above.

Kindle for Windows 8 App:  Error:  “This title is not available on Kindle for Windows 8”.

Conclusions:  These enhanced books work as well as possible on Kindle devices and iOS devices.  Not so much on the web or on Windows.  My main takeaway is not to read such books on Android devices (other than the Kindle Fire) or on eInk devices, as I would miss the extended content without any warning.