
Programming Concepts

Let us begin by understanding some cornerstone programming concepts. As computers have become more powerful and ubiquitous, the processes and methods used to create computer software have grown and changed. Keep in mind that one method is not necessarily better than another: as we will see in the next section, a high-level language such as C allows a programmer to write code more quickly than a low-level language such as assembly, but code written in assembly can be far more efficient. Which is better depends on the needs of the project.

Machine Code, Source Code, and Assemblers

Machine code (also called machine language) is software that is executed directly by the CPU. Machine code is CPU-dependent; it is a series of 1s and 0s that encode instructions understood by the CPU. Source code consists of programming language instructions written as text, which must be translated into machine code before the CPU can execute them. High-level languages contain English-like instructions such as “printf” (print formatted).

Assembly language is a low-level computer programming language. Assembly language instructions are short mnemonics, such as “ADD,” “SUB” (subtract), and “JMP” (jump), that map to machine language instructions. An assembler converts assembly language into machine language. A disassembler attempts to convert machine language back into assembly.
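The relationship between mnemonics and machine language can be sketched with a toy assembler and disassembler. This is only an illustration: the 8-bit opcode values below are invented for an imaginary CPU (real processors define their own encodings), and the function names assemble, disassemble, and as_bits are hypothetical.

```python
# Hypothetical 8-bit opcodes for an imaginary CPU (an assumption for
# illustration; real CPUs define their own, far richer encodings).
OPCODES = {"ADD": 0b00000001, "SUB": 0b00000010, "JMP": 0b00000011}

# Reverse lookup table used by the disassembler.
MNEMONICS = {code: name for name, code in OPCODES.items()}


def assemble(lines):
    """Convert a list of mnemonics into machine-code bytes."""
    return bytes(OPCODES[line.strip().upper()] for line in lines)


def disassemble(machine_code):
    """Convert machine-code bytes back into mnemonics.

    Bytes with no known mnemonic are flagged rather than guessed,
    which is why a disassembler only "attempts" the conversion.
    """
    return [MNEMONICS.get(byte, f"??? (0x{byte:02x})") for byte in machine_code]


def as_bits(machine_code):
    """Render machine code as the 1s and 0s the CPU actually sees."""
    return " ".join(f"{byte:08b}" for byte in machine_code)


program = ["ADD", "SUB", "JMP"]
code = assemble(program)
print(as_bits(code))
print(disassemble(code))
```

A table lookup is enough here because each toy instruction is a single byte with no operands; a real assembler must also encode registers, addresses, and labels.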

Compilers, Interpreters, and Bytecode

Compilers take source code, such as C or Basic, and compile it into machine code.

Here is an example C program “Hello World”:

#include <stdio.h>  /* declares printf */

int main(void)
{
    printf("hello, world\n");
    return 0;
}

A compiler, such as gcc (the GNU Compiler Collection, see http://gcc.gnu.org), translates this high-level language into machine code and saves the result as an executable (such as “hello-world.exe”). Once compiled, the machine language is executed directly by the CPU. hello-world.exe is compiled once and may then be run countless times. Note that the period during which a program is executing (also known as running) is called runtime.

Interpreted languages differ from compiled languages: interpreted code (such as a shell script) is translated on the fly each time the program is run. Here is an example of a “Hello World” program written in the interpreted scripting language Python (see https://www.python.org):

#!/usr/local/bin/python
print("Hello World!")

This code is saved as “hello-world.py.” Each time it is run, the Python interpreter (located at /usr/local/bin/python in the preceding code) translates the Python instructions and carries them out on the CPU. If hello-world.py is run 100 times, it will be translated 100 times (while hello-world.exe was compiled only once).

Bytecode, such as Java bytecode, is also interpreted code. Bytecode exists as an intermediary form (converted from source code), but it still must be converted into machine code before it may run on the CPU. Java bytecode is platform-independent code that is converted into machine code by the Java Virtual Machine (JVM). See Chapter 4, Domain 3: Security Architecture and Engineering, for more information on Java bytecode.
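The idea of an intermediary form can be observed from within Python itself. The sketch below is an analogy rather than a demonstration of Java: it uses the standard CPython interpreter's built-in compile() function and dis module to show the bytecode CPython produces for the earlier “Hello World” line. The variable names are illustrative, and the exact instruction names printed vary between Python versions.

```python
import dis

source = 'print("Hello World!")'

# compile() turns source text into a code object that holds CPython
# bytecode, an intermediary form that is not yet machine code.
code_object = compile(source, "hello-world.py", "exec")

# co_code is the raw bytecode as a sequence of bytes.
raw_bytecode = code_object.co_code
print(len(raw_bytecode), "bytes of bytecode")

# dis lists the bytecode as human-readable instructions, much as a
# disassembler does for machine code. Names differ across versions.
opnames = [ins.opname for ins in dis.get_instructions(code_object)]
print(opnames)

# The interpreter's virtual machine then executes the bytecode.
exec(code_object)
```

The same code object can be passed to exec() repeatedly without calling compile() again, which mirrors how bytecode is produced once and then handed to a virtual machine to run.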