How Computer Viruses Work

October 10, 2010
Computer viruses are well-named: their behavior bears a striking resemblance to how real-life biological viruses work, and the ways of dealing with them can be quite similar as well! A biological virus infects a host (a creature), using it as a vehicle for life. It reproduces rapidly; one of its primary goals is to spread to other creatures and thereby perpetuate itself. It also tends to move from person to person and eventually, months later, comes back to reinfect the same people again.

Computer viruses work in the same way, including the way they tend to go around from PC to PC and then occasionally return to reinfect months or years later (in fact, there are viruses that have been circulating since the early days of the PC that, thanks to backward compatibility, still "run" on the latest machines). Computer viruses, however, differ from biological viruses in several ways: they are man-made, for one thing, and they are easier to get rid of. Also, biological viruses cause symptoms as a side effect; they do not exist to make people miserable. Most computer viruses do exist specifically to make people miserable.

There are three major types of viruses, each very different from the others. Of course, there are many subcategories within each group as well (and to make matters worse, there are virus hoaxes to confuse the issue):
  • Boot Sector Infectors: Also sometimes called boot record infectors, system viruses, or boot viruses, these programs attack the vulnerable boot program that is stored on every bootable floppy disk or hard disk. This code is executed by the system when the PC is started up, making it a juicy target for virus writers: by installing themselves here they guarantee that their code will be executed whenever the system is started up, giving them full control over the system to do what they wish. They are spread most commonly through infected bootable floppy disks.
  • File Infectors: These viruses directly attack and modify program files, which are usually .EXE or .COM files. When the program is run, the virus executes and does whatever it wants to do. Usually it loads itself into memory and waits for a trigger to find and infect other program files. These viruses are commonly spread through infected floppy disks, over networks, and over the Internet.
  • Macro Viruses: The newest type of virus, these clever programs make use of the built-in programming languages in popular programs such as Microsoft Word and Microsoft Excel. These programs allow users to create programs that automate tasks, called macros. As the macro languages have become more powerful, virus writers have created malevolent macros that, when opened unwittingly, duplicate themselves into other documents and spread just like a conventional virus would. These programs can cause just as much damage as regular viruses, despite the fact that they are very different: regular viruses are low-level machine language programs, while macro viruses are actually high-level interpreted BASIC programs! The most common type of macro virus right now infects Microsoft Word documents.
As time goes on and virus writers get more clever, new strains and variations show up. For example, there are some viruses, termed multipartite, that infect both boot sectors and files. There are some things that are fairly universal, however: a virus is a program, and it can therefore only exist in a form that allows the PC to execute it as a program. This means that, in general, viruses cannot exist in data files.

There are a lot of myths about viruses that infect graphics pictures or email messages or recipe files. Just remember that a virus cannot do anything unless you run it, so unless you have a PC that can execute pictures or email messages somehow, it's doubtful that these can contain a virus. (Of course, you can have a virus in a program that is attached to an email message. But here again, this is a program, just in encoded form.)

There are also now viruses that target the BIOS program stored on the motherboard, on systems that employ a flash BIOS. These viruses can overwrite the system BIOS program and leave the PC in an unbootable state. (Some motherboards have jumpers that you can set to disable BIOS flashing except when you want to do it.)

Here is the general way that viruses work:
  1. An infected program is run. This is either a program file (in the case of a file-infecting virus) or a boot sector program at boot time. In the case of a Microsoft Word document the virus can be activated as soon as the document that contains it is opened for reading within Microsoft Word. If the "NORMAL.DOT" document template is infected (and this is the most common target of these viruses) then the virus may be activated as soon as Microsoft Word is started up.
  2. The infected program has been modified so that instead of the proper code running, the virus code runs instead. This is usually done by the virus modifying the first few instructions to "jump" to where the virus code is stored. The virus code begins to execute.
  3. The virus code becomes active and takes control of the PC. There are two ways that a virus will behave when it is run: direct-action viruses will immediately execute, often seeking other programs to infect and/or exhibiting whatever other possibly malicious behavior their author coded into them. Many file-infector viruses are direct-action. In contrast, memory-resident viruses don't do anything immediately; they load themselves into memory and wait for a triggering event that will cause them to "act". Many file infectors and all boot infectors do this (boot infectors have to become memory resident, because at the time they are executed the system is just starting up and there isn't that much "interesting" for them to do immediately.)
  4. What exactly the virus does depends on what it is written to do. The primary goals of most viruses, however, include replication and spreading, so viruses will generally search for new targets that they can infect. For example, a boot sector virus will attempt to install itself on hard disks or floppy disks that it finds in the system. File infectors may stay in memory and look for programs being run that they can target for infection.
  5. "Malevolent" viruses that damage files or wreak havoc in other ways will often act on triggers. There are viruses that will only activate on particular days of the year (such as the infamous "Friday the 13th"), or act randomly, say, deleting a file every 8th time they are run. Some viruses do nothing other than trying to maximize their own infection to as many files and systems as possible.
As virus authors have become more "creative", they have devised increasingly sophisticated viruses that work in different ways. In particular, newer viruses get smarter and smarter in avoiding detection. In most cases these viruses are not necessarily more hazardous than older ones, but they are harder to detect and remove using anti-virus software. Some of the tricks that authors use:
  • Polymorphing: Some viruses are designed so that each time they infect, their appearance and size change. This thwarts simplistic virus scanners that look for predefined patterns and makes detection much more difficult.
  • Stealth: A stealth virus actively hides the changes it has made to the hard disk so that it appears that it has not infected the system. For example, a file infector might stay memory resident and misreport the size of infected files so they don't appear to be infected. Boot sector viruses can trap attempts to read the boot sector and return forged data that makes them appear to be "clean".
  • Disassembly Protection: Many newer viruses are designed using programming tricks that make them hard to disassemble (disassembly is the process of translating the code into a form that is easier to analyze so that the virus can be combated).
  • Directory Viruses: Some viruses now seek to avoid detection by avoiding modifying the file they infect directly. Instead, they change the cluster pointer in the directory entry of the file to point to the virus instead of the actual program. The virus runs its code and then executes the target program afterwards. The virus is thus able to "infect" the program without actually modifying it.
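
To make the "predefined patterns" idea concrete, here is a minimal sketch of the kind of simplistic scanner that polymorphic viruses are designed to defeat. The signature names and byte patterns are made up for illustration and do not correspond to any real virus; the XOR step merely stands in for whatever encoding a polymorphic virus might apply to itself.

```python
# Hypothetical signature database: name -> byte pattern (made-up values).
SIGNATURES = {
    "Example.A": bytes.fromhex("deadbeef0102"),
    "Example.B": b"\x90\x90\xeb\xfe",
}

def scan_bytes(data, signatures=SIGNATURES):
    """Return the sorted names of all signatures whose pattern occurs in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)

# A stand-in for a clean program file: two header bytes plus padding.
clean = b"MZ" + bytes(64)

# The same file with the Example.A pattern appended unchanged.
infected = clean + SIGNATURES["Example.A"] + bytes(16)

# The same pattern XOR-encoded with a key, so its bytes look different.
mutated = clean + bytes(b ^ 0x5A for b in SIGNATURES["Example.A"])

print(scan_bytes(clean))     # []
print(scan_bytes(infected))  # ['Example.A']
print(scan_bytes(mutated))   # []
```

The third result shows why fixed patterns are not enough: once the bytes change from one infection to the next, a plain substring search no longer matches.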

How Bits and Bytes Work

If you have used a computer for more than five minutes, then you have heard the words bits and bytes. Both RAM and hard disk capacities are measured in bytes, as are file sizes when you examine them in a file viewer.
You might hear an advertisement that says, "This computer has a 32-bit Pentium processor with 64 megabytes of RAM and 2.1 gigabytes of hard disk space". In this article, we will discuss bits and bytes so that you have a complete understanding.

Decimal Numbers:

The easiest way to understand bits is to compare them to something you know: digits. A digit is a single place that can hold numerical values between 0 and 9. Digits are normally combined together in groups to create larger numbers. For example, 6,357 has four digits. It is understood that in the number 6,357, the 7 is filling the "1s place," while the 5 is filling the 10s place, the 3 is filling the 100s place and the 6 is filling the 1,000s place. So you could express things this way if you wanted to be explicit:

     (6 * 1000) + (3 * 100) + (5 * 10) + (7 * 1) = 6000 + 300 + 50 + 7 = 6357

Another way to express it would be to use powers of 10. Assuming that we are going to represent the concept of "raised to the power of" with the "^" symbol (so "10 squared" is written as "10^2"), another way to express it is like this:
   (6 * 10^3) + (3 * 10^2) + (5 * 10^1) + (7 * 10^0) = 6000 + 300 + 50 + 7 = 6357
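
The same expansion can be sketched in a few lines of Python. The function name expand_decimal is just an illustrative choice, and the sketch assumes a non-negative whole number.

```python
def expand_decimal(n):
    """Return (digit, power_of_10) pairs, most significant digit first."""
    digits = [int(ch) for ch in str(n)]
    top = len(digits) - 1
    return [(digit, top - i) for i, digit in enumerate(digits)]

terms = expand_decimal(6357)
print(terms)                                   # [(6, 3), (3, 2), (5, 1), (7, 0)]
print(sum(d * 10**p for d, p in terms))        # 6357
```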

What you can see from this expression is that each digit is a placeholder for the next higher power of 10, starting in the first digit with 10 raised to the power of zero.

That should all feel pretty comfortable -- we work with decimal digits every day. The neat thing about number systems is that there is nothing that forces you to have 10 different values in a digit. Our base-10 number system likely grew up because we have 10 fingers, but if we happened to evolve to have eight fingers instead, we would probably have a base-8 number system. You can have base-anything number systems. In fact, there are lots of good reasons to use different bases in different situations.

Computers happen to operate using the base-2 number system, also known as the binary number system (just like the base-10 number system is known as the decimal number system). Find out why and how that works in the next section.

The Base-2 System and the 8-bit Byte

The reason computers use the base-2 system is that it makes it a lot easier to implement them with current electronic technology. You could wire up and build computers that operate in base-10, but they would be fiendishly expensive right now. On the other hand, base-2 computers are relatively cheap.
So computers use binary numbers, and therefore use binary digits in place of decimal digits. The word bit is a shortening of the words "Binary digIT." Whereas decimal digits have 10 possible values ranging from 0 to 9, bits have only two possible values: 0 and 1. Therefore, a binary number is composed of only 0s and 1s, like this: 1011. How do you figure out what the value of the binary number 1011 is? You do it in the same way we did it above for 6357, but you use a base of 2 instead of a base of 10. So:

(1 * 2^3) + (0 * 2^2) + (1 * 2^1) + (1 * 2^0) = 8 + 0 + 2 + 1 = 11 
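
As a rough sketch, the calculation above can be written as a small Python function. The name binary_to_int is an illustrative choice; Python's built-in int(text, 2) does the same job and is used here only as a cross-check.

```python
def binary_to_int(bits):
    """Sum bit * 2^position, counting positions from the rightmost bit."""
    value = 0
    for position, bit in enumerate(reversed(bits)):
        if bit not in ("0", "1"):
            raise ValueError(f"not a binary digit: {bit!r}")
        value += int(bit) * 2**position
    return value

print(binary_to_int("1011"))   # 11
print(int("1011", 2))          # 11 (built-in conversion, same result)
```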

You can see that in binary numbers, each bit holds the value of increasing powers of 2. That makes counting in binary pretty easy. Starting at zero and going through 20, counting in decimal and binary looks like this:
 0 =     0
 1 =     1
 2 =    10
 3 =    11
 4 =   100
 5 =   101
 6 =   110
 7 =   111
 8 =  1000
 9 =  1001
10 =  1010
11 =  1011
12 =  1100
13 =  1101
14 =  1110
15 =  1111
16 = 10000
17 = 10001
18 = 10010
19 = 10011
20 = 10100
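
If you want to reproduce this table yourself, a short Python sketch will do it. The helper name counting_table is hypothetical; the format code "b" asks Python to write a number in binary.

```python
def counting_table(limit):
    """Return one 'decimal = binary' line for each number from 0 to limit."""
    return [f"{n:2d} = {n:>5b}" for n in range(limit + 1)]

for line in counting_table(20):
    print(line)
```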

When you look at this sequence, 0 and 1 are the same for decimal and binary number systems. At the number 2, you see carrying first take place in the binary system. If a bit is 1, and you add 1 to it, the bit becomes 0 and the next bit becomes 1. In the transition from 15 to 16 this effect rolls over through 4 bits, turning 1111 into 10000.

Bits are rarely seen alone in computers. They are almost always bundled together into 8-bit collections, and these collections are called bytes. Why are there 8 bits in a byte? A similar question is, "Why are there 12 eggs in a dozen?" The 8-bit byte is something that people settled on through trial and error over the past 50 years.

With 8 bits in a byte, you can represent 256 values ranging from 0 to 255, as shown here:
  0 = 00000000
  1 = 00000001
  2 = 00000010
   ...
254 = 11111110
255 = 11111111
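
A quick way to check the 256-value range is to let Python print a few of the 8-bit patterns. This is only a sketch; byte_pattern is a made-up helper name.

```python
BITS_PER_BYTE = 8

def byte_pattern(n):
    """Return n as an 8-bit binary string; n must fit in one byte."""
    if not 0 <= n < 2**BITS_PER_BYTE:
        raise ValueError("a byte can only hold values from 0 to 255")
    return format(n, "08b")

print(2**BITS_PER_BYTE)            # 256 distinct values
for n in (0, 1, 2, 254, 255):
    print(f"{n:3d} = {byte_pattern(n)}")
```

Asking for byte_pattern(256) raises an error, because 256 would need a ninth bit.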

The Standard ASCII Character Set

Bytes are frequently used to hold individual characters in a text document. In the ASCII character set, each binary value between 0 and 127 is given a specific character. Most computers extend the ASCII character set to use the full range of 256 characters available in a byte. The upper 128 characters handle special things like accented characters from common foreign languages.

The standard ASCII set defines 128 codes (0 through 127). Computers store text documents, both on disk and in memory, using these codes. For example, if you use Notepad in Windows 95/98 to create a text file containing the words, "Four score and seven years ago," Notepad would use 1 byte of memory per character (including 1 byte for each space character between the words -- ASCII character 32). When Notepad stores the sentence in a file on disk, the file will also contain 1 byte per character and per space.
Try this experiment: Open up a new file in Notepad and insert the sentence, "Four score and seven years ago" in it. Save the file to disk under the name getty.txt. Then use Windows Explorer and look at the size of the file. You will find that the file has a size of 30 bytes on disk: 1 byte for each character. If you add another word to the end of the sentence and re-save it, the file size will jump to the appropriate number of bytes. Each character consumes a byte.
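
The same experiment can be sketched in Python instead of Notepad. This version writes the file into a temporary folder (an assumption made so the sketch cleans up after itself) and saves it as plain ASCII with no trailing line break.

```python
import os
import tempfile

text = "Four score and seven years ago"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "getty.txt")
    # newline="" keeps Python from altering any line-ending characters.
    with open(path, "w", encoding="ascii", newline="") as f:
        f.write(text)
    size_on_disk = os.path.getsize(path)

print(len(text))      # 30 characters
print(size_on_disk)   # 30 bytes
```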

If you were to look at the file as a computer looks at it, you would find that each byte contains not a letter but a number -- the number is the ASCII code corresponding to the character. So on disk, the numbers for a phrase such as "Four and seven" look like this:
   F   o   u   r       a   n   d       s   e   v   e   n
  70 111 117 114  32  97 110 100  32 115 101 118 101 110
Each character corresponds one-to-one to its ASCII code. Note the use of 32 for a space -- 32 is the ASCII code for a space. We could expand these decimal numbers out to binary numbers (so 32 = 00100000) if we wanted to be technically correct -- that is how the computer really deals with things.
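
You can look up these codes without a printed table. In Python, the built-in ord() returns the code for a character and chr() goes the other way; the short sketch below uses the sample phrase "Four and seven".

```python
phrase = "Four and seven"

codes = [ord(ch) for ch in phrase]
print(codes)
# [70, 111, 117, 114, 32, 97, 110, 100, 32, 115, 101, 118, 101, 110]

# The same codes as the 8-bit patterns the computer actually stores.
bit_patterns = [format(code, "08b") for code in codes]
print(bit_patterns[4])                    # 00100000 (the space, code 32)

# Going back from numbers to characters recovers the text.
print("".join(chr(code) for code in codes))   # Four and seven
```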

The first 32 values (0 through 31) are codes for things like carriage return and line feed. The space character is the 33rd value, followed by punctuation, digits, uppercase characters and lowercase characters.

Byte Prefixes and Binary Math

When you start talking about lots of bytes, you get into prefixes like kilo, mega and giga, as in kilobyte, megabyte and gigabyte (also shortened to K, M and G, as in Kbytes, Mbytes and Gbytes or KB, MB and GB). The following table shows the binary multipliers:

Kilo  (K)  2^10 = 1,024
Mega  (M)  2^20 = 1,048,576
Giga  (G)  2^30 = 1,073,741,824
Tera  (T)  2^40 = 1,099,511,627,776
Peta  (P)  2^50 = 1,125,899,906,842,624
Exa   (E)  2^60 = 1,152,921,504,606,846,976
Zetta (Z)  2^70 = 1,180,591,620,717,411,303,424
Yotta (Y)  2^80 = 1,208,925,819,614,629,174,706,176
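
These multipliers are easy to recompute. The sketch below assumes the usual prefix names, in order, for each step of 2^10 (only kilo, mega and giga are named in the text above), and also works out the 64 megabytes of RAM from the advertisement quoted earlier.

```python
# Assumed prefix names for successive powers of 2^10.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def multiplier(prefix):
    """Return the binary multiplier for a prefix, e.g. 'mega' -> 2^20."""
    step = PREFIXES.index(prefix) + 1
    return 2 ** (10 * step)

for name in PREFIXES:
    print(f"{name:>5}: {multiplier(name):,}")

ram_bytes = 64 * multiplier("mega")
print(f"{ram_bytes:,}")   # 67,108,864 bytes in 64 megabytes
```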

Binary math works just like decimal math, except that the value of each bit can be only 0 or 1. To get a feel for binary math, let's start with decimal addition and see how it works. Assume that we want to add 452 and 751:

  452
+ 751
  ---
 1203
To add these two numbers together, you start at the right: 2 + 1 = 3. No problem. Next, 5 + 5 = 10, so you save the zero and carry the 1 over to the next place. Next, 4 + 7 + 1 (because of the carry) = 12, so you save the 2 and carry the 1. Finally, 0 + 0 + 1 = 1. So the answer is 1203.
Binary addition works exactly the same way:

  010
+ 111
  ---
 1001
Starting at the right, 0 + 1 = 1 for the first digit. No carrying there. You've got 1 + 1 = 10 for the second digit, so save the 0 and carry the 1. For the third digit, 0 + 1 + 1 = 10, so save the zero and carry the 1. For the last digit, 0 + 0 + 1 = 1. So the answer is 1001. If you translate everything over to decimal you can see it is correct: 2 + 7 = 9.
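
To see that the two procedures really are the same, here is a minimal sketch of column-by-column addition in which the base is just a parameter. The function name add_in_base is made up for this example, and it assumes bases of 10 or less so that every digit is a single character.

```python
def add_in_base(a, b, base):
    """Add two digit strings right to left, carrying whenever a column reaches base."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)   # pad the shorter number with zeros
    carry = 0
    result = []
    for digit_a, digit_b in zip(reversed(a), reversed(b)):
        total = int(digit_a) + int(digit_b) + carry
        result.append(str(total % base))    # the digit you "save"
        carry = total // base               # the digit you "carry"
    if carry:
        result.append(str(carry))
    return "".join(reversed(result))

print(add_in_base("452", "751", 10))   # 1203
print(add_in_base("010", "111", 2))    # 1001
```

The only difference between the two calls is the base: in decimal a column carries when it reaches 10, and in binary it carries when it reaches 2.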

To sum up, here's what we've learned about bits and bytes:
  • Bits are binary digits. A bit can hold the value 0 or 1.
  • Bytes are made up of 8 bits each.
  • Binary math works just like decimal math, but each bit can have a value of only 0 or 1.
There really is nothing more to it -- bits and bytes are that simple.