1. Information Representation

1.1 Data Representation

Data representation lies at the core of computing. Every piece of media, every calculation, and every file is ultimately expressed in binary form—streams of 1s and 0s. Understanding how data is represented, transformed, and interpreted is essential for mastering computer science.

Binary Magnitudes & Prefixes

To manage large data sizes, prefixes are used to denote magnitude. There are decimal prefixes (kilo = 10³) and binary prefixes (kibi = 2¹⁰) that sound similar but mean different values:

  • Kibi (Ki): 1024 bytes vs. Kilo (k): 1000 bytes
  • Mebi (Mi): 1,048,576 bytes vs. Mega (M): 1,000,000 bytes
  • Gibi (Gi): 1,073,741,824 bytes vs. Giga (G): 1,000,000,000 bytes
  • Tebi (Ti): 1,099,511,627,776 bytes vs. Tera (T): 1,000,000,000,000 bytes

Understanding these distinctions prevents confusion when working with memory sizes and file capacities.
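One way to see the gap between the two families is to express the same byte count both ways. The Python sketch below uses only the values listed above. The dictionary names and the `describe` helper are illustrative, not part of any library.

```python
# Decimal (SI) prefixes step by 1000; binary (IEC) prefixes step by 1024.
DECIMAL = {"kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
BINARY = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def describe(num_bytes: int) -> str:
    """Express a byte count in both GB and GiB, rounded to 2 decimal places."""
    gb = num_bytes / DECIMAL["GB"]
    gib = num_bytes / BINARY["GiB"]
    return f"{num_bytes:,} bytes = {gb:.2f} GB = {gib:.2f} GiB"


if __name__ == "__main__":
    for unit, size in BINARY.items():
        print(f"1 {unit} = {size:,} bytes")
    # A drive labelled "500 GB" holds 500 x 10^9 bytes.
    print(describe(500 * 10**9))
```

The last line prints `500,000,000,000 bytes = 500.00 GB = 465.66 GiB`. This is why a "500 GB" drive is shown as roughly 465 GiB by software that counts in binary units.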

Number Systems

While humans commonly use decimal (base-10), computers use binary (base-2). Other systems bridge the gap:

  • Binary: Base-2, digits 0 and 1.
  • Hexadecimal: Base-16, digits 0-9 and A-F (A=10, ..., F=15). Each hex digit corresponds to exactly four bits, so hex gives a compact, readable representation of binary.
  • BCD (Binary Coded Decimal): Each decimal digit is represented by its own 4-bit binary sequence. Useful in systems that need decimal precision.
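  • One's and Two's Complement: Techniques for representing positive and negative integers in binary. In one's complement, a negative number is formed by inverting every bit of its positive form; in two's complement, the bits are inverted and then 1 is added. Two's complement is standard for signed integers in modern computers, partly because it has only one representation of zero.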
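The invert (and add 1) rule can be checked with a short Python sketch. It assumes a fixed width of 8 bits by default, and the function names are hypothetical helpers chosen for illustration. The decoder relies on the fact that, in two's complement, the leftmost bit carries a negative weight of −2ⁿ⁻¹.

```python
def ones_complement(value: int, bits: int = 8) -> str:
    """Return the one's-complement bit pattern: negatives invert every bit of |value|."""
    limit = 2 ** (bits - 1) - 1                 # range is -limit..+limit
    if not -limit <= value <= limit:
        raise ValueError(f"{value} does not fit in {bits}-bit one's complement")
    mask = (1 << bits) - 1                      # e.g. 0b11111111 for 8 bits
    pattern = value if value >= 0 else (-value) ^ mask
    return format(pattern, f"0{bits}b")


def twos_complement(value: int, bits: int = 8) -> str:
    """Return the two's-complement bit pattern: invert the bits of |value|, then add 1."""
    low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits}-bit two's complement")
    mask = (1 << bits) - 1
    pattern = value if value >= 0 else (((-value) ^ mask) + 1) & mask
    return format(pattern, f"0{bits}b")


def from_twos_complement(pattern: str) -> int:
    """Decode: if the leftmost bit is 1, it counts as -2^(n-1) instead of +2^(n-1)."""
    raw = int(pattern, 2)
    if pattern[0] == "1":
        raw -= 1 << len(pattern)
    return raw
```

For example, −5 becomes `11111010` in one's complement and `11111011` in two's complement. The 8-bit two's complement range is −128 to 127.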

BCD (Binary-Coded Decimal) in Detail

BCD is a special encoding system where each decimal digit is represented by its own 4-bit binary number.

Advantages of BCD:
  • Straightforward conversion between BCD and denary numbers
  • Less complex encoding and decoding for programmers
  • Digital equipment can easily display BCD output information
  • Monetary values can be represented exactly

Invalid BCD Numbers:

A bit pattern is not a valid BCD number if any nibble (4 bits) represents a denary value greater than 9. The six patterns 1010₂ (10) to 1111₂ (15) are therefore invalid BCD digits.

Converting Numbers to BCD:
  • Each decimal digit is converted to a separate 4-bit binary number
  • Example: 23₁₀ = 0010 0011₍BCD₎ (2=0010, 3=0011)

Converting BCD to Numbers:
  • Split the binary number into groups of 4 bits (starting from the right)
  • Convert each group to its corresponding decimal digit
  • Example: 0010 0011₍BCD₎ = 23₁₀ (0010=2, 0011=3)
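Both conversion procedures can be sketched in Python. `to_bcd` and `from_bcd` are hypothetical helper names. The decoder also rejects the invalid nibbles 1010 to 1111, and this sketch only handles non-negative whole numbers.

```python
def to_bcd(number: int) -> str:
    """Encode a non-negative denary integer as BCD: one 4-bit group per decimal digit."""
    if number < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return " ".join(format(int(digit), "04b") for digit in str(number))


def from_bcd(bcd: str) -> int:
    """Decode a BCD bit string (spaces between nibbles are optional) to a denary integer."""
    bits = bcd.replace(" ", "")
    if not bits or len(bits) % 4 != 0:
        raise ValueError("BCD input must be a non-empty multiple of 4 bits")
    digits = []
    # Take 4-bit groups starting from the right.
    for end in range(len(bits), 0, -4):
        nibble = bits[end - 4:end]
        value = int(nibble, 2)
        if value > 9:
            raise ValueError(f"invalid BCD nibble {nibble} (= {value})")
        digits.append(str(value))
    # Groups were collected right to left, so reverse them before joining.
    return int("".join(reversed(digits)))
```

With these helpers, `to_bcd(23)` returns `"0010 0011"` and `from_bcd("0010 0011")` returns `23`. `from_bcd("1010")` raises an error because 1010₂ is 10.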

Common Uses of BCD:
  • Digital clock displays where each digit needs separate representation
  • Calculator displays for accurate decimal arithmetic
  • Financial calculations requiring exact decimal representation
  • Systems where decimal fractions must be accurately represented

Converting Between Number Systems

Conversions are crucial for understanding how data moves between human-readable forms and machine-readable forms:

  • Binary to Decimal: Multiply each bit by 2 raised to the power of its position (counting from 0 at the rightmost bit) and sum the results. Example: 1011₂ = 1×2³ + 0×2² + 1×2¹ + 1×2⁰ = 8+0+2+1 = 11₁₀.
  • Decimal to Binary: Repeatedly divide the number by 2, recording each remainder (0 or 1), until the quotient reaches 0; then read the remainders from last to first. Example: 11₁₀ → (11÷2=5 r1, 5÷2=2 r1, 2÷2=1 r0, 1÷2=0 r1) → 1011₂.
  • Decimal to Hexadecimal: Repeatedly divide by 16 in the same way. Each remainder (0 to 15) gives one hex digit (0-9 or A-F), again read from last to first. Example: 255₁₀ → (255÷16=15 rF, 15÷16=0 rF) → FF₁₆.
  • Binary to Hexadecimal: Split the binary digits into groups of four, starting from the right (padding the leftmost group with leading 0s if needed), and convert each group directly to one hex digit. Example: 11010110₂ = 1101 0110 = D6₁₆ (1101=D, 0110=6).
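The four methods above translate almost line for line into Python. This is a teaching sketch with hypothetical function names. In real code, the built-ins `int(text, 2)` and `format(n, "X")` do the same job, and the tests below use them as a cross-check.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_decimal(bits: str) -> int:
    """Sum each bit x 2^position, with position 0 at the rightmost bit."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total


def decimal_to_base(number: int, base: int) -> str:
    """Repeated division: collect the remainders, then read them last to first."""
    if number < 0 or not 2 <= base <= 16:
        raise ValueError("this sketch handles non-negative numbers and bases 2 to 16")
    if number == 0:
        return "0"
    remainders = []
    while number > 0:
        number, remainder = divmod(number, base)
        remainders.append(HEX_DIGITS[remainder])   # 10..15 become A..F
    return "".join(reversed(remainders))


def binary_to_hex(bits: str) -> str:
    """Left-pad to a multiple of 4 bits, then map each 4-bit group to one hex digit."""
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(HEX_DIGITS[binary_to_decimal(group)] for group in groups)
```

For example, `decimal_to_base(11, 2)` gives `"1011"`, `decimal_to_base(255, 16)` gives `"FF"`, and `binary_to_hex("11010110")` gives `"D6"`.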

Binary Arithmetic

Binary addition and subtraction use the same principles as decimal but with base-2:

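  • Addition: Add column by column from the right. 1+1 = 10₂, so write 0 and carry 1 into the next column; 1+1+1 (with a carry) = 11₂, so write 1 and carry 1.
  • Subtraction: Borrowing works as in decimal, but a borrowed 1 is worth 2 in the current column, so 10₂ - 1₂ = 1₂.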
  • Overflow: Occurs when the result can’t fit into the set number of bits, e.g., adding two large 8-bit numbers might require a 9th bit.
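The carry rule and overflow can be seen in a short Python sketch that adds two bit strings column by column. The sketch assumes both inputs are unsigned and fit in a fixed width (8 bits by default), and `add_binary` is a hypothetical name. Overflow is reported when a carry comes out of the leftmost column.

```python
from typing import Tuple


def add_binary(a: str, b: str, bits: int = 8) -> Tuple[str, bool]:
    """Add two unsigned binary strings column by column, right to left.

    Returns the sum truncated to `bits` bits, plus True if a carry came out of
    the leftmost column (the true result needed an extra bit: overflow).
    """
    a, b = a.zfill(bits), b.zfill(bits)
    if len(a) > bits or len(b) > bits:
        raise ValueError(f"inputs must fit in {bits} bits")
    result = []
    carry = 0
    for i in range(bits - 1, -1, -1):
        column = int(a[i]) + int(b[i]) + carry  # 0, 1, 2 or 3
        result.append(str(column % 2))          # bit written in this column
        carry = column // 2                     # 1 is carried when column >= 2
    return "".join(reversed(result)), carry == 1
```

`add_binary("01100100", "00110010")` (100 + 50) returns `("10010110", False)`. `add_binary("11111111", "00000001")` (255 + 1) returns `("00000000", True)` because the correct answer, 256, needs a 9th bit.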

Character Representation

Text is stored as numeric codes to represent letters, digits, punctuation, and symbols:

  • ASCII: 7-bit standard (often stored in 8 bits) with 128 codes, covering English letters, digits, basic symbols, and control characters.
  • Extended ASCII: Uses 8 bits for 256 characters, allowing for accented characters and basic graphical symbols.
  • Unicode: A worldwide standard that can encode virtually all languages, plus emojis and special symbols. UTF-8 and UTF-16 are common Unicode encodings.

Unicode ensures global communication and consistent text representation across platforms.
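Python's built-in `ord` and `str.encode` show how the same characters take different amounts of storage. The sketch below prints each character's Unicode code point alongside its UTF-8 and UTF-16 bytes. It uses big-endian UTF-16 without a byte-order mark, and `describe_character` is a hypothetical helper.

```python
from typing import Tuple


def describe_character(ch: str) -> Tuple[int, bytes, bytes]:
    """Return the Unicode code point and the UTF-8 / UTF-16 byte encodings of one character."""
    return ord(ch), ch.encode("utf-8"), ch.encode("utf-16-be")  # big-endian, no byte-order mark


if __name__ == "__main__":
    for ch in "Aé€😀":
        code_point, utf8, utf16 = describe_character(ch)
        print(f"{ch}  U+{code_point:04X}  UTF-8: {utf8.hex(' ')}  UTF-16: {utf16.hex(' ')}")
    # ASCII has only 128 codes, so a letter like é cannot be encoded in it.
    try:
        "é".encode("ascii")
    except UnicodeEncodeError as err:
        print("ASCII cannot encode é:", err.reason)
```

"A" (code 65) needs 1 byte in UTF-8, matching ASCII. "é" needs 2 bytes, "€" needs 3, and the emoji needs 4 bytes in both UTF-8 and UTF-16.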

Practical Uses of Hexadecimal & BCD

Hexadecimal is used for memory addresses, MAC addresses, and web color codes (#RRGGBB).

BCD is handy in digital clocks, calculators, and other devices that display decimal digits directly.
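A web color code packs three bytes into six hex digits, with two digits (8 bits) each for red, green, and blue. The function name below is a hypothetical helper, and the sketch only accepts the six-digit #RRGGBB form.

```python
from typing import Tuple


def hex_color_to_rgb(code: str) -> Tuple[int, int, int]:
    """Split '#RRGGBB' into red, green and blue values, each one byte (0 to 255)."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected a color code of the form #RRGGBB")
    red, green, blue = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return red, green, blue
```

For example, `hex_color_to_rgb("#FF8000")` returns `(255, 128, 0)`, an orange.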

1.2 Multimedia – Graphics & Sound

Graphics

Graphics turn binary data into vibrant visuals. Two main approaches are used: bitmaps and vectors.

Bitmap Images

A bitmap image is a grid of pixels, each with its own color. The level of detail and color depends on resolution and color depth.

  • Storage Method: Sequentially store color codes for each pixel.
  • Pixel: The smallest unit of an image. More pixels mean higher resolution.
  • Colour Depth: The number of bits per pixel. With n bits, each pixel can take one of 2ⁿ colors (e.g., 24 bits give 16,777,216), so more bits mean more colors and better image quality.

File Header (Bitmap)

  • Identifies the file type (e.g., BMP).
  • Specifies dimensions (width & height in pixels).
  • Indicates compression type (if any).
  • Shows color depth (e.g., 24-bit = True Color).
  • Offset to where the pixel data begins in the file.

Calculating Bitmap File Size

Approximate file size (in bytes) = (width × height × color depth in bits) ÷ 8 + header size. The division by 8 converts the total number of bits into bytes.
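The formula can be worked through in a few lines of Python. The function names are hypothetical, and the header size is a parameter because it varies by format. Any extra bytes a real file format adds beyond the formula are ignored here.

```python
def bitmap_size_bytes(width: int, height: int, color_depth: int, header_bytes: int = 0) -> float:
    """Approximate bitmap size: (width x height x bits per pixel) / 8 + header."""
    return width * height * color_depth / 8 + header_bytes


def colors_available(color_depth: int) -> int:
    """Number of distinct colors that color_depth bits per pixel can encode."""
    return 2 ** color_depth


if __name__ == "__main__":
    # 1920 x 1080 pixels at 24-bit color, ignoring the header.
    size = bitmap_size_bytes(1920, 1080, 24)
    print(f"{size:,.0f} bytes = {size / 2**20:.2f} MiB")
    # Dropping to 8-bit color (256 colors) cuts the pixel data to a third.
    print(f"{bitmap_size_bytes(1920, 1080, 8):,.0f} bytes")
```

A 1920 × 1080 image at 24-bit color needs 6,220,800 bytes (about 5.93 MiB) of pixel data. At 8-bit color it needs 2,073,600 bytes, which illustrates the quality versus size trade-off described next.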

Effects of Changing Image Settings

Increasing resolution or color depth improves quality but increases file size. Reducing them saves space but may lower image quality.

Vector Graphics

Vector images use mathematical descriptions to draw shapes and lines:

  • Property: Contains data about shapes (coordinates, line thickness, fill color) defining how the object looks.
  • Drawing List: Stores all objects and their properties, acting like a set of instructions to redraw the image at any size without losing quality.
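A drawing list can be modelled as a list of shape objects, each holding its own properties. The sketch below is illustrative only: the `Circle` and `Line` classes and the `scale` and `to_svg` helpers are hypothetical. SVG is used as the output because it is a real, widely supported vector format. Resizing multiplies coordinates and sizes rather than stretching pixels, so no detail is lost.

```python
from dataclasses import dataclass, replace
from typing import List, Union


@dataclass(frozen=True)
class Circle:
    cx: float            # centre coordinates
    cy: float
    r: float             # radius
    fill: str            # fill color, e.g. "red"
    stroke_width: float  # outline thickness


@dataclass(frozen=True)
class Line:
    x1: float
    y1: float
    x2: float
    y2: float
    stroke_width: float


Shape = Union[Circle, Line]


def scale(drawing: List[Shape], factor: float) -> List[Shape]:
    """Resize by multiplying every coordinate and size property; no pixels are involved."""
    scaled: List[Shape] = []
    for shape in drawing:
        if isinstance(shape, Circle):
            scaled.append(replace(shape, cx=shape.cx * factor, cy=shape.cy * factor,
                                  r=shape.r * factor, stroke_width=shape.stroke_width * factor))
        else:
            scaled.append(replace(shape, x1=shape.x1 * factor, y1=shape.y1 * factor,
                                  x2=shape.x2 * factor, y2=shape.y2 * factor,
                                  stroke_width=shape.stroke_width * factor))
    return scaled


def to_svg(drawing: List[Shape], width: int, height: int) -> str:
    """Render the drawing list as SVG markup, one element per object."""
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for shape in drawing:
        if isinstance(shape, Circle):
            parts.append(f'  <circle cx="{shape.cx}" cy="{shape.cy}" r="{shape.r}" '
                         f'fill="{shape.fill}" stroke="black" stroke-width="{shape.stroke_width}"/>')
        else:
            parts.append(f'  <line x1="{shape.x1}" y1="{shape.y1}" x2="{shape.x2}" y2="{shape.y2}" '
                         f'stroke="black" stroke-width="{shape.stroke_width}"/>')
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    logo = [Circle(50, 50, 40, "red", 2), Line(10, 95, 90, 95, 1)]
    print(to_svg(scale(logo, 4), 400, 400))   # same logo, four times larger
```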

Vectors are ideal for logos, icons, and diagrams that must scale to different sizes.

Choosing Bitmap vs. Vector

Use bitmaps for complex, photo-realistic images and vectors for images that must be resized frequently without losing clarity (like company logos).

Sound

Sound waves are analog, but computers store them as digital data:

  • Sampling: Taking regular measurements of the sound wave’s amplitude over time.
  • Sampling Rate: Number of samples per second (Hz). Higher rates capture more detail but use more storage.
  • Sampling Resolution (Bit Depth): Number of bits per sample. More bits = more precise volume levels, larger file size.

Adjusting sampling rate or resolution balances audio quality with file size. For speech, lower settings might be acceptable; for music, higher quality is often preferred.
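The sketch below illustrates sampling and resolution in Python. A pure sine tone stands in for a real sound wave, and each sample is rounded to the nearest signed integer level that the bit depth allows. The size formula assumes uncompressed audio. The `channels` parameter (1 for mono, 2 for stereo) is an addition not covered in the text above, and all names are hypothetical.

```python
import math
from typing import List


def sample_tone(frequency_hz: float, sample_rate_hz: int, bit_depth: int, duration_s: float) -> List[int]:
    """Sample a sine wave at regular intervals and round each sample to a signed integer level."""
    max_level = 2 ** (bit_depth - 1) - 1          # e.g. 127 for 8-bit, 32767 for 16-bit
    count = round(sample_rate_hz * duration_s)    # total number of samples taken
    samples = []
    for n in range(count):
        t = n / sample_rate_hz                    # time of the n-th sample, in seconds
        amplitude = math.sin(2 * math.pi * frequency_hz * t)   # "analogue" value in [-1, 1]
        samples.append(round(amplitude * max_level))            # nearest representable level
    return samples


def sound_size_bytes(sample_rate_hz: int, bit_depth: int, duration_s: float, channels: int = 1) -> float:
    """Uncompressed size = sample rate x bit depth x duration x channels / 8."""
    return sample_rate_hz * bit_depth * duration_s * channels / 8
```

One minute of mono audio at 44,100 Hz and 16 bits per sample comes to 44,100 × 16 × 60 ÷ 8 = 5,292,000 bytes. Halving either the sampling rate or the bit depth halves that figure.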

1.3 Compression

Compression makes files smaller, speeding up transfers and saving storage. It’s essential for web media, streaming services, and cloud storage.

Lossy Compression

Definition: Permanently removes some data to achieve smaller file sizes. Typically used for images, audio, and video where slight quality loss is acceptable.

Benefits:

  • Significantly smaller file sizes compared to original or lossless methods.
  • Faster upload/download times, crucial for web content delivery.
  • Reduced bandwidth usage, beneficial in limited data environments.
  • Ideal for streaming media platforms and online galleries.
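To see what "permanently removes some data" means, consider a deliberately simple lossy scheme: reduce the bit depth of 8-bit values (such as pixel brightnesses) to 4 bits. This is not how JPEG or MP3 work, which use far more sophisticated methods. It only shows that the discarded detail cannot be recovered. The function names are hypothetical.

```python
from typing import List


def compress_lossy(values: List[int], keep_bits: int = 4, original_bits: int = 8) -> List[int]:
    """Keep only the top `keep_bits` of each value; the low bits are thrown away."""
    shift = original_bits - keep_bits
    return [v >> shift for v in values]


def decompress_lossy(codes: List[int], keep_bits: int = 4, original_bits: int = 8) -> List[int]:
    """Scale back to the original range; the discarded bits come back as 0s."""
    shift = original_bits - keep_bits
    return [c << shift for c in codes]
```

The pixels `[200, 201, 37, 255]` compress to half the bits but come back as `[192, 192, 32, 240]`. The values are close but not identical, and 200 and 201 can no longer be told apart.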

Lossless Compression

Definition: Compresses data without any loss of the original information. Perfect for text, code, and any scenario where accuracy is paramount.

A common technique is Run-Length Encoding (RLE):

  • Identifies consecutive repeating elements (e.g., pixels of the same color or repeated characters).
  • Stores the value once and the count of how many times it repeats rather than storing each instance.
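Run-length encoding fits in a few lines of Python. In this sketch, a row of pixels is written as a string of color letters (W = white, B = black), an assumption made for readability. The function names are hypothetical. Decoding expands each (value, count) pair back out, so the original is restored exactly.

```python
from typing import List, Tuple


def rle_encode(data: str) -> List[Tuple[str, int]]:
    """Collapse each run of identical symbols into a (symbol, count) pair."""
    runs: List[Tuple[str, int]] = []
    for symbol in data:
        if runs and runs[-1][0] == symbol:
            runs[-1] = (symbol, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((symbol, 1))               # start a new run
    return runs


def rle_decode(runs: List[Tuple[str, int]]) -> str:
    """Expand each (symbol, count) pair back out; the original is restored exactly."""
    return "".join(symbol * count for symbol, count in runs)
```

`rle_encode("WWWWWWBBBW")` gives `[("W", 6), ("B", 3), ("W", 1)]`. RLE only saves space when runs are long. For data such as `"ABAB"`, every run has length 1, so the encoded form is larger than the input.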

Advantages: The original data is perfectly restorable. Ideal for financial records, legal documents, or any data where accuracy matters.

In practice, formats like ZIP or PNG (for images) use lossless techniques. JPEG (for images) and MP3 (for audio) typically use lossy methods.

Practice Questions

Convert the binary number 1011₂ to decimal.

Which form of compression can be reversed perfectly to get back the original file?

What is the smallest element of a bitmap image?