COL100 class schedule and notes: 16-17 Sem 1

The lab assignments, class schedule, and class policies are available here.

You will learn the following aspects in the course:

What is computing, translation from a problem to an algorithm to a program
How a computer works, how is a program compiled and executed by a computer, binary numbers, computing using 0s and 1s
Programming in C. We are choosing C because it is close to the metal and will help you relate basic programming constructs to how they are implemented in a computer
The importance of good algorithms, how the same problem can be solved in different ways with different degrees of efficiency
Application of computers for simulations, solving of complex mathematical problems, and manipulating large amounts of data
Writing moderately complex programs. Planning first on paper how the program should be structured, and then coded in a programming language
The concept of libraries, basics of data abstraction to build generic libraries and reuse them

We hope that with these skills and strong concepts, you will be able to learn how to use new tools on your own, write your own programs to solve basic computation requirements in your disciplines, and hopefully learn new programming languages on your own.

Class 1: Introduction to computing

What is an algorithm, efficient/inefficient. Example of linear search compared with binary search. To guess a word of n charaters, with m alphabets (a..z, m = 26), linear search on average will take n.m/2 time, but binary search will take n.log(m) time with the log taken in base-2. Proof of binary search by constructing a tree and computing the height of the tree.

Class 2: Introduction to computers

What is a computer. Going from an algorithm, to a program written in a programming language, to converting it into machine code through a compiler. Executable code is loaded into RAM by the operating system, distinction between the instruction portion and the data portion of a program during runtime. How the CPU works, by fetching instructions and data one at a time from the RAM into the CPU, then using the ALU to compute the result and writing it back to RAM. Examples through simple programs. Reference slides that can be used.

Video on what a computer looks like from inside.

Class 3: Introduction to programming in C

Input/output operations using printf and scanf. Notion of variables as holding a value stored at a specific memory address. Notion of data types, integer and float. Notion of operator precedence and left-right evaluation of expressions. Notion of typecasting between integer and float. Reference slides 1-20. Sample program arithops.c to perform addition, subtraction, multiplication, division, modulo.

Self study components: printf/scanf data types (%d/f/lf/x/o/g), display formats (\n\t). Operator precedence, what is the precedence order of the % operator. More assignment operators such as +=, -=, *=, /=, %=

Can run an Ubuntu VM on a Windows computer, to try out C programs on your own: Download the VMware player and an Ubuntu 32-bit so image from the IITD repository. Also watch this video on setting up your VM.

Class 4: Number representation in computers

Binary representation, conversion from decimal system to binary system, conversion of integer portion and fractional portion. Hexadecimal and octal system, conversion from binary to hex/oct, proof of why the method of grouping 4 bits and 3 bits together works for hexadecimal and octal systems respectively. Notion of signed and unsigned integers. Typical range of (unsigned) char/int/long data types. Reference slides 1-23. Proof for conversion from binary to hex/oct.

Nice website on the history of computers and origin of binary systems. Video on how the Jacquard's loom works.

Class 5: Operations on numbers

Addition easy to implement in hardware, but subtraction is hard. Subtraction can be implemented as an addition of negative numbers, representation of negative numbers in 1s and 2s complement. Number wheel and understanding of why this method works. Overflow condition when addition of two +ve numbers gives a -ve number, or addition of two -ve numbers gives a +ve number. 2s complement as an improvement over 1s complement, range of signed char/int/long data types. Reference slides 24-40. Also look st slides 22-38 from Prof Sarangi's book.

Self study: Convert +ve and -ve decimal numbers to 1s and 2s complement, think how many bits would you need for numbers of different sizes, perform addition/subtraction on the converted numbers, trace this over the number wheel to understand why this works and what happens at the crossover from -ve to +ve index positions on the wheel, convert the resulting 1s and 2s complement numbers back to decimal.

Class 6: Essential programming constructs - loops and conditionals

Introduction to conditionals. if/else/else if branching conditions as arithmentic or logical expressions, nested if/else structures and confusion when braces are not used, ensuring coverage of all logical cases through a truth table to ensure correct runtime operation. Reference slides 23-31 on logical expressions and slides 1-21 on conditional statements. Sample programs marks.c to give comments based on performance, max.c to find the maximum of three integers.

Introduction to loops. while statement as repeated execution of a conditional if statement, computations inside a while loop until the repeat condition is met, problem of infinite loops at runtime if all logical cases are not covered. Reference slides 1-17 on examples using the while loop. Sample programs loop.c to find the sum of squares of the first n natural numbers, divisibility.c to test for divisibility by 3 or 9 by summing the digits, infinitewhile.c to run conditionals on arithmetic expressions, primes.c to test whether a number is prime or not.

Self study: Confusion between = assignment operator and == equality operator

Class 7: Essential programming constructs - loops and conditionals, continued...

For loop, break and continue statements in loops. Reference slides 18-46 on examples of loops. Sample program primeswithbreak.c to test whether a number is prime or not.

Introduction to strings as an array of characters. Declaration of strings. Arithmetic on ASCII codes for character manipulation. Reference slides 32-45 on the char data type. Sample programs simpleexpr.c to get a simple arithmetic expression as an input, invertchars.c to convert lower to upper case and vice versa, removespaces.c to remove spaces in a string. Homework: Modify the program to not remove the spaces altogether but compress 2 or more consecutive spaces to just 1 space.

Class 8: Clever things that computers can do

Fun things with nested loops. Sample programs printshapes.c to print triangles and inverted triangles, printcondition.c to print a diamond by evaluating line equations, printfunction.c to print a sine curve by scaling the domain and range to an 80x24 terminal output. Homework: Modify the program so that it can work for any function provided the domain and range are given.

Class 9: More programming constructs - arrays

Understand memory addresses of variables, how to print addresses, overflows in strings. Sample program addressing.c, try with and without the 3 in %3s to cause overflows when entering more than 3 character strings.

Introduction to arrays. Reference slides on arrays. Sample program arraymax.c to store a series of numbers and find the maximum.

Redirection of IO. Sample program rightprog.c and wrongprog.c, to take two integers as input and output the product but without the sign. Test cases: (input1.txt, output1.txt), (input2.txt, output2.txt), (input3.txt, output3.txt). Run both the rightprog and wrongprog as ./a.out < inputx.txt > myoutputx.txt, and then do diff outputx.txt myoutputx.txt to compare your output and the expected output.

Class 10: Computing on arrays. The fun game of life

More arrays. Selection sort arraysort.c, time complexity is n^2. Homework: Sort the array, then find the mode in a single scan of the array.

Fun with the game of life, 1D gameoflife1d.c and gameoflife1d.in to pass through redirection. Notice some new programming constructs that have been introduced, including the #define directive, the conditional operator, and the bitwise XOR. Read your slides for the details!

More fun with the 2D game of life, sample inputs for oscillators and still life, glider, and the gun.

To get the 2G game of life code and compile it and run it to have fun, you will have to decode this! I have used a very simple encoder which is available and simply reads a C program file character by character and does an increment of 10 on the ASCII code for each character, to produce something which appears garbled. Try it out, it works on simple C programs which just have one main() function: compile and run as ./a.out < testprogram.c > testprogram.encode. Be careful when writing these commands so that you don't overwrite your own program by mistake! Read the encoder carefully and notice the wrap around for values more than 127, and the termination character that is used. The decoder is simpler and just 3-4 lines of code to write. A useful program to see the encoded file byte by byte is hexdump which comes pre-installed on Ubuntu: use as hexdump -c testprogram.encode or hexdump -C testprogram.encode. To understand what these options -c and -C mean: use man hexdump which opens the hexdump manual.

Class 11: Leveraging computers for what they are good at - a forest fire simulation

Fun with a 1D forest fire simulation. forestfire.c takes as input the probability of a living tree being hit by fire and runs a simulation over many generations of tree birth and death in the forest. The growth simulations are run for different probabilities of tree birth and the output is a table of three columns: fire probability, birth probability, average number of trees in a generation. Compile and run the program as ./a.out > forestfire1d.0.05.out and enter 0.05 for the input, to produce a table with fire probability 0.05. Similarly run as ./a.out > forestfire1d.0.25.out and enter 0.25 to produce a table with fire probability 0.25. Do the same for fire probabilities of 0.50 and 0.75. Then use forestfire.plot to plot the graph by starting gnuplot, and type load "forestfire.plot" in gnuplot.

Gnuplot would be installed on the lab machines, but if you don't have it on your computer then you can install it by typing sudo apt-get install gnuplot, assuming you are the root user on your computer.

Also start going through the slides to learn about defining functions in C.

Class 12: Making computer programs run efficiently - algorithms

The decoding program to decode the game of life is now available: decode.c. Have fun!

Binary search revisited, with integers and with strings: binsearch.c and binsearchwords.c. You can use the Oxford 3000 words as a redirection input to the binary search program on words: oxford.txt. Note that towards the end the file has an entry for the search word, ie. the word to search for in the dictionary.

Passing arguments to functions by value, passing arrays as arguments. Reference slides.

Class 13: How functions work. Introduction to recursion

More about how functions work, clarity in passing by value and passing by reference. Sample programs functions-int.c and functions-arr.c.

Introduction to recursion. Reference slides. Sample programs factorial-recur.c.

Class 14: More recursion

Binary search through recursion: binsearch-recur.c. Find the time taken in the worst case for binary search in an array of size n.

Towers of Hanoi: hanoi.c. Find the time taken to solve the problem with n disks. A helper program to more clearly understand through visuals how the solution works: hanoi-visual.c

Fibonacci numbers through recursion and iteration: fib-recur.c. Why are the steps taken in the recursive version worse than the iterative version? Can you express the number of steps taken in the recursive version. And find a more efficient recursion model.

Class 15: Even more recursion

Mergesort. mergesort.c. Find the time taken by mergesort, and compare it with selection sort.

Practice programs: Reverse a string - revstr.c, check if a string is a palindrome, check if a decimal integer is a palindrome, print all permutations of a string, find the second largest number in an array of unsorted numbers, find the kth largest number in an array of unsorted numbers, find the gcd of two numbers using recursion

Class 16: Introduction to pointers

Reference slides. Sample program: pointers.c. Write a function to swap two numbers by passing their addresses.

Array of pointers. Program to jumble a list of words: jumblewords.c. Invoke by redirecting oxford.txt to the program.

Side note: scanf treats spaces and \n differently when a series of inputs are taken. Safest way is to read lines using getchar, and then parse the line using sscanf. Sample program: readline.c

Class 17: Writing efficient programs

Solving problems through computers can sometimes not be done with just brute force. Computers have limitations and programmers should write efficient programs that utilize computing resources well. Consider this method to compute x^n in log n steps, instead of the naive method to multiply n times by x, which takes n steps: pow.c.

Consider this sweet problem at Project Euler to compute the last ten digits of the series 1^1 + 2^2 + 3^3 + ... 1000^1000. Even long long int cannot handle numbers that large, 1000^1000 has 3000 digits! Here is a simple way to solve it by not computing more than 10 digits when evaluating each term in the series: selfpowers.c

Also consider this problem which requires us to calculate the sum of the digits of a^b where a and b can be between 1..100. Again, very large numbers! Well, you can actually write your own program to perform these calculations digit by digit, ie. not use the datatypes in C but effectively create your own datatypes where you can work with very large numbers: powerfuldigit.c

Here is another nice problem, to find the prime factors of a number, n = p1^a1 x p2^a2 x p3^a3... If you are not given a list of prime numbers in advance, you can create quite inefficient programs to first find prime numbers and then find which ones of them divide n. Instead, what you can do is to find the (i+1)th prime number given the first i prime numbers, as the smallest number which is not divisible by any of the i primes. And you need to go as far as only the square root of n to find these primes. Further, if the new found largest prime divides n, then you can keep dividing n by the prime, to reduce the range in which you need to find more primes. Reading through the program will be easier to understand! primefactors.c

Class 18: Planning the programs before writing them

Generate permutations of a string assuming all the characters are unique: permute-unique.c. The logic is simple using recursion, that you generate permutations of n characters by generating n unique collections of n-1 characters, and then generate permutations of these n collections. The n collections can be generated by taking out characters one by one from the array of n elements.

Now modify the program to generate permutations with duplicates: permute-duplicate.c. This was actually solved by a student in class! The logic is to ensure that with duplicates, you will not be able to generate n unique collections of n-1 elements by taking out one element at a time. You will generate fewer collections, and you can do this by keeping the elements in sorted order and checking that you take out an element from the array only if in the previous iteration you had taken out a different element. The program given here assumes that the original string has characters in sorted order.

Apply these insights to solve similar problems (mathematically), to find the number of routes in a grid from the top-left to the bottom-right corner. And to find the permutation in lexicographic order for a given position. You can see how the insights gained in writing algorithms to generate permutations, comes in handy to solve these problems mathematically.

Another nice problem to solve is to find the maximum path in a set of numbers arranged in a triangle, from the top row to the bottom row. My solution is here maximumpathsum.c. The logic is to understand that if you know how to reach each of the elements in the ith row from the first element at the top, then you can compute how to reach each element in the (i+1)th row. You need to think what additional data you will need to maintain to compute the maximum paths for the ith row, and then for the (i+1)th row, and so on. If the question is to also print the maximum path, then you need to maintain more data of how each element in the ith row is reached, and update that when you move to the (i+1)th row.

Class 19: More fun - substitution ciphers

A substitution cipher substitutes each alphabet with another alphabet, and only you or your friend may know the exact substitution. Given that there are 26! possible substitutions that could be possible, it can be hard for anybody to use pure brute force to break the cipher. Here is a program to convert a given text into its cipher text: substcipher.c. You can run it on a bunch of essays by George Orwell [1,2,3,4,5] and Mark Twain [1,2,3].

But you can be clever and use some recurring patterns of the English language to be efficient. For example, the most commonly occuring alphabets are e, a, t, i, o... and you can check which alphabets occur most frequently in the cipher text; quite likely that these alphabets will be the same. Similarly, you can look at the most frequently occuring bigrams, ie. two consecutive alphabets, which are th, he, in, an, er... And the most frequently occuring tri-grams, which are the, ing, and, ion, hat... These are what I have experimented with, you can also utilize a bunch of other patterns such as what characters follow an apostrophe, the most frequently occuring 2/3/4/5 letter words, the most common word-endings, etc. The programs give you the most frequently occuring alphabets, bigrams, and trigrams, in a given a piece of text: analysis-alpha.c, analysis-bigrams.c, analysis-trigrams.c.

Try it out on the various Orwell and Twain essays. You can see that roughly there is some consistency, e is the most frequently occuring alphabet, and while a and t are in competition with each other for the second and third places, but in these essays they have at least never taken the fourth place. The fourth/fifth/sixth places are similarly in competition between i, o, n. Similarly, you can see the consistency in the most popular bigrams and trigrams, and through manual inspection you can build a set of rules as follows, for alphabets ([e] [a t] [i o n] [s r h] [l d]), for bigrams ([th] [he in] [an er on] [re ed nd ou]), and for trigrams ([the] [ing and] [ion hat] [tha] [ent] [tio]). You can take any ciphertext and find the most frequent alhpabets, bigrams, and trigrams, to manually find mappings for at least 9-10 alphabets, most likely you will be able to solve for at least e/a/t/i/o/h/n/g/d/r. This program does the analysis for you: substdecode.c.

This of course does not solve the problem, but at least gives a starting point. Now some clever techniques can be applied. You can scan the cipher text to find words in which most of the alhpabets have been decoded, and solve the remaining alphabets in the words. For example, if you have solved for g/o/a/t, and the word gloat occurs in cipher form then you will be able to solve for the alphabet l as well. Similarly, if you have solved for t/e/a then you can solve for the character r if the word treat occurs in cipher form. So in general, you can extract words from the cipher text in which most of the solved characters occur, and check against a dictionary for new words that can be matched. This too may give you conflicts because not all words may be there in the dictionary, especially if you confuse a proper noun for word, and you can use nice techniques like to maintain a quality score for different possible solutions by finding how many words were matched against known English words in a dictionary.

Class 20: Even more fun - navigating a maze using a stack

How can you navigate a maze like the one here? The rules are simple, you have to start at the top-left corner and find a path to the bottom-right corner; you can go left/right/up/down along the *s, ie. traverse horizontally, vertically, but not diagonally along the routes. We discussed at least three algorithms. The first operated similar to Hansel and Gretel's strategy of dropping breadcrumbs -- keep dropping breadcrumbs along the route you take, if you hit a deadend then back trace the bread crumbs to the last point of intersection and take another route, if all routes from the last point of intersection have been taken then back trace the bread crumbs to the previous intersection. If you end up back at the starting point, it means that no route exists. If one does, then you will find it. The program can be structured nicely as a recursion where you can write a function traverse that takes an (x, y) parameter input and another parameter for whether to trace a new path or to back trace an old path. Write out the program as a practice.

The recursion essentially ends up maintaining a stack (LIFO - last in first out) of nested intersections. You can maintain this stack yourself and then write an iterative program. Here is one version, maze.c. The maze is read as a redirected input. Carefully understand the dynamics of the traverse function which essentially traces a path from a given (x, y) point to the next intersection. If the destination lies on the path then it returns a 1, if it ends up in a deadend then it returns a -1, and if it hits an intersection then it pushes to the stack all the outgoing paths from the itersection. The loop which invokes the traverse function simply terminates if the destination was found, else in the case of hitting an intersection it allows traverse to be called one by one for the outgoing paths from the intersection, and in the case of hitting a deadend it pops the last element off the stack and allows the subsequent traversal from the next element on the stack.

Yet another nice method that a student suggested is to eliminate the deadends. Identify the deadends as those points with just one neightbour, and then trace them to the last point of intersection, drop that entire path. In the next iteration, intersections from which only deadends originate can similarly be dropped. Do not treat the final destination as a deadend, and ultimately you will be left with the path from the source to the destination.

Also think about which of these algorithms can work on a maze with cycles.

In the programs above, the maze can be represented as a 2D array of paths. Another method of representing it is to model each deadend and intersection as a node, with edges connecting the nodes that have a direct path to each other. This is called a graph. You can label the nodes of the maze as follows, and read the graph as an input accordingly by first reading the number of nodes and then a list of edges that connect pairs of nodes as follows. The same program can be rewritten easily to work on the graph as an input: graph.c.

The main lessons from the examples we have seen so far is that before solving any problem, first think about the data representation (ie. how to store the data required by the program), then the logic or algorithm of solving the problem, then the broad program structure in terms of whether to use recursion or not, what functions to write, etc, and finally get down to actual coding to write the program. While working on the first three steps, try to only use paper and pen, do not writing code right away.

Class 21: Another application of stacks and structures

Consider you are a tourist and you want to visit a bunch of landmarks, you start from your hotel and want to tour different places and come back to the hotel, without visiting the same place twice. This is called a Hamiltonian Path on a graph. How will you find such a path? You can use the following graph representation tour.txt, where the first line is the number of nodes, then there is a list of the nodes with their names, and then the edges connecting the nodes. This can be solved recursively by starting from the hotel, then for each neighbor traverse onwards from that neighbor recursively. You will need to track the state of the nodes you have visited so far, so that you don't try to visit the same nodes again and end up in loops. The recursive solution is here, and uses structures: tour-recursion.c. The structure helps maintain details for each node, and significantly improves readability and being able to track variables. Notice the path array of pointers to the structures which maintains the path being considered at any point, and trace in the dry run how the entries in the path array grow and shrink.

Reference slides on structures: structures.ppt.

Here is also a non-recursive solution using stacks: tour-stack.c. Much like how we used stacks in the maze example, here too the stack keeps track of nodes yet to be visited. Note that the path is maintained in another array, and carefully understand the difference in dynamics of the stack which keeps track of nodes to be visited, and the path array which keeps track of the path taken so far in the current route being evaluated.

Class 22: Dynamic allocation and writing libraries

Reference slides on dynamic allocation: dynamicallocation.ppt.

Stacks are a pretty useful data structure for many applications, but what if you want to write programs that use many stacks all at once. Or even better, write a library that you can reuse whenever you want to write programs that use stacks. Here is a go at it, the implementation in stack.c and the corresponding header file stack.h give you all that you need -- the function newstack creates a stack data structure and returns a pointer to the structure, plus the functions push/pop/peek provide the necessary LIFO functionality. The stack functions are invoked from stackmain.c, you can see the simplicity created for the programmer who need not know anything about the internals of the implementation. They just need to include stack.h and invoke the required functions. Compile as gcc stack.c stackmain.c.

The coolest part also is that the stack resizes itself automatically. If more and more elements are added to the stack, then dynamic memory allocation is used to grow the size. The programmer need not even specify initially how large a stack they want to create.

In the same way, you can write your own library to manage arrays of integers and arrays of words, which automatically resize themselves: arraymain.c, array.c, array.h. You can access the array elements by invoking functions to get or set values at specific indices in the array. Note the use of calloc for allocating memory to words, calloc initializes the allocated memory to 0 while malloc may leave it unitialized, and we check whether or not a particular element is 0 or not to decide whether to free its memory or not. Also note the use of the assert function, to check against memory access violations.

You should brush up with reference slides for the dynamic allocation of 2D arrays: 2darrays.ppt.

Also look at slides for stacks: stacksqueues.ppt.

Class 23: Data abstraction for generic libraries

What if you had to write the same libraries as above to hold a stack of integers, or a stack of floats, or a stack of words, or even a stack of some arbitary structure? Since the implementation logic will be the same, it will be a pain to copy-paste everything and only change ints to floats or char * etc. A better way is as follows, look at defn.h where you can create a #define DTYPE for whatever data type you want to use for the stacks in your program. The defnstack.h and defnstack.c which contain the actual stack operation definitions and implementations for push/pop/etc just need to refer to DTYPE, and the compilation process takes care of substituting DTYPE with the corresponding #define. The library is invoked from defnstackmain.c, which currently assumes that DTYPE is defined to be an int, and it pushes and pops ints from the stacks. Compile as gcc defnstack.c defnstackmain.c.

To change this to using stacks of floats, you only have to comment the #define DTYPE for int, and uncomment for float. You can then change the invocations in defnstackmain.c to push and pop floats, and accordingly print floats by changing the printf to print %f instead of %d. No changes are required at all in the stack library, so the person who wrote the library may be away on vacation and you as a person who just uses the library need not worry at all about any details of how the library is implemented and don't have to change it to use the stack for floats.

In the same way, to use for a stack of words, you can comment the #define DTYPE for float, and uncomment for the structure containing an array of characters. The library can be invoked from defnstackmainstrings.c. Compile as gcc defnstack.c defnstackmainstrings.c. In fact, you can make any arbitrary structure and use the stack library for it.

All good so far. But what if in the same program you want to use stacks of ints, floats, strings, etc, all at once? The #define DTYPE method will not work because DTYPE can be substituted for only one of the data types at one time, so if you have defined it for int then you can have as many stacks for int as you want, but you cannot have a stack for floats in the same program. The funny union datatype of C can come in handy here. Only leave uncommented the union related lines in defn.h which declare the flexi data type, and invoke the library from defnstackmainflexi.c. Compile as gcc defnstack.c defnstackmainflexi.c. Understand the invocation file carefully, as the programmer it is entirely your choice which stack to use as an int stack, which one as a float stack, and which one as a string stack. In the int stack, you need to initialize the integer element of the union when pushing, and access the integer element upon popping. If you want to use a stack for floats, then you need to initialize the float element of the union when pushing, and access the float element upon popping.

We have rewritten the non-recursive programs for the maze and tour programs using this stack library. Use the #define DTYPE position in defn.h for the maze.c program, and the #define DTYPE landmark* for the tour.c. The accompanying stack library files are defnstack.c and defnstack.h. Compile as gcc maze.c defnstack.c for the maze and gcc tour.c defnstack.c for the tour programs, and run as ./a.out < maze.txt for the maze and ./a.out < tour.txt for the tour. The input files can be downloaded from here: maze.txt and tour.txt.

Class 24: A scheduling simulator for hospitals

We want to help a hospital work out its doctor staffing requirements. What is given is the number of patients per day the hospital receives, the probability distribution of how sick the patients are likely to be (classified into 5 levels of priorities), and the amount of time it takes a doctor to service a patient depending on the priority of the patient. You can assume a uniform arrival rate of patients, of one patient arriving in each unit of time. The hospital may have a certain number of doctors who will service the patients one by one. They will first service patients who have the highest priority of 5, then if there are no patients of priority 5 they will service patients of priority 4, and so on. We will keep track of the waiting time of each patient, calculated as the difference between the time when the patient was assigned to a doctor and the time when the patient arrived at the hospital. The objective is to have as few doctors as possible but avoid very long average waiting times for the patients.

We will first build a workload generator that takes the given parameters as input and generates a sample traffic for the simulator. workloadgen.c does the job, and takes as a redirected input names.txt which contains a random list of names for patients. Compile and invoke as ./a.out < names.txt > traffic.txt. The output is redirected to traffic.txt, here is a sample output which contains the patient-id followed by the priority and then the name.

Now let us think how to design the actual simulator. We clearly have two entities, patients and doctors. Whenever a patient arrives, we first have to look at the priority for that patient, and then assign the patient to wait alongside other patients with the same priority level. For each doctor, we will have to keep track of when the doctor becomes free. And whenever a doctor becomes free, we should assign a patient to that doctor -- this patient should be picked up from among the highest priority patients who are waiting, if there are no highest priority patients then from the next lower priority, and so on. We therefore have to maintain the set of patients for each priority level, keep adding patients to the appropriate priority level as they arrive, and remove patients one by one from their set starting from the earliest patient who arrived for that set. To maintain this set of patients for different priorities, we will use a new data structure called a queue, which unlike the stack data structure follows a FIFO (First In First Out) policy. A separate queue can be defined for each priority level, new patients can be enqueued into the queue corresponding to their priority, and whenever a doctor becomes free a patient can be dequeued starting from the highest priority non-empty queue and serviced by the doctor.

The design can therefore be as follows. We will maintain a clock variable to keep track of a virtual time. On each clock tick we will read a new patient as long as there are patients in the traffic workload, and enqueue the patient into the appropriate queue based on their priority. We will then loop through all the doctors to check if they are busy or free. If a doctor is free, then we will dequeue a patient from the highest priority non-empty queue and assign the patient to the doctor. We will maintain a variable for each doctor to keep track of how long the doctor will be busy for, based on the amount of time it will take the doctor to service the patient for that priority level. Then if the doctor is busy, we will decrement this time remaining by one, on each clocktick. We will keep incrementing the clock and looping, until all the patients have not been fully serviced.

Here is the scheduling.c program, which will also include defn.h containing the definitions for the patient and doctor structures. The queue library is implemented in arrqueue.c and defined in arrqueue.h. Note that so far we have not looked at the queue library at all, and how it is implemented. Go through the slides to read about different ways to implement queues. Compile as gcc scheduling.c arrqueue.c, and run as ./a.out < traffic.txt.

Class 25: Queues and linked lists

Just like stacks, queues are a handy data structure that are used in several applications. One obvious way of implementing queues is in a circular array, as is done in the arrqueue.c and arrqueue.h files. Another way is as a linked list, with the implementation in llqueue.c and llqueue.h. In the hospital scheduling simulator above, you can simply uncomment the #define QUEUE llqueue in defn.h, and include llqueue.h in scheduling.c, to use the linked list version of the queue implementation. The rest of the program does not need to change at all.

Some suggestions for practice questions on stacks, queues, and other fun things:

Evaluate a given postfix expression using stacks. Slides 1-15 explain the concept well.
Convert an infix expression into postfix. The same slide deck explains this well.
Like the tour example, read a list of people and friendship relationships between the people. Use stacks to find a path between a given pair of people, which one person can follow to get introduced to the second person. Modify this program to find the path lengths between all pairs of people, and calculate the average path length. You can use the social networks datasets here, especially the karate club, dolphin, and Les Miserables datasets. The data is in GML format and the parser in C is also available on the same website.
Implement the stack library using linked lists.
Think how you would simulate a traffic intersection with 3 incoming roads. All the roads are bi-directional, and you are given the arrival rate of cars on each road. Assume a uniform arrival rate along each road, and the cars take a random direction on to any of the other roads at the intersection. We want to find out for how long should we keep different routes on the intersection open, so that the average waiting times for all cars is minimized.
We stored the linking information between a graph of nodes as a 2D array of edges. Think how you would store a graph using linked lists. Can you write your own library to store graphs, which will provide functions such as create_graph(N) to create a graph of N nodes, addedge(X, Y) to add an edge for node X to node Y, getedges(X) which will return all edges from node X, isedge(X, Y) which will return whether an edge exists between nodes X and Y, etc. Implement these functions first using a 2D array to store the edges, and then using linked lists. What kind of calls are more efficient with one method compared to the other?