Lecture - 12 Data Modeling - ER Diagrams, Mapping -Transcript

Introduction

We’ll continue our discussion on process modeling. In the previous lecture, we talked about functional decomposition as a first step in process modeling. Given a complex process, we should try to decompose it into sub-processes or smaller processes which can be better understood in terms of the actions they perform.

Data Flow Diagrams

We’ll now continue with process modeling and look at another important tool called Data Flow Diagrams. Data Flow Diagrams are a very popular tool for describing the functions which are required in a given system. These functions are specified in terms of processes as well as the data used by these processes. This is one important difference from the function decomposition diagram, where we do not show data explicitly, whereas in Data Flow Diagrams both the processes and the data which flows among those processes are shown. That’s why they are called Data Flow Diagrams.

We may do function decomposition diagrams before doing the data flow diagram. We may also do data flow diagrams directly, but it is good practice to do function decomposition beforehand, because this decomposition will be required anyway when we do data flow diagrams. So having done it earlier would always be more fruitful. Data flow diagrams obviously have more content than function decomposition diagrams, because now we will be explicitly showing the flow of data.

Data flow diagrams are very simple pictorial tools. They represent the functions and the data flow in the form of a diagram. Therefore they are very easy for all people to understand, users as well as management. So they have become very popular in the analysis phase for representing the functions performed by a particular application.

Data flow diagrams are also unambiguous and concise. They can describe processing both at the physical level and at the logical level. Remember that at the physical level we describe the way things are done rather than what needs to be done. So we can have both kinds of data flow diagram. In fact, usually when you’re studying the existing system, you are studying it at the physical level. Therefore if you represent this in the form of a data flow diagram, the diagram will be at the physical level, representing both what is done and how it is done currently.

After doing this, you will move towards preparation of a logical level data flow diagram where we emphasize what needs to be done and not necessarily how it should be done because the how part is really the part to be addressed during the design phase.

DFDs facilitate top down development. In fact, that is the strength of the tool so that you can introduce more and more details as you do step-by-step decomposition of these diagrams. They permit outlining of preferences and scope. So when you are discussing different alternatives with the users or the management, you can clearly mark those alternatives on the data flow diagram itself so that the people can understand the scope of the application software that we are proposing. So it’s another use for data flow diagrams.

DFD Notations

Here is the notation that we use in diagramming the data flows. In the diagram, you show a data flow through an arrow. Usually the arrow will be labeled with the kind of data which flows on it. Then we show the sources of data and the sinks which use this information. These are generally the external entities, and they are shown using either a rectangle like this, or a double-edged rectangle. I am giving here two different representations and both are used in the industry. You can choose one of them. The one on the left is simpler to draw when we are doing the data flow diagrams by hand, so we will probably prefer this. When you use tools for doing data flow diagrams, either one could be used. So sources and sinks of data and information typically are the users of this system, and these would be shown as external entities.

Then processes are shown either as a circle, which is also called a bubble, or as a rectangle with rounded corners, and we label them with some number for easy reference. And then finally we also show data stores, either through a pair of lines or by a small box which represents a collection of data. That box may also be labeled with a number for ease of referencing.

So primarily the data flow diagram provides only four symbols. One is the arrow for the flow of data. One is a rectangle that represents an external entity, which either supplies some data to our application or receives some results from our application. Then we have the processes represented as bubbles, and we have a data store which is represented by a pair of lines. So this is the simple notation that we will use for drawing data flow diagrams.

Here is a simple example. So let’s begin with an example, and you’ll also appreciate how simple these diagrams are to read and understand. In this we see two rectangles; both carry the same label, so it’s a single entity called traveler. So traveler is an external entity which will be using this application, which we have called airline reservation. Now some data flows from this traveler and comes to a process called make reservation. That data would be the date, time and destination where the traveler wishes to go, and he wants to buy a ticket if one is available.

So the first process which handles this data is the make reservation process. Now if you look at this process, it not only takes this input but also takes another input from a data store called flight database. The direction of the arrow indicates that the data is being taken by the process as input.

So we take two inputs here. One is from the traveler about his date, time and destination. The other is from the flight database, and then we prepare a suitable reservation. The reservation is also recorded. We may have to consult the existing bookings and see whether there is space available. And if available, we make a reservation.

After making the reservation, this process produces outputs for two other processes. So one output goes from make reservation to a process called prepare ticket. Now ideally we should be labeling all these arrows. The label of the arrow will indicate the data which is sent by make reservation to the prepare ticket. But from the context we can easily make out that in order to prepare the ticket we will have to obtain traveler’s data as well as the flight data so that a ticket can be made. And the ticket is an output of this process. The ticket goes to the traveler. So this is a physical output produced by our software for the traveler.

Then there is another process to which the make reservation process supplies some output, and that process is the billing system. Again we can readily see that the billing system must receive some inputs from make reservation so that the cost of the journey or the cost of the ticket can be calculated. This billing system will produce a bill for the traveler and will also note that in the accounting file. Subsequently the billing system will also handle the payment from the traveler.

So in this airline reservation we have defined three processes called make reservation, prepare ticket and billing system. We have identified one external entity who is the user of this software or this application. And we have identified the data stores. These data stores contain the data relevant to the application. So these data may be related to the flights. The data may be related to the customer himself, so that we can keep the billing information for him, and also the data about the bookings that we have made. So an airline reservation system would consist of such processes.
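
To make the reading of this diagram concrete, here is a minimal sketch that records the airline reservation example as plain data. The element names follow the example; the `Flow` type, the helper functions, the store name `Bookings` and most of the flow labels are assumptions made for illustration, since the diagram leaves most arrows unlabeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str  # element the data leaves
    target: str  # element the data arrives at
    label: str   # the data carried on the arrow


# Element names follow the airline reservation example.
ENTITIES = {"Traveler"}
PROCESSES = {"Make Reservation", "Prepare Ticket", "Billing System"}
STORES = {"Flight Database", "Bookings", "Accounting File"}  # "Bookings" is an assumed name

# Flow labels are assumptions; the example leaves most arrows unlabeled.
FLOWS = [
    Flow("Traveler", "Make Reservation", "date, time, destination"),
    Flow("Flight Database", "Make Reservation", "flight details"),
    Flow("Make Reservation", "Bookings", "reservation"),
    Flow("Make Reservation", "Prepare Ticket", "traveler and flight data"),
    Flow("Prepare Ticket", "Traveler", "ticket"),
    Flow("Make Reservation", "Billing System", "fare details"),
    Flow("Billing System", "Traveler", "bill"),
    Flow("Billing System", "Accounting File", "billing record"),
    Flow("Traveler", "Billing System", "payment"),
]


def inputs_of(process):
    """Labels of all flows arriving at the given process."""
    return sorted(f.label for f in FLOWS if f.target == process)


def outputs_of(process):
    """Labels of all flows leaving the given process."""
    return sorted(f.label for f in FLOWS if f.source == process)


def flows_without_process():
    """Flows that touch no process at either end; a sensible DFD has none."""
    return [f for f in FLOWS
            if f.source not in PROCESSES and f.target not in PROCESSES]
```

Calling `inputs_of("Make Reservation")` lists the two inputs discussed above, one from the traveler and one from the flight database.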

So you see here that the data flow diagram can be read in terms of external entities, the data that they supply or the results that they receive. And through the names that we’ve selected for these processes, we can try to understand what happens in this application. So again the naming is very important. We name the bubbles as well as the data stores properly, and when we do that, a data flow diagram can be understood easily without any additional explanation from the analysts. This is the advantage of data flow diagrams. They are understandable on their own.

Now when we start developing the data flow diagram, we can generally show the entire application as a single process. This is the first step in preparing the data flow diagram. Such a diagram, where the entire application is shown as a single process, is called a context diagram. It identifies all the external interfaces of the application we are developing. So the context diagram is a very important step, and the focus here is not so much on the details of the process itself but on its external interfaces. What are the external entities it is going to interact with? What are the outputs it will produce? What are the existing data stores that it might have to interface with, in terms of obtaining the data or updating that data? So this is usually the starting point, and it’s also called the fundamental system model or the Level 0 Data Flow Diagram.

So you do the data flow diagramming in steps by successively refining the different processes, by successively decomposing those processes and in this you add more and more details. But the starting point is always the context diagram in which the focus is on the external interfaces of the software.

Here is a simple example of a context diagram in which the whole software application that we are developing is shown as a single process or a single bubble. We identify the users, and the inputs and outputs that the system either receives or produces. We also identify existing sources of data. These existing sources contain data which is useful for our application, but they exist outside it. By showing such a data store in the context diagram, we are clearly saying that it will be assumed to exist and will not be part of our development and design effort. That is the boundary. We are defining clearly the boundary of the software that we want to develop. We also identify other external sources which may be necessary for interfacing our application with other applications. These may be messages or they may be data stores which will be interfacing with external systems. So the context diagram is a very important first step in preparing the data flow diagram.

After we have done the context diagram, we now decompose the process into its sub-processes. So here is the process decomposition now coming into the picture. When we do this, we replace the process by its constituent processes. In this we may reveal additional data stores or additional external interfaces. So we are now adding more and more details. We also develop a simple numbering system through which we can readily show the constituent processes of a process which we have decomposed. Generally we use the decimal numbering system. So if we are decomposing process one, then its sub-processes would be numbered as 1.1, 1.2, etc. This is for ease of understanding the decomposition relationship between the processes.
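
The decimal numbering scheme can be sketched in a few lines. The function names and the sub-process names used in the usage note below are hypothetical; only the 1.1, 1.2 style of numbering comes from the discussion above.

```python
def number_subprocesses(parent, names):
    """Assign decimal numbers such as 1.1, 1.2 to the sub-processes of `parent`."""
    return {f"{parent}.{i}": name for i, name in enumerate(names, start=1)}


def parent_of(number):
    """Return the number of the process that was decomposed, or None at the top."""
    head, sep, _ = number.rpartition(".")
    return head if sep else None
```

For example, `number_subprocesses("1", ["Validate Order", "Check Stock"])` numbers two hypothetical sub-processes as 1.1 and 1.2, and `parent_of("1.2")` recovers the decomposed process `"1"`.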

At each level of decomposition we should complete the data flow diagram in all respects. We must clearly understand the data which is flowing. We must know what exactly goes from one process to another process, or what goes from a data store to a process. This must be properly labeled. We must also label processes very meaningfully. In fact, we had earlier mentioned that processes are best named by a verb and an object, and we have seen examples of this while talking about function decomposition. So the same kind of naming rules or guidelines should be used for labeling these processes as well as the data stores and data flows. All components which appear in a data flow diagram must be named meaningfully in order to convey the purpose and the meaning of the diagram.

We continue decomposition and add more and more details. So when do we stop? We stop when the processes have become quite well defined. They are not too complex now. They can be developed and understood, and can be briefly described. We also stop when the control flow starts surfacing. If subsequent decomposition is going to introduce looping or repeated execution, or conditional execution, then the control flow has started to surface. At this point we can stop the decomposition, because data flow diagrams do not show flow of control. It’s assumed that the processes are executing, receiving data and producing outputs. So there is no flow of control that is shown explicitly in the data flow diagram.

So we refine processes until they are well understood and are no longer complex. All the important data stores have now been created, and we have identified what they need to contain. Once we have reached this level we say that the process refinement is complete. In this successive decomposition we may go through multiple steps, and at each step we would be creating a data flow diagram for the process which we are focusing on for the purpose of decomposition.

DFDs do not show flow of control. This is a very important thing we must remember. DFDs also will generally not show one-time kind of things, like initializations. We do not show processes which initialize or create files, or create databases. They instead show processes which are running in a steady state. So data flow diagram can be imagined in terms of processes which are continuously executing. As soon as they receive the data, they produce their output and hand over that to the next process or update a data store or some such action takes place. So we do not generally show the one-time kind of activities but show processes in their steady state.

DFDs show only some important exceptions or errors. These are shown in order to introduce some specific business requirements to deal with them. For example, if the inventory has fallen below a certain level, this may be treated as an exception which is associated with a business rule that some reordering has to be done because our inventory has fallen very low. Such exceptions would be shown. But otherwise routine types of errors are generally not shown in the data flow diagram. For example, we will not show things like the airline number given by the customer being wrong, or the destination he has given not existing in our database. We assume that such errors will naturally be handled by our software, but they are routine errors where data validation has to be done. These are not shown as part of the data flow diagram, so that we concentrate on the main functions and main processes rather than get distracted by routine exceptions.

Processes must be independent of each other. Here again we refer to our rule of thumb that cohesion and coupling are the guidelines we always use for any decomposition. When we define sub-processes we should ensure that the new sub-processes we have created are cohesive in terms of what they do and that there is minimum interdependence between them. In fact, the only way the processes or sub-processes interact with each other is through data. So the work of a process should depend only on its inputs and not on the state of another process. Processes are independent in that sense, and this is an important point we must observe when we are doing the refinement. Only needed data should be input to a process. This is again an obvious requirement: a process should receive the inputs which it needs for producing the outputs which are its responsibility.

As we do refinement we must also ensure consistency at different levels of refinement. Here is an example where on the top we show a data flow diagram in which process F1 has been defined as having input A and producing output B. Now this process F1 itself may be fairly complex, and it may be decomposed into different sub-processes. This is process one, so we are decomposing it into 1.1, 1.2, 1.3 and 1.4 as four sub-processes with these kinds of relationships among them. So this decomposition shows that a complex process such as F1 gets decomposed into four processes which have been named 1.1, 1.2 and so on to indicate that they are part of process one.

And in this case, consistency among the levels requires that the inputs here match the inputs of this process. So inputs and outputs must match. On the other hand, new data stores may be shown. For example, at this level of the data flow diagram we did not show a data store, but when we decompose F1, a new data store might surface because it needs to supply some history data or past data to one of the processes. So the important point in refinement is that there must be consistency among levels: the inputs and outputs at level 1 should be the same as the inputs and outputs at level 2.
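
This consistency rule can be checked mechanically. The sketch below is one possible check, under stated assumptions: flows are written as (source, target, label) tuples, the wiring among 1.1 to 1.4 and the store name `History` are invented for illustration, and a data store that surfaces only in the decomposition is treated as internal so that it does not count as a new input.

```python
def boundary_flows(flows, subprocesses, local_stores=frozenset()):
    """Return (inputs, outputs) that cross the boundary of a decomposition."""
    inside = set(subprocesses) | set(local_stores)
    inputs = {label for src, dst, label in flows
              if src not in inside and dst in inside}
    outputs = {label for src, dst, label in flows
               if src in inside and dst not in inside}
    return inputs, outputs


def is_balanced(parent_inputs, parent_outputs, flows, subprocesses,
                local_stores=frozenset()):
    """True when the decomposition has exactly the parent's inputs and outputs."""
    inputs, outputs = boundary_flows(flows, subprocesses, local_stores)
    return inputs == set(parent_inputs) and outputs == set(parent_outputs)


# Hypothetical decomposition of F1 (input A, output B) into 1.1 to 1.4.
SUBPROCESSES = {"1.1", "1.2", "1.3", "1.4"}
LEVEL2_FLOWS = [
    ("outside", "1.1", "A"),
    ("1.1", "1.2", "x"),
    ("1.1", "1.3", "y"),
    ("History", "1.3", "history data"),  # new data store surfacing at this level
    ("1.2", "1.4", "z"),
    ("1.3", "1.4", "w"),
    ("1.4", "outside", "B"),
]
```

With these flows, `is_balanced({"A"}, {"B"}, LEVEL2_FLOWS, SUBPROCESSES, {"History"})` holds, while adding an extra external input to any sub-process would make the check fail.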

Physical DFD

A physical DFD indicates not only what needs to be done but also how things are being done. So it shows some implementation details. These details will naturally depend on the environment of the application. For example, you might show the names and locations of places where things are getting done, or how the data is actually stored. For example, the data in a library may be stored as card indexes which are stacked in drawers. This is a physical way of storing it, and that will be shown in the physical data flow diagram. It shows how things are done at present. That is what you want to convey when you draw a physical data flow diagram.

You may also indicate the way the tasks are divided in terms of being assigned to different people. For example, two different persons may be dealing with undergraduate and post-graduate students. Now this is a present way of doing things. That’s why this may be shown in a physical data flow diagram. But when you analyze the physical data flow diagram in order to develop your application, you will notice that these are implementation details which are details about the existing scenarios. And you do not want to carry them further and bias your design and implementation subsequently. You would like to convert such a physical data flow diagram into a logical data flow diagram where such implementation details are filtered out.

This is, as we said earlier, useful for describing the existing system so that it can be readily validated with the users. It needs to be converted into a logical data flow diagram after we have validated it, and the purpose of converting to the logical data flow diagram is to remove these implementation biases so that we can consider different ways of implementing what is required for the application.

One use of the data flow diagram is to clearly show the boundaries of automation. When you have a large data flow diagram like this, you can clearly mark the scope of the system that you propose to develop, so that users clearly get the idea of what exactly they can expect from the software system, which functions and processes would be automated, and how they would interface with the rest of the requirements or the rest of the environment of the user’s application. So boundaries can be conveniently marked on a data flow diagram.

Let us now take one example where we are addressing the payroll application. We will assume that we have already done the context diagram, and we are now decomposing that context diagram, or zero level DFD, into the first level DFD, where we have shown five sub-processes. We have numbered them one to five. We have identified employee as the external entity. And while doing the first level DFD we have identified a few data stores. If you look at the data stores and the data they contain, it is clear that such data would be required for a payroll application. We have a data store here which contains the data about employees. We have a data store here which gives details of taxation, that is, what tax rules are applicable. And here is a data store which contains the payments which have been made to the employees.

Timecard

So let’s now look at the sub-processes shown here. We have the employee, who supplies data which indicates his working hours or working days. We indicate that through the data called timecard. The timecard goes from the employee to the validation process. The validation process sends this data to a process called calculate pay. Calculate pay may refer to tax tables, and it produces the payment output. The payment output is sent to two processes. One of them is process five, whose responsibility is to print the paycheck. So the check is printed, the details of the payment are stored, and the physical check is handed over to the employee.

The calculate pay process also sends the payment details to a process which is supposed to update the year-to-date kind of data. So the employee data here will also contain the records of all gross salaries which have been paid to the employee in the whole year. It will keep accumulating the payments as well as the deductions which we have made. So update YTD where YTD stands for year-to-date. These are the employee details relating to payments, which this process will update. So you can see the direction of this arrow. The arrow is going towards the data store. It means that new data is being added to the employee data.

Now if you look at the validate process, after getting the time data it also sends the employee ID to a process called get employee data. This get employee data process gets the employee data from here, and the relevant data is sent to the validation process so that the validation can be completed. Then both the time data and the other useful data can be sent to the calculate pay process. So these are the five processes that we have shown here, which do the payroll at some organization based on the various rules of the organization.
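
To see how the five payroll processes hand data to one another, here is a minimal sketch in which each process is a function and each data store is a plain Python structure. The process names follow the diagram; the record fields, the hourly pay rule and the flat tax rate are assumptions made only for illustration, and the call order in `run_payroll` is an artifact of the code, since the diagram itself shows no flow of control.

```python
# Data stores (contents are hypothetical sample values).
EMPLOYEES = {"E1": {"hourly_rate": 100.0, "ytd_gross": 0.0, "ytd_tax": 0.0}}
TAX_TABLE = {"flat_rate": 0.10}  # assumed rule; real tax tables are more involved
PAYMENTS = []


def get_employee_data(emp_id):
    """Process: get employee data. Reads the employee data store."""
    return EMPLOYEES.get(emp_id)


def validate_timecard(timecard):
    """Process: validate. Uses the employee data to complete validation."""
    employee = get_employee_data(timecard["emp_id"])
    if employee is None or timecard["hours"] < 0:
        raise ValueError("invalid timecard")
    return {"emp_id": timecard["emp_id"],
            "hours": timecard["hours"],
            "hourly_rate": employee["hourly_rate"]}


def calculate_pay(valid):
    """Process: calculate pay. Refers to the tax table."""
    gross = valid["hours"] * valid["hourly_rate"]
    tax = gross * TAX_TABLE["flat_rate"]
    return {"emp_id": valid["emp_id"], "gross": gross, "tax": tax,
            "net": gross - tax}


def update_ytd(payment):
    """Process: update YTD. Accumulates payments and deductions."""
    record = EMPLOYEES[payment["emp_id"]]
    record["ytd_gross"] += payment["gross"]
    record["ytd_tax"] += payment["tax"]


def print_paycheck(payment):
    """Process: print paycheck. Stores the payment and returns the check text."""
    PAYMENTS.append(payment)
    return f"Pay {payment['emp_id']}: {payment['net']:.2f}"


def run_payroll(timecard):
    """Wire the processes together for one timecard."""
    payment = calculate_pay(validate_timecard(timecard))
    update_ytd(payment)
    return print_paycheck(payment)
```

Note that each function depends only on its inputs and the data stores it reads, which matches the earlier point that processes interact only through data.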

We’ll take a few more examples. Here is an example which we had briefly mentioned earlier. It refers to a second-hand dealer who buys and sells old cars. So let’s first understand the application requirements. The purpose here is to assist the owner of this business to buy and sell cars. He has a fairly large number of these cars in stock. There are different models, makes, years of manufacture and so on, and all the details need to be kept. After the owner buys a car, he does some repair work so that he can get better value for the car. Records for all the repairs have to be kept. This old car mart has its own garage where these repairs are done. Naturally the repairs have to be costed, and the nature of the changes made also has to be kept for future reference.

This owner also advertises periodically in newspapers so that he can attract customers for the cars that he has put on sale. And he has hired a few salespeople on a commission basis who will handle the customers who visit the shop and who would negotiate a suitable price based on various factors. At what cost had we purchased the car? What repairs were made? How long has it been standing in the garage, or how long has it been waiting for sale? Based on these you have to appropriately decide the selling price for a car. Now all these things are done to some extent manually, based on the guidelines that the owner would give to the salesmen.

What do we need from our application? Besides keeping all this data, helping us to advertise, paying commission to the salespeople and helping salespeople to find out what kind of price negotiation they should do, we also need to prepare some regular reports for the owner of the car mart so that he can get a good idea of what kind of profits he is making and what kind of payments he has to make. So all these details are an important part of the application.

ER diagram

What we want to do is to prepare a data flow diagram and the ER diagram. So what would happen in developing applications like this is that after you have done the analysis and you have understood what is happening in the user’s application domain, you understand the processing, you understand the data, you would now convert these understandings into the models and you will prepare a suitable data flow diagram and a suitable ER diagram.

We should always keep in mind that these two diagrams really complement each other. ER diagrams show the data which is there in the application. They identify this data in terms of entities and relationships. Now the same data that we see in the ER diagram should naturally also be seen in the data flow diagram in some form. Data flow diagrams actually indicate the data stores. What we store in the data stores in a data flow diagram is the information domain or the data domain of the application, and that is what should be modeled in the ER diagram. So the two models should be compatible in terms of the data that they show. This is an important point that we need to address as we prepare these two diagrams. Generally you would do them sequentially, but you will also crosscheck each of them against the other.

So let’s look at the data flow diagram for the car mart. Ideally we should again have started with the context diagram. But let us go to the first level data flow diagram, where we see the entities and we also see the data stores. Five processes are shown here. So let’s first look at the entities. The first external entity is the seller, the person who comes and sells a car to us. Then we have a garage. The garage can be treated as an external entity because the only thing that we need from the garage is the data about repairs.

Then we have the salesman as an external entity. We have a buyer who comes to the shop for buying a car. And we have the newspaper as an entity, to which we send new advertisements that we want to release. Now what are the processes here? The first process is concerned with buying old cars. The seller supplies the inputs about the car that we are buying from that person. The data about all the cars purchased is stored in the car data store. So the car data store is almost the central data store in our application, and it contains data about all cars which are available for buying and selling.

The data about the seller is also stored here. This may be an important requirement, even a legal requirement, so that we always know from whom we have bought the car. Then we have a process called do repairs. This runs periodically. It looks at the data of the car and decides on repairs to be done. The repair details are obtained from the garage, and a detailed record of the repairs is also kept.

Then there is a process three which is an important process that decides on the prices at which the cars can be sold. It takes into account the purchase price. So purchase price should have been there in the car data store. The repair cost is obtained from here. Based on this, this process would decide the selling price. So as you go through the data flow diagram you will start getting good idea of what this data store should contain. So car data store should contain not only the car details but also the price at which it was purchased. The repair details should contain not only the different repairs that were carried out but the price or the costing of those repairs. So we take all these details into account and decide on the price.

Then there is process four, which periodically creates new advertisements. It has to refer to the cars which we have for sale. It has to look at the past advertisements so that you don’t advertise the same car again and again, and so that you keep advertising those brands which received a lot of attention from prospective buyers. So these advertisements are prepared and sent to the newspapers.

Then there is process five, which does the selling. Here it interacts with the salesman and the buyer. Naturally it has to obtain the car data from the car data store. It may have to obtain the repair data in case the buyer wants to see the repair history. After the deal has been made and the car has been purchased by the buyer at some negotiated price, the buyer details will also be stored in the buyer data store. Again this may be for a future requirement or maybe even a legal requirement to know to whom we have sold the car. We may also update the car data store when the car has been sold, so that we can also keep the selling price in our data store. So these five processes together describe all the processing requirements of the car mart organization. You can see that these five sub-processes taken together carry out all the application requirements.

You can think of these five processes as running together and performing their tasks as and when they are required. So a diagram like this with proper naming of all the elements of that diagram conveys the entire processing scenario to the reader of the diagram.

As I said, as we go through the diagram and trace it, we understand the data stores well in terms of their various details. In fact, if you use a good tool, you can use that tool to describe each data store that we have identified in the diagram. So the purpose of the data store and all the different data items it contains can be clearly defined. Besides defining the data stores, you should also clearly mark the data flows, indicating what type of data goes from one point in the data flow diagram to another point.

Second Level DFD

Some of the processes here you may want to refine further and decompose, and draw a second level data flow diagram. Here is an example. Process five, which we had shown in the previous data flow diagram, is one we might want to decompose. This process achieves a sale. It is making a sale of a car. How can we decompose this? What does it consist of? We have listed here the sub-processes which make up process five. Using these sub-processes you should be able to draw a decomposition of this as a second level data flow diagram. So the constituents of process five would be the following. First, take the buyer’s requirements and other details such as his name, address and so on, and validate them. Then list the cars which match the customer’s requirements. For example, he may want to purchase a red Zen. So you should be able to make this query to your database and list all cars which match in their age, color, make, manufacturer and so on.

So out of these, the customer will then consider a few. So the second sub-process is the one which lists cars matching requirements. The third sub-process is showing the repair history and the car history. The fourth sub-process is registering the sale and the negotiated price. And the fifth sub-process is computing the commission for the salesman. So these are the five sub-processes which make up process five, which we had shown in the previous first level data flow diagram. I’ll leave it as a simple exercise for you to prepare the decomposition of this process into a second level data flow diagram where these processes would be shown as 5.1, 5.2, etc.
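
The second sub-process, listing cars which match the buyer’s requirements, is essentially a query over the car data store. Here is a minimal sketch, assuming the store is a list of records; the field names and the sample records are made up for illustration and are not taken from the diagram.

```python
# Hypothetical contents of the car data store.
CARS = [
    {"id": 1, "model": "Zen", "color": "red", "year": 1999, "sold": False},
    {"id": 2, "model": "Zen", "color": "white", "year": 2001, "sold": False},
    {"id": 3, "model": "Zen", "color": "red", "year": 1998, "sold": True},
    {"id": 4, "model": "OtherModel", "color": "red", "year": 2000, "sold": False},
]


def list_matching_cars(cars, **requirements):
    """Return unsold cars whose fields equal every stated requirement."""
    return [car for car in cars
            if not car["sold"]
            and all(car.get(field) == wanted
                    for field, wanted in requirements.items())]
```

A buyer asking for a red Zen would be served by `list_matching_cars(CARS, model="Zen", color="red")`, which skips the car that has already been sold.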

We also need to prepare the data model. We have already completed the data modeling module, so you should be able to understand the ER diagram which is shown here for the car mart example. So what are the entities here? We have car as the most important entity. We can also create various required attributes for this. From car we have a specialization called sold car. Then we have advertisement as an entity. We have repairs as an entity, and we have customer as an entity, besides also having salesman as an entity.

Car and advertisement have a relationship called advertised cars, and it is shown here as one-to-many, which means that an advertisement may contain many cars. It also means that a car is advertised only once. Now this may be what the real world is in this case, but it could also be many-to-many. That means a car may be advertised multiple times and a single advertisement may list many cars which are available for sale. Then a car may have repairs, so this would be a one-to-many type of relationship. Then a car is purchased from some customer, so we must remember from whom we have purchased this car. This is captured by the purchase relationship.

And finally we have the sale relationship, which shows customer, salesman and the sold car as being related, for noting down not only to whom the car was sold but also who participated in its negotiation and who needs to be paid commission. So this is the ER diagram where activities are not shown, only the data is shown. And here the entities are in a way related to the different data stores that we have in the data flow diagram. So you would now validate that this ER diagram actually defines the same content or the same information domain that was present in the DFD, or the data flow diagram, where data stores were shown containing the same data. Of course, there need not be a one-to-one match between the ER diagram and the data flow diagram. We need not, for example, show each one of the entities as a data store there. But essentially all the information that was implicitly indicated in the data flow diagram should be present in the ER model.
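To make the entities and relationships of this ER diagram a little more concrete, here is a rough sketch using Python dataclasses. The attribute names (registration, advert_id, negotiated_price and so on) are hypothetical, since the lecture does not list the attributes; only the entities and relationships come from the diagram as described.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Customer:
    name: str


@dataclass
class Salesman:
    name: str


@dataclass
class Repair:
    description: str


@dataclass
class Car:
    registration: str                 # hypothetical identifying attribute
    purchased_from: Customer          # "purchase" relationship
    repairs: List[Repair] = field(default_factory=list)  # one car, many repairs


@dataclass
class SoldCar(Car):
    # Specialization of Car; the extra fields carry the "sale" relationship
    buyer: Optional[Customer] = None
    salesman: Optional[Salesman] = None
    negotiated_price: int = 0


@dataclass
class Advertisement:
    advert_id: str
    cars: List[Car] = field(default_factory=list)  # one advertisement, many cars


def cars_advertised_more_than_once(adverts: List[Advertisement]) -> List[str]:
    """Return registrations that break the one-to-many rule (a car in several adverts)."""
    first_advert = {}
    repeated = set()
    for advert in adverts:
        for car in advert.cars:
            seen_in = first_advert.setdefault(car.registration, advert.advert_id)
            if seen_in != advert.advert_id:
                repeated.add(car.registration)
    return sorted(repeated)
```

If the relationship were modeled as many-to-many instead, the check in cars_advertised_more_than_once would simply not be needed.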

We’ll look at another example. In fact, I will stress here the importance of preparing the data flow diagram yourself. Data flow diagrams once prepared can be easily understood and can be used as a good learning exercise. But unless you do a few of them yourself, you will not understand the challenges of preparing good data flow diagrams.

So here we are taking an example of a book supplier. He supplies books to customers. In fact, he doesn’t keep any stocks. So as he receives orders, these orders are processed and the books are sourced from different publishers by placing orders with them. So here is some kind of an agent who receives orders from customers and fulfills those by directly sourcing the books from the publishers. So we can start off by preparing a context diagram where the entire order processing is shown as a single process, and we identify two important entities here: the customer entity and the publisher entity. And we also identify the important inputs and outputs from these two entities.

We receive an order from a customer and we also send a shipping note to the customer when we have dispatched the books to him. So the shipping note will tell the customer that the books have been dispatched. And in future we will also receive a payment from the customer. As marked here, not all inputs and outputs are shown. Only a few are shown here, but you can indicate all the other details.

Context diagram

Similarly, when the publisher is shown here as an entity, we are showing some important outputs going to him. So the order processing will prepare a purchase order and send it to the publisher. The publisher would send us the books, and along with the books we will also receive the shipment details: which books, and in what quantities, have been sent to us. And naturally we will have to make a payment to the publisher. So the context diagram here identifies the important entities and the important inputs and outputs from these entities. This is the context diagram.
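One way to check a context diagram like this is to write its flows down as data. The sketch below is only an illustration: the tuple layout and the function names are assumptions, while the entities and flows are the ones named in the lecture.

```python
# Each flow is (source, data, destination).
SYSTEM = "Order Processing"

FLOWS = [
    ("Customer", "order", SYSTEM),
    (SYSTEM, "shipping note", "Customer"),
    ("Customer", "payment", SYSTEM),
    (SYSTEM, "purchase order", "Publisher"),
    ("Publisher", "books", SYSTEM),
    ("Publisher", "shipment details", SYSTEM),
    (SYSTEM, "payment", "Publisher"),
]


def flows_for(entity, flows=FLOWS):
    """Return (data the entity sends to the system, data it receives from the system)."""
    sent = [data for src, data, dst in flows if src == entity]
    received = [data for src, data, dst in flows if dst == entity]
    return sent, received


def is_valid_context_diagram(flows):
    """Every flow must connect an external entity to the single process, in one direction."""
    return all((src == SYSTEM) != (dst == SYSTEM) for src, _, dst in flows)
```

For example, flows_for("Customer") lists the order and the payment as inputs to the system and the shipping note as its output to the customer.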

Now we will decompose this in more detail and in successive levels. This is the first refinement, where we have decomposed the whole application into four processes. The customer and the publisher entities are shown here. Now we have also created a few data stores which will be required by these processes in order to carry out the processing. We must remember that a lot of data needs to be stored in this application, and therefore some data stores will have to be identified from one refinement to another.

So let’s again understand this diagram in terms of what exactly is happening here. So we receive an order from the customer. Now the first thing would be to verify that we are accepting a proper order. Verification consists of checking the credit rating of this customer, because we are supplying him on credit and will receive the payment only in future. So we must be sure about the credit rating of this customer. So we keep a database of past customers. And we also keep a record of whether they have been paying regularly. So we give them some credit rating. So the credit rating is obtained from the customer data store. The book details are obtained from the book data store, and we verify that yes, the order can be accepted. It is coming from a customer who has a credit rating with us and it is for books we are dealing with. So we verify the order and we create a data store called pending orders. So pending orders will contain the orders which have been accepted.
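The verify order process can be sketched roughly as follows. This is only an illustration under assumptions: the data stores are modeled as plain Python dictionaries and lists, and the rating scale and the MIN_CREDIT_RATING threshold are hypothetical.

```python
MIN_CREDIT_RATING = 3  # hypothetical threshold; the lecture does not define a scale


def verify_order(order, customer_store, book_store, pending_orders):
    """Accept the order only if the customer is creditworthy and we deal in every title.

    customer_store maps customer name -> credit rating,
    book_store is the set of titles we deal in,
    pending_orders is the data store that receives accepted orders.
    """
    rating = customer_store.get(order["customer"])
    if rating is None or rating < MIN_CREDIT_RATING:
        return False  # unknown customer or poor credit rating
    for title in order["lines"]:
        if title not in book_store:
            return False  # a book we do not deal in
    pending_orders.append(order)
    return True
```

A rejected order simply returns False here; what happens to it afterwards is exception handling that the diagram leaves out.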

We do not show the rejected orders because that is an exception which can always be incorporated in the required processing. Now these pending orders are periodically picked up by the assemble order process. This assemble order receives a batch of pending orders. It also receives the data from publisher’s data store. So these books which the customer wants to buy, we have to find out who are the publishers and for that publisher and for this bunch of orders, we will assemble the order for the publisher. So this purchase order is then sent to the publisher and the details of purchase order are also stored here.

The publisher will then send the shipment details to us. Those shipment details will be first verified against our own purchase order. So verify shipment is a process which validates the shipment notice that we received from the publisher against our own order that we had sent to him. After the shipment is processed, now we have a process which will assemble the consignments for the customer. We have their pending orders with us. We have now received the material. So we will now form a shipment to the customers. These shipments will be as per the shipping note. So this shipping note we will send it to the customer. It will, of course, also go to our internal dispatch section who will send the books also. So these are the four processes which carry out the order processing for the book agent application. Together they complete the processing. And we’ve also identified important data stores here.
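As an illustration of the verify shipment process, the sketch below compares the shipment details against our own purchase order, title by title. Representing both as dictionaries of title to quantity is an assumption made for this example.

```python
def verify_shipment(purchase_order, shipment):
    """Return {title: (ordered, shipped)} for every title whose quantities differ.

    An empty result means the shipment matches the purchase order exactly.
    """
    discrepancies = {}
    for title in set(purchase_order) | set(shipment):
        ordered = purchase_order.get(title, 0)
        shipped = shipment.get(title, 0)
        if ordered != shipped:
            discrepancies[title] = (ordered, shipped)
    return discrepancies
```

Titles that were shipped but never ordered show up with an ordered quantity of zero, and titles that were ordered but not shipped show up with a shipped quantity of zero.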

Now we are exploding process two, which was shown in the previous data flow diagram, where we are preparing the purchase orders for the publishers. So what are the sub-processes of process two? Those are shown here. We receive the pending orders, and the first thing we do is collect these orders by publisher. So we bunch together those orders which can be placed with the same publisher. For example, it could be Prentice Hall. So all orders which can be placed with the same publisher are bunched together, and we process them publisher by publisher.

Then we also have a process which calculates the total copies per title. It’s possible that the same title is ordered by many customers. So we bunch them together and get the total copies. These totals are then sent to a process which actually prepares the purchase order. Now for preparing the purchase order, we need publisher data: the address, the payment terms and so on. So those data are obtained from the publisher data store and we prepare the purchase order. That purchase order goes to the publisher.

The same data is also sent to a process called store PO details. And this process creates the purchase order data store. This data store will contain all the purchase orders which are currently under processing. And finally, we have a process which updates the status of the pending orders, so that once a pending order has been processed, we flag it here to say that we have placed orders for that particular pending order. So this is how you create a level two data flow diagram for process two, which we have shown in the first data flow diagram.
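The sub-processes of process two can be sketched together in a few lines of Python. This is an illustrative sketch under assumptions, not the lecture's design: the dictionary layouts, the "placed" flag and the function name are all hypothetical.

```python
from collections import defaultdict


def assemble_purchase_orders(pending_orders, book_publisher, publisher_data, po_store):
    """Group pending orders by publisher, total the copies per title, and record the POs.

    book_publisher maps title -> publisher name,
    publisher_data maps publisher name -> details such as the address,
    po_store is the purchase order data store (a list in this sketch).
    """
    # Collect orders by publisher and calculate total copies per title
    totals = defaultdict(lambda: defaultdict(int))
    unplaced = [order for order in pending_orders if not order.get("placed")]
    for order in unplaced:
        for title, copies in order["lines"].items():
            totals[book_publisher[title]][title] += copies

    # Prepare one purchase order per publisher and store its details
    new_pos = []
    for publisher, titles in totals.items():
        po = {
            "publisher": publisher,
            "address": publisher_data[publisher]["address"],
            "lines": dict(titles),
        }
        po_store.append(po)
        new_pos.append(po)

    # Flag the pending orders so they are not ordered again
    for order in unplaced:
        order["placed"] = True
    return new_pos
```

Calling the function a second time on the same pending orders returns an empty list, because every order has already been flagged as placed.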

You can do some more refinements on this example. For example, one extension we can do is in receiving payments from the customers. The customers will pay against the shipping notes that we are sending them. So when we deliver the books, we receive the payments from the customer. So we will have to create a data store called accounts receivable. And accounts receivable basically would indicate what money is supposed to be received from those customers. When we actually receive the payment, the accounts receivable will have to be updated. And periodically we will have to re-evaluate the credit rating of the customers, because they may not be sending payments in time and so on, or there may be defaults. So the payment from customer is a fairly complex process by itself, requiring us to maintain the accounts receivable data store.
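A minimal sketch of such a receivables data store is given below. The class and method names are hypothetical, and amounts are kept as whole numbers to keep the example simple.

```python
class AccountsReceivable:
    """Tracks the money still due from customers, keyed by shipping note."""

    def __init__(self):
        self.outstanding = {}  # shipping note id -> {"customer": ..., "due": ...}

    def record_shipping_note(self, note_id, customer, amount):
        """Called when books are dispatched: the customer now owes this amount."""
        self.outstanding[note_id] = {"customer": customer, "due": amount}

    def receive_payment(self, note_id, amount):
        """Apply a payment; a note may be settled by one payment or by several."""
        entry = self.outstanding[note_id]
        if amount > entry["due"]:
            raise ValueError("payment exceeds the amount due")
        entry["due"] -= amount
        if entry["due"] == 0:
            del self.outstanding[note_id]  # fully paid

    def total_due(self, customer):
        """Total still owed by one customer; useful when re-evaluating credit rating."""
        return sum(e["due"] for e in self.outstanding.values() if e["customer"] == customer)
```

The total_due figure, together with how long each amount has been outstanding, is the kind of information a periodic credit rating review would use.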

Remember that payments may be received by a single check or by multiple checks. Similarly, we may have to extend the previous data flow diagram for payments that we need to make to the publishers. So when we raise a purchase order and we receive a shipping note from the publisher, we will also receive his invoice. That invoice will indicate that we have to make some payments, so we will create an accounts payable data store. And we will be checking invoices against our purchase orders, and then we will make payments based on the payment terms that we may have with the publisher.

So if we pay within a fixed time, we may even get incentives for early payment. So depending on our cash flow position and so on, we might release these payments to the publishers. So we need to extend the previous data flow diagram for this additional functionality, and we will leave this as an exercise for you.

Let us now conclude the process modeling part. In process modeling, one of the most important issues is to decompose complex processes into sub-processes. Data flow diagrams are very popular tools for this. They show the data flows, data stores, processes and so on, but they do not show the control flow. Proper naming is very important, and we have emphasized this for ER modeling, function decomposition diagramming and also data flow diagrams. You must name the data stores and processes very meaningfully and indicate all the important data that flows from a data store to a process or from a process to an external entity. All of these should be readable and understandable.