Professor NL Sarda Discusses Data Modeling – DFD, Function Decomp Part 2 (Transcript)

September 15, 2013 4:05 pm | By More

Let us continue with the data modeling topic. So far we have seen the basic features of ER model, where to use it, what advantages it offers. We studied the concept of entity attribute and the key attributes. We also looked at the relationship concept. And then we looked at a simple diagramming notation through which you can show some entities, some relationships among them. This is what we call an instance diagram. And it is always good to draw an instance diagram to illustrate the understanding, or to illustrate the concepts we are modeling in our diagram. So let us proceed further.

We were talking about relationship cardinality as an additional constraint on a relationship. It tries to characterize the relationship more precisely by specifying how many entities of one set may relate with another entity in another set as a part of the relationship. So primarily, by cardinality, we are indicating how many entities in terms of one or more are related to entities from another set in forming a relationship. This is a very useful concept for binary relationships where two entity types or two entity sets are related.

If you consider a relationship named R between two entity sets A and B, then what are the possible cardinalities that we can indicate? The first one is of type one-to-one. It means that one entity in A is associated with at most one entity in B. So here it’s a one-to-one relationship. One-to-many relationship is where one entity in A may be associated with zero or more entities in B. So when we go from A to B, we may meet many entities from B which are associated or connected through the relationship with the given entity in A. So one-to-many is also an indication that if we start traversing or navigating in the database, when we start from A we may reach multiple instances of entities in B. This is characterized as one-to-many relationship. Obviously in a given relationship like this, if you now go from B to A, then one entity in B you would reach at most one entity in A. So one-to-many and many-to-one are really the inverse of each other.

Then finally, you have the many-to-many relationship. In the many-to-many relationship, one entity in A may be associated with more than one entities in B and also the reverse. So here one entity in A might be reachable from many entities in B. And similarly, an entity in B may be reachable from many entities in A. So this is a many-to-many example of relationship cardinality.

Let us look at some examples. Take the example called teachers. The teachers relationship exists from entity teacher and course. So we have a teacher entity. We have a course entity. Between them we have the teachers relationship. Now what is the cardinality of the teachers relationship? Here we are saying that it is one-to-many. That is, one teacher may teach many courses. But if you go from course to teacher, then it would be for a given course, there is one teacher. So one-to-many indicates that a teacher may teach many courses whereas the course is always taught by one teacher. So obviously as you can see here the relationship is giving you more information. It tells you about how many entities are relating to each other in a given environment.

Now the teachers relationship is from teacher to course we say, that is how that we understand it from the readability point of view. Now the inverse of teachers relationship, that means when you want to go from course to teacher, I mean you may prefer to name it as taught by. But in ER diagrams that we will draw, a relationship carries only one name, whether you go from one entity to another or from the other entity to the first one. So the relationships will have the same name irrespective of the direction you are traversing the relationship.

The other example is the study relationship. The study relationship exists between student and course. So what is the cardinality of this binary relationship? A little thought will tell us that it is a many-to-many relationship. What it means is that one student may be studying many courses and one course may be studied by many students. So again if you draw an instance diagram and you show a few courses and a few students, then if you start from one student, you would reach on the other end multiple courses. Similarly, if you start from one course, you may find many students who are studying that course. So this is a many-to-many relationship. So we can characterize relationships further by indicating the cardinality and it is a very useful concept and you must specify this at least for the binary relationships that you define in your data model.

The next concept is that of existence dependence. Again this is a constraint. We specify this in order to indicate what is permissible in our data model. So it’s an important constraint. What it specifies is that existence of an entity A may depend on the existence of another entity B. That means that although they are entities in their own right, they are distinguishable from each other. So the entity still has the same definition that we gave earlier that they are unique and they are distinguishable. But in our database, in the real world, we are saying that entity A must exist in association of another entity B. This is called existence dependence.

In this case, B is called dominant entity and A is called subordinate entity. An entity which depend on another entity for its existence is naturally called a subordinate entity. Now in this case, both the entities are proper entities. They will have their own attributes. They will have their own keys. But still when they are created, when they exist in the real world, they exist in association with another entity. So a simple example would be the dependency between teacher and department. Now teacher and department are two entities and they have a relationship between them which we may call as employee. So we say teacher is a employee in some department, or teacher is employed in some department.

Now whenever a new teacher is appointed, the teacher has to be appointed in a particular department. What it means is that teacher cannot exist without being associated with some department. So teacher depends on department. Teacher cannot exist without being associated with some department. So there is existence dependence between these two entities where the teacher is a subordinate entity and department is a main entity. Again I’m emphasizing this that the subordinate entities do have their own key and they can participate in additional relationships. So they are entities in their own right. We are only expressing the real world requirement that entity like this must exist in association with another entity.

Let us now look at the diagramming notation for ER model and look at some more examples. Here are the diagramming notation symbols we will be using. An entity is represented by a rectangle and we will write the name of the entity within this rectangle. A relationship is represented by a diamond. These are basically notations for entity set and the relationship set. An attribute is written down in a small circle which is generally connected either to an entity or to a relationship. Because attributes don’t exist independently, they are associated with entity or relationship. And the key attribute may be underlined.

Here are some more notations. The cardinality of the relationship is indicated by writing a number. Usually we write one or many, or N, also for many. So we may say the cardinality is many-to-many in which case we may write a letter N here and a letter M here or we may write the letter M at both the places. So this is how the cardinality for binary relationships are shown. Now these two symbols are used when we encounter weak entities. So we will see their usage in more details subsequently. We have not yet seen those concepts.

A multivalued attribute is shown by a double circle. So recall that we had taken example of such an attribute like author of a book. So author will be written in a double circle and would be connected to the book entity to show that this is a multivalued attribute of the book entity. So these are the diagramming notations. There is no universal standard for these notations, but these are the symbols which are very commonly used.

Let’s look at this simple example. We have shown here an entity called student. This entity has two attributes: roll number and name. So we indicate entities and we associate attributes in this fashion. And we may also say that roll number is the key for the student entity. Another entity in this example is the course entity which has again two attributes: the course number and the title. And the course number may be the key for this entity. These two entities are related through the enroll relationship. So student enrolls in a course, right. And this enroll relationship has the cardinality many-to-many.

So a diagram like this you can see is simpler to read and it’s also very precise and it represents the real world by defining appropriate constraints. So here, for example, we have identified a binary relationship called enroll, and we have also defined the cardinality. We have identified the keys. We have defined the entities. So this diagram is very precise. It is unambiguous and it can be read by many people with the same understanding. There is no different interpretation of a diagram like this. So this is the advantage of ER model that it represents the real world unambiguously and very precisely. The other advantage is that once you have prepared an ER diagram and if you’ve named the entities and attributes where anybody should be able to read and understand the diagram.

Let’s look at some more examples. In this example, we have a few entities and relationships. It’s best to read the diagram and first understand what are the entities here. So we have a course entity, book entity and the publisher entity. From the names of these entities we have a clear understanding. We have a good understanding of what they could be implying and what this modeling could be for. So obviously we are talking here about books published by different publishers and which are prescribed as texts for various courses. So you can see here that the connection between the entities that we have here is that through two relationships: the text relationship and the published by relationship. So we have three entities and two relationships among them.

Obviously this diagram is not complete. You need to add a few attributes here which are relevant to your application. Then you need to characterize the relationships further. What are the cardinalities? So consider the relationship published by between book and publisher. So what should be the cardinality? So what should we indicate on the book side and what should we indicate on the publisher side? So if I go from the book towards publisher, how many publishers am I likely to encounter? Usually all books have one publisher. So I would generally write that a publisher – I’ll put 1 here and I’ll say that if I go from book, I am likely to meet only one publisher. But if I go from publisher towards book, naturally a publisher publishes many books. So I may put here many to indicate that there are more books published by a single publisher.

This is our understanding of the real world. Of course, in the real world, it’s possible that a book may be published by more than one publisher and they may sell it in different market segments. Now in that case, this characterization of the published by relationship may not be appropriate. And we may have to make it many-to-many, which what it really means is that the model has to clearly reflect the real world. And ER model provides you enough facilities by which the real world can be correctly captured. So we will leave this as an exercise for you to complete this diagram.

Another diagram here from our familiar university environment, it shows two entities: student and course. And it shows two different relationships between the same two entities. This is possible. We have a student who study different courses, and we have students who assist in different courses. So, this study and assist are entirely independent relationships. Such multiple relationships may exist among the same set of entities. And the attribute grade is an attribute of the study relationship.

Here is an interesting aspect of the modeling where we are showing self-relationships. We have a course. Now a course may have some prerequisite courses, which means that the course is actually related to other courses which may act as prerequisites for a given course. This is a self-relationship. What we are saying then is that a course is related with itself through a relationship called prerequisite. This relationship is also a binary relationship. So what are the cardinalities here? A course may have more than one prerequisite. So for example, a database course may have data structures as a prerequisite. It may also have one more prerequisite, say programming languages. So a course may have more than one prerequisite.

Similarly, a course may be prerequisite for more than one courses. So in both the direction it is many-to-many. So self-relationships can be shown in the ER diagram and we can even have some attributes associated with such relationships. For example, here what should be the minimum grade a student should get in a prerequisite course so that he can study a follow-up course? So this can be captured by the attribute minimum grade associated with the prerequisite relationship.

Now let’s look at little more complex diagrams which are giving a more larger scenario and which are modeling entities and relationships among – or identified in that particular scenario. So the point that we made earlier, given a diagram like this, you should all be able to read and understand in exactly the same way as I would read and understand them.

So let’s try to understand this. We can first look at entities showing here and try to get a good idea of what they may represent in the real world. There is a student entity which has some well-known attributes, roll number, name, date of birth and the hotel number. Then we have a course relationship, which is identified by a course number, has a title and has credits associated. Then we have teacher entity and we have department entity. So we have four entities here.

Now having understood the entities and also the attributes that they have, let us look at one relationship at a time. So we see here the relationship, which says the department employees teachers. It is again obvious to understand, because the name employees conveys what the connection between the two entities is. Now here we have a ternary relationship. This ternary relationship says that student study courses under teachers. So a study relationship associates a teacher and a course and a student. And it has attributes, grade and the semester. What it would mean is that a particular student maybe studying a particular course under a particular teacher in a given semester and he would get a particular grade. So we are able to capture the study aspect of the real world by a ternary relationship between student, teacher and course.

What would be the key of this relationship? We had said earlier that the key of the relationship is the combination or concatenation of the keys of the participating entities. So the key of study would be roll number from student, course number from course and employee number from teacher. So the three of these attributes together would form the study relationship’s key. So, this allows us to model a real world where same course may be run in the same semester by different teachers.

When we model this, we have to be very careful whether this model takes care of the real world requirements. Just now we saw that it takes care of one important requirement where I may run different divisions of the same course with different teachers.

But let’s take another example. Does it allow you to represent a case when a student fails in a course and he studies the same course again maybe with the same teacher, but in a different semester? Now this is possible. It happens. A student might fail and may get a grade, let’s say an F grade, failed grade. And then he has to register for the same course again and the next time he may have the same teacher. Now does this model allow you to capture this kind of a situation? I leave this as a exercise to be thought over. We will come back to this question little later in the context of another example that I will give you.

This is another useful example and again you should be able to read and understand this diagram. I’ll just quickly tell you about the entities which are shown here. I have the supplier entity and I have the part entity. And naturally the supplier supply these parts. This is a standard example where we need some parts and we identify some suppliers and we know who supplies what parts. So we may receive some parts from a given supplier in a given quantity and on a particular date. So let’s say supplier S1 may supply to us a part called P1 in quantity 100. So there is example where we buy or we receive parts from different suppliers.

Then we have different projects, and we have warehouses. The parts are stored in different warehouses and projects need these parts. And these parts may be supplied to projects from different warehouses. So it’s possible that a part P1 is given to project from a warehouse W1. The same project may get the same part from another warehouse also. So this is shown as a ternary relationship between part, warehouse and project. Now it has a specific implication, and it is trying to model the real world. Our ideal in modeling should be to capture the real world for faithfully as possible. It should be correct. It should represent the real world correctly.

So the best way to verify whether our model is correct or not is to take some examples and see whether those examples are clearly possible with respect to the model that we have put here. And then we have employees who are working in the projects. And one of the employees may act as in-charge of the project. So this is the ER model for a company which might be managing projects and which needs certain parts, which we are supplied by different suppliers.

Now we’ll again look at some specific issues in this model and check whether it does what I need in my real world, whether we have done it correctly or not. So let’s concentrate on the part on the left hand side. And we’ll discuss here some issues and see whether those are properly handled here. Now we say that the supplier may supply part. What is the cardinality of this relationship? It’s obviously many-to-many as a supplier can supply many parts and a part may be supplied by many suppliers. So we can call it as a many-to-many relationship.

Now in this case, let us ask the question. Can the same supplier supply the same part twice or three times maybe on different dates and in different quantities? Now this is something which is quite possible in the real world. We may find that a particular supplier is best suited for a particular part and we may repeat orders with him. So we say that you supplied some part P1 earlier. We would like to place more orders for the same part. So can the same supplier supply the same part again to us? Is this covered by the model here?

Now again to answer this question you must clearly understand what this relationship means and what are the keys of that relationship? When we say that the relationship supply, is it identified by the key of supplier and by the key of the part, it means that I cannot have a situation — I cannot have another instance of the relationship supplies with the same key. That means the same supplier and the same part. What it means is that this model therefore cannot represent a multiple supply from the same supplier for the same part. Now this may be not appropriate. This may not be whatever real world needs or what it has routinely in the real world. So how to change this? We need to modify. When we discuss this with the user, user would indicate that this part of his requirement or this part of the real world is not captured by the model correctly.

So how should we handle this? How should we modify it? In fact, in a way this problem is related with the problem that we discussed in the previous diagram where we said, can the student repeat the same course with the same teacher, possibly in another semester? Now once we try to understand the problem we will be able to come up with a solution. And in both the cases, you see here that the two models that we have are not handling the situation that we want that a supplier can supply the same part again.

So how should we model? What is the problem here? Basically, we want to distinguish between the different supplies made by the supplier. So supplier may supply parts. Each one of them is a shipment by itself and what we need to do is distinguish the shipment on its own rather than distinguish it through the primary keys of the entities which are participating. What it means is that now I need to distinguish the instances of the relationships on their own and therefore then it cannot be modeled as a relationship. We need to think of this in a different way which will allow us to distinguish one shipment from another shipment irrespective of what it contains and where it has come from.

So we need to change this part of the diagram so that it captures this requirement. And I have already given a hint that we need to create an additional entity, because entities are distinguishable in their own right and by creating a separate entity, we should be able to address this. Here is the solution we can propose for the part which we did not find satisfactory. As you see here, we have created an entity called shipment and this entity shipment has its own key called shipment number. Now with this key we can distinguish instances of shipment from each other.

How is this shipment related to parts and supplier? I have now two relationships. I said the shipment consists of parts and shipments are obtained from different suppliers. So shipment is now having the relationships connecting to part and connecting to the supplier. What should be the cardinalities here? Obviously a shipment can consist of multiple parts and each part can have a quantity associated.

Similarly, a shipment comes from a supplier but we are assuming that it always comes from a single supplier. So if I go from shipment to supplier, the cardinality should tell me one. But what if I go from supplier to shipment, is one correct here? In fact, it is not. Supplier can have multiple shipments. So ideally I should make it as a relationship of type many on the side of shipment. So supplier to shipment is of one to many type but from shipment to supplier it is many to one. So we can correct this situation and now our model is quite precise, and it allows us to handle multiple shipments irrespective of what they contain and who they come from.

Let us now look at ternary relationships in some more detail. Although in the previous examples, we have already encountered them. We need to ensure that we are modeling the real world correctly. And when trying to do that, some relationships may appear to be binary, a few may appear to be ternary, or even more complex, they may be four-way or five-way and so on. Ternary or higher-order relationships are harder to understand. And you must verify that the given situation demands a ternary relationship.

So for example, we had taken the example of study earlier, and we had justified that the study relationship should be a ternary relationship involving teacher, student and course, because courses are taught in divisions concurrently. So here the ternary relationship is justified.

Ternary relationship is not same as two-binary relationships. In fact, this is an important point. And therefore it has a implication on modeling. You must capture ternary relationship correctly and we will see this through a small example. Here is the example which shows a ternary relationship between parts, supplier and project. And let’s assume that this reflects my real world correctly. That means that parts are supplied to projects from different suppliers. So we have shown it as a ternary relationship.

Now can I show the same using two-binary relationships, one between part and project and the other between part and supplier as has been done in the diagram B here where we are trying to show two-binaries, one between part and project and one between part and supplier? In fact, now to verify this you again should try to walk through the diagram. So you say let me take one project. From this project, I can go by this relationship and I’ll find out all the arts which have been supplied or which have been received for this project. So I can go to part say P1, P2, P3. Now once I have reached here P1, P2, P3, I’d naturally forget from where I have come because I am now only at the part instances. So now I take a P1 and this P1, if I now follow through I can find out who are the suppliers. All the suppliers who have supplied this part P1.

But now because I have broken it into two parts, I cannot say that this is the supplier who has supplied for the given project, because if I now start from that supplier I go to many project parts, I take on part, and I come here and I reach many projects. So in general, I will not be able to now tie down which part was supplied in what quantity for which project. So if we are now talking about the same real world which was shown earlier as a ternary and now it has been shown as two-binary, then it is not the same. One of this has to be incorrect. So if the ternary relationship is the correct one, then replacing it by two-binary is a modeling error and it’s not the same. And this is the case in whatever you may do, you may add additional ternary – additional binary relationship even between project and supplier. But it will not capture the main point that we are making that I want to know what was supplied in what quantity, to which project by which supplier.

Now that is properly answered by the model where we used the ternary relationship. But it is not properly handled by the two-binaries. So two-binaries are not same as one ternary relationship and it’s very important to understand the real world thoroughly well maybe taking a few examples and decide whether a correct model is a ternary relationship or a set of binary relationships.

Let’s now go to the next concept – the weak entities. There are times when you encounter entities which do not appear to have a primary key on their own. Such entities are called weak entities and they are always related to some other entities which are normal entities which we may call as strong entities which have their own primary key. So these entities which do not seem to have primary key attribute on their own seem to exist in the context of some other strong entity. We will see some examples. So such weak entities may be distinguished in the context of a strong entity but not on their own. This is a situation where we have to use the concept of weak entity.

Here are a few examples. The first one is about the branches of a bank. Now let’s consider a bank like State Bank. State Bank is a strong entity because it has a name State Bank and no other bank can have a same name. So you may have State Bank, you may have Canara Bank. These are the names of the banks which distinguish one from the other. So bank is a strong entity.

But what about branches? Banks have branches. Now how are branches distinguished? Most banks give some serial number or name to the branches. So State Bank may have a branch whose name is IIT Powai branch. Similarly Canara Bank may also have a branch whose name is IIT Powai branch. So branches on their own do not have unique names. Here now we have both branches called IIT Powai branches but they belong to different banks. So this is not — branch name does not appear to be the primary key for branches.

Banks may otherwise use some serial numbers and these serial numbers may distinguish branches from each other in the context of a bank. So we can talk of branch one of State Bank, branch one of Canara Bank. So these serial numbers would also be in the context of the bank. Therefore branch appears to be weak entity. A bank may have many branches. For example, State Bank may have more than 10,000 branches. So I need to treat branch as a entity on its own but it appears to be a weak entity. Why should I treat branch as a entity? Because people open accounts in branches, not in bank. So you may have an account at IIT Powai branch or you may also have account at another branch of the same bank. So usually customers deal with branches. So I would like to represent branch as an entity but it appears to be weak entity. It appears to exist in the context of a bank.

Another interesting example is the example where candidates appear for interviews with the different companies. And interviews are important in their own right. But how do I distinguish one interview from another interview? It appears to be weak entity. It depends on the candidate and the company. Why do I want to distinguish an interview from another interview? Because all interviews may not be successful and only some interviews might be related to job offers. Other interviews may not be related to job offers.

So if I want to capture this offer entity and relate it to interviews, I must represent interview as a entity. Ultimately relationship exists only between entities. So here you would see that I need to capture interview, show it as entity, but it seems to be a weak entity. So, we’ll try to now show you how weak entity can we represent in the diagram and we will take both of these examples.

Here is the bank entity. It has bank name and this could be the key and it may have some other attributes. Similarly, we have another entity called customer. These are all normal entities. They are strong entities and they have their own key. So customers have, let’s say, customer name which may be unique. Let’s for the time being assume that customer name is unique. So these two are strong entities. Now we need to introduce branch as entity. Branch as we said earlier is a weak entity. So we show it by a double rectangle and the fact that branch is a weak entity and depends on bank is shown by a double diamond connecting the strong entity and the weak entity. So this defines the context. It’s branch exists in the context of bank. Although branches can be distinguished among themselves but they can be distinguished only within the context of a bank. And you may have branch name as attribute which can be used for making the distinction.

Now this branch can have a relationship called has accounts. So a customer has accounts with the branch. So a branch may have many customers. A customer may have account with many branches. Now he may have account with say five branches. Now these five branches which banks they are. So we have to actually go by this relationship. So out of this five, two may be with State Bank and another three may be with Canara Bank and so on. So that is captured by going from customer to branch and from branch to the bank. So here branch is a weak entity.

This is the example where we had branch name as a partial key. It’s called the partial key, or also called the discriminatory attribute. A weak entity such as a branch that we saw can participate in further relationships and we had that example of having accounts with branches. A weak entity can also have weak entities depending on that. So there can be a chain of such weak entities and you can see examples of these. For example, a large organization may have departments. So department is a weak entity. Departments may further contain sections. So section is a weak entity. So a section one of department one in organization A.

Similarly, organization B may have department one. Now within department one, I may have section one. Another department two may also have section one. So section also is a weak entity and you can have a chain of such weak entities and they would be existing in the context of some strong entity somewhere.

What is the primary key of weak entity? Primary key of weak entity is a combination of the primary key of the strong entity on which it depends plus the discriminating attribute. So, the key for branch – a primary key for branch would therefore be consisting of branch name and the name of the bank. These together are unique and this is how we define the primary key for the weak entity.

Here is the other example which we have taken from this well-known book on data modeling and databases. Here you have the weak entity called interview which depends on the candidate and the company, both. So here is a weak entity which depends on two primary entities. And some of these interviews may result in a entity job offer. So job offer may be an entity in its own right. You may have some attributes with it and there may be a key attribute. Assume that we have defined all of them, because any time we draw a strong entity, you must clearly conceptualize it. You must always define attributes and you should always answer this question that it must have its own primary key, because one job offer must be distinguishable from another offer. So for example, the position may be the attribute which distinguishes one job offer from another job offer.

Now in this particular diagram, can we represent the same in different way and will it convey the same meaning? For example, can I have a ternary relationship between candidate, company and job offer? So replace all of this by a single ternary relationship. That obviously would seem to be wrong, because every interview doesn’t result in a job offer. So in that case, we’ll not be able to capture those interviews which did not result in a job offer. So it will not be capturing it correctly. Therefore we cannot replace this by a ternary relationship.

You can’t have a binary relationship between candidate and company because in that case, how do we relate that binary relationship with the job offer? So considering all this, you will find that this is a correct model for the situation we were describing.

We have seen the basic concept so far. We have talked about the most important concepts of entity and relationship. These are the core concepts. And then we had other related concepts like attributes, cardinality, and so on, and the key attribute.

This ER model has been further extended to capture more meaning. Primarily the new concepts which have been added are the concepts which are somewhat related to the concepts from object-oriented models. These concepts are the concepts of generalization and aggregation. And they also introduce the concept of subset hierarchies. So we’ll see that these concepts give you more flexibility in creating the data model for a given situation and they capture the semantics much more comprehensively. And the kind of new concept that we will see here are the concepts of inheritance, which are the concepts in object-oriented data model and notion of a composite object or a complex object where an object may contain other objects.

So we will study the extended ER model subsequently and see how these concepts facilitate the data modeling exercise.

 

Category: Events & Presentations

Comments are closed.