Professor NL Sarda Discusses Data Modeling - DFD, Function Decomp (Transcript)

In this module we will talk about data modeling. You will recall that when we introduced software engineering and software development methodology, we discussed requirement analysis, and we mentioned that during requirement analysis we perform both data modeling and process modeling as we come to understand the user's application environment.

So let us start by asking again why we model. We build models of complex systems because it's difficult to understand any such system in its entirety in one shot. We therefore build a model and try to understand the system in terms of its components and how those components relate to each other. So one of the important reasons for modeling complex systems is to improve our understanding of them, because we cannot understand them entirely at once.

We need to develop a common understanding of the problem so that we can proceed toward the solution. This common understanding is shared among all the people involved, including the users for whom we are trying to propose a solution. We cannot afford a trial-and-error approach. A model clearly establishes that we are proceeding in the correct direction and that our understanding of the user's environment is correct, because that understanding is reflected in the model. This removes the trial-and-error kind of approach, and it also reduces the risk in the overall development.

A model is also extremely useful for communicating the required structure and behavior of our system. We try to capture these in the model and put them in a form that can be understood and verified by others. So these are the reasons why we model.

Let us see how we model. We choose an appropriate modeling concept or modeling paradigm, one in which our solution can be properly expressed. This choice of the right model is extremely important, and it has considerable influence on shaping the solution we propose for the problem. So we choose a model suited to the purpose at hand. This may be modeling of the data, or modeling of the processing defined in a given application.

No single model is sufficient. This is an important point: in most analysis phases, we build different types of models which represent different perspectives of the same environment or the same application. It is important to approach a complex system from different points of view, which may be best represented using different modeling techniques.

In fact, we have already been talking about two independent models: one for modeling data and one for modeling processing. Even in the object-oriented model that we mentioned earlier, different perspectives are taken. One is the object model, a static model which reflects the different objects present in the user's environment. Then there is a dynamic model which defines the interactions among these objects. So we do take different perspectives, and we use an appropriate modeling concept or paradigm to represent each of them.

The best models are connected to reality. In fact, the purpose of a model is to abstract the important aspects of reality and represent them very clearly. So naturally a model must meet the requirements of the real-world situation we want to analyze. These are the different issues we must keep in mind when we define our modeling exercise and decide which model to choose.

So in this particular module, we are talking about data modeling. We will define the notion of data modeling, and we are going to build these models in terms of the important concepts of entity and relationship. In fact, the model that we are going to discuss in detail is the entity relationship model, or the ER model for short. We will look at the diagramming concepts available in this modeling technique. We will talk about other related concepts such as keys and weak entities. Then we will also talk about extensions to the ER model. So these are the different topics we'll cover in this module.

Let's begin by seeing the purpose. The purpose of a data model is to represent the operational data in the real world. These operational data describe the various events, entities and activities which take place in the business environment for which we are proposing the solution. There may be a lot of such data, describing different entities, different activities and different types of events in the business environment, and all of it needs to be captured in solving the application problem. So the objective of a data model is to represent this operational data.

The model may be described at various levels: it may be at the logical level or at the physical level. The physical level naturally addresses not only what data we have but also how that data is stored, retrieved, updated and so on. So this would be the way in which the data would actually be handled.

Very often we first try to understand the data at the logical level. The model may also be at the external, conceptual or internal level. When we say a model is at the external level, we mean that it defines the data as seen by a particular user of the application. Naturally, that user's view of the data may be a subset of the overall data content in the application, whereas the conceptual model represents the data in its totality, at a level which captures the important concepts in the application.

An internal data model is more of a physical representation of the data. For example, data may be stored in terms of files.

So this would be an internal model. An internal model generally takes into account efficient processing of data, whereas the conceptual model concentrates purely on the concepts and how they are interrelated, without being concerned about efficiency. So we may model the data at various levels.

In this particular module, we are going to focus on the conceptual data model. This is where we begin: we try to understand a given application domain in terms of all its important concepts and how they interrelate. Subsequently, when we proceed to design, we will consider the physical or internal representation of this conceptual model, how to make user processing more efficient, and how to restrict the views of different users so that each accesses only the data relevant to them. So these external, internal or physical aspects of the model are generally worked out when we proceed to design and implementation.

Our first objective in the development of the solution is to build a conceptual data model. A good data model should meet some important requirements, and these are quite straightforward. A good model should naturally be easy to understand, because we expect users to validate the model we build to represent their application domain. Users can understand it if it is a simple model, if it has only a few concepts, and if it can be specified in a top-down fashion, so that you don't give all the details together but give them step by step.

In fact, top-down specification is very important in any modeling, because we cannot comprehend a complex system all at once. We have to see it at different levels of detail, and that is what top-down specification permits. So these are the characteristics of a good model.

Let's also define what we mean by a model. What does a model consist of? A model consists of a few important concepts, and it also gives us some form by which those concepts can be represented. We call these forms constructs; a construct is a representation of a concept. But primarily, a model offers a set of concepts, and it also offers a few operations on those concepts. So this is what a model is made up of.

Another requirement for a model is that it should capture the meaning of the data. We are trying to understand the real world and the data present in it, and then we put that data in the form of a suitable conceptual model. This model should capture the real-world meaning of the data; otherwise the data will be difficult to interpret. So we have a notion of what we call data semantics, which is concerned with the meaning of the data. How do we convey the meaning of data? When we prepare a data model, it should give us not only techniques for representing the data but also a way of conveying its meaning. So both of these, the concepts and their meanings, what we generally call syntax and semantics, are important aspects of a data model.

Now how do we capture semantics? How do we convey the meaning of the data? There may be different ways of doing that, and we will see that the ER model provides different methods by which we can convey the meaning of the data we are modeling. Some of the ways you will encounter are given here. For example, proper naming of the data is very important, since the name itself conveys a lot of meaning. If I tell you that an item has a price, I use the word “price” to indicate the cost at which you can purchase it. The word “price” has a specific meaning, and by choosing the right word for the right aspect of the data, I'm conveying that meaning.

So in fact, the most important aspect of conveying the meaning of data is proper naming. We must keep this in mind when building a data model: the different types of data we define and the different representations we choose should all be named meaningfully, and named in the context of the business environment we are modeling, so that anybody familiar with that business environment will quickly understand what data we are talking about and what role it plays in the real world.

Then, of course, there are many other constraints that we can define on the data. Constraints also capture meaning: What are the permitted values for the data? Is the data unique? Are there interdependencies? All these restrictions and constraints, together with proper naming, are a very important part of conveying the meaning of the data. Any model has to provide facilities by which this meaning is captured, and as we go along we will see that models such as the ER model provide facilities for these different types of expressions, through which the semantics can be put down as part of the model.

So let's now talk about the entity relationship model in detail. The ER model has only a few concepts; it's simple and easy to use. It permits, to some extent, a systematic top-down approach, so that the level of detail can be controlled and you need not put everything down at once; you can convey it in a step-by-step fashion. For these reasons, it's an excellent tool for communication with users. As we collect our requirements and prepare the model, we can verify it with the user, so that we know our conceptual design is correct and our understanding of the data is correct.

For all these reasons, the ER model is known as a conceptual data model. It does not describe how the data will be stored or represented; it only describes what kinds of data concepts there are and how they are interrelated. It also provides a powerful diagramming notation through which the model can be represented. This diagramming notation, together with the simple and very intuitive concepts of entity and relationship, is what makes the ER model such a simple tool for communicating with users. So let us now try to understand the different concepts present in the model.

Let us begin with the first concept, the concept of an entity. An entity simply means an object that exists in the real world. An object must be distinguishable from other objects. Finally, the object may be something physical, a concrete entity, or it may be an abstract entity; it doesn't have to have a physical dimension.

Here are a few examples. This course on software engineering is something we can treat as an entity. It is an abstract entity: it exists as a course, and it is distinguishable from other courses. So it satisfies all three characteristics of an entity, and we can think of this course on software engineering as an object, an entity. Another example is Ganesh, who is a student. Ganesh is an object: he exists in the environment of, let's say, the university we are trying to model, and we can distinguish him from other students because there is only one such object. In this case the entity is also physical. So a course may be an abstract entity, but Ganesh is obviously a physical or concrete entity. An entity, then, is something which exists in the real-world application environment we are modeling.

An entity set, on the other hand, is a way of grouping together a set of similar entities. These entity sets need not be disjoint. For example, we can have a set of suppliers and a set of customers, and a supplier may also be a customer. So the various suppliers form one set and the various customers form another, and these two sets may be disjoint or may overlap. Every entity in the supplier set plays the role of a supplier, and every entity in the customer set plays the role of a customer. Entities of a similar type form a set or collection, because we are always looking for similar entities.
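
To make the idea of overlapping entity sets concrete, here is a minimal Python sketch, assuming each entity is identified by a plain name string; the firm names are hypothetical. Each entity set is simply a Python set, so a firm that is both a supplier and a customer appears in both.

```python
# Hypothetical firms; each entity is identified here by a simple name string.
suppliers = {"Acme Traders", "Bharat Steel", "Gupta & Sons"}
customers = {"Gupta & Sons", "Kiran Stores"}

# Entity sets need not be disjoint: this firm plays both roles.
both_roles = suppliers & customers

print(both_roles)                       # {'Gupta & Sons'}
print(suppliers.isdisjoint(customers))  # False
```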

This is because when we create a model, there may be thousands of students in the university, and we are not going to model each one of them individually, because they are all similar from our application's point of view: they do similar things, they execute similar transactions, and their data has to be stored. So naturally we are concerned not with individuals but with collections of similar entities, which we call entity sets.

Another example is the set of books in a library. There may be thousands of books, but we treat book as a set, and each book in the library is an entity. So we now know the difference. Book as an entity set is the collection of all books, which are available for students to read and borrow. But when a student goes to the library and borrows a particular book, he actually borrows a particular entity from the collection of all books. We should keep these important concepts in mind: an entity is a specific object, such as a specific book, a specific student or a specific course, whereas an entity set is a set of similar entities; all books, all students or all customers each form an entity set. It's a collection.

An entity set is also called an entity type or entity class; these are alternative terms for the same concept. An entity is therefore an occurrence, or an instance, of some entity type. So student is an entity set or entity type, whereas Ganesh is an occurrence or instance of student. We must clearly understand that an instance is a real-world object like Ganesh, whereas student is a concept, a conceptual entity that represents all the students in the application domain together.
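
One way to picture the type/instance distinction is the class/object distinction in a programming language. The sketch below is only an analogy, not part of the ER model itself: a hypothetical `Student` class plays the role of the entity type, each object is an entity instance, and the collection of objects is the entity set. The roll numbers and the second student are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable, so they can live in a set
class Student:
    """Entity type (entity set): every student has these attributes."""
    roll_number: str
    name: str


# Entity instances: specific real-world students (hypothetical values).
ganesh = Student(roll_number="CS1001", name="Ganesh")
meera = Student(roll_number="CS1002", name="Meera")

# The entity set, at a given moment, is the collection of all instances.
students = {ganesh, meera}

print(isinstance(ganesh, Student))  # True
print(len(students))                # 2
```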

We often use the word entity to mean entity set. This is because, as I said earlier, when we are modeling we do not distinguish between the different entities in a given entity set, since their behavior is similar. So we talk of student in general rather than of Ganesh in particular, and we use the word entity to mean the entity set. Our modeling will be concerned more with entity sets than with individual entities.

Entity sets are named using singular common nouns. This is a very important point: proper naming of entity sets is essential to convey meaning. Some examples are book, student and course, which could be useful entities in a university environment.

The next concept is the concept of an attribute. An entity has a set of attributes, and an attribute defines some property of interest. For example, the entity book has an attribute price: books have a well-defined price at which they can be bought or sold, so book has price as an attribute. Every attribute is given a suitable name, and again the name should be meaningful. Here price is the name of the attribute associated with the book entity. This is how we put the concepts together and give them a very precise meaning: price is an attribute that belongs to the entity set called book, so all books will have some price value.

An attribute has a value for each entity, and this value naturally differs from entity to entity. A particular book may have a price of 200 rupees, and another book may have a price of 300 rupees. So every entity which has that attribute is associated with a value for it. The value may, of course, change over time. For example, salary may be an attribute of an employee, and an employee's salary naturally changes.

The same set of attributes is defined for all the entities in an entity set, and that is why we say that an entity set is a set of similar entities: they are similar because the same attributes are applicable to all of them. If I have 1000 students in a university, all of them share the same attributes, because they are of interest to me from the same point of view. So in a given entity set, all the entities have the same attributes.

Let's take an example. Here is the entity called book, with the attributes listed here. Every book has a title, an accession number, a publisher, a price, an ISBN (a unique book number), an author, and a year of publication. All books have these attributes. We have listed them because they are of interest to us in the context of the application. Of course, a book has many other attributes, for example the number of pages or the language in which it is written. But we have not listed them here, possibly because they are not of much interest in the application we are developing, although they may be in some other application.

So when we talk of an entity, we must always keep in mind the context of the application we are developing, and in that context list the attributes which are of interest to us, for which data will be obtained and stored. What we are saying here is that in the application we are building, we need the following data about each book. Therefore we have defined these attributes for book, and they apply to all the books in our library.

Naturally, as we said before, every book in the library has a value for each of these attributes. An attribute may also be multi-valued, which means that it may have more than one value for a given entity. A good example is the author attribute, since some books have more than one author. In that case, under author for a given book we may list more than one name, and all of these names are values of the author attribute. Therefore we say that author is a multi-valued attribute.
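
The book entity and its multi-valued author attribute can be sketched as a Python dataclass. This is an illustration under stated assumptions: the field names, the choice of a tuple to hold the author values, and all sample values (titles, publisher, ISBN-like strings, author names) are hypothetical. Every book gets the same attributes, but the author attribute may hold one or more values.

```python
from dataclasses import dataclass


@dataclass
class Book:
    accession_number: int
    title: str
    publisher: str
    price: float             # in rupees
    isbn: str
    authors: tuple           # multi-valued attribute: one or more author names
    year_of_publication: int

    def __post_init__(self):
        # A multi-valued attribute still needs at least one value here.
        if len(self.authors) == 0:
            raise ValueError("a book must have at least one author")


# Hypothetical sample books: the second one has two values for the author attribute.
b1 = Book(101, "Data Modeling Basics", "Example Press", 200.0,
          "ISBN-EXAMPLE-1", ("A. Rao",), 2010)
b2 = Book(102, "Software Engineering Notes", "Example Press", 300.0,
          "ISBN-EXAMPLE-2", ("A. Rao", "B. Iyer"), 2012)

for book in (b1, b2):
    print(book.title, "->", len(book.authors), "author(s)")
```

Storing the authors as a tuple keeps the sketch simple; a relational design would usually move a multi-valued attribute into a separate table, but that is a later design decision, not part of the conceptual model.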

An attribute which uniquely identifies an entity is called a candidate key for that entity set. This is important, and such an attribute has to be present, because every entity must be distinguishable from every other entity. Every book must be distinguishable from every other book, so book must have an attribute which uniquely identifies each book in the library. Such an attribute is called a candidate key, or simply a key. A key attribute is an attribute which uniquely identifies an entity.

Some attributes are composite attributes, which means they are made up of several component values. For example, a date contains a day, a month and a year, and an address may contain a city, a pin code and so on. These are called composite attributes because they can be decomposed further: they contain other parts, and these parts can also be named if necessary. Or we may not be interested in decomposing them further, but they still have a composite value.
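
A composite attribute can be sketched as a nested structure whose parts are named. In this minimal Python sketch, assuming hypothetical `Date` and `Address` types (in practice Python's built-in `datetime.date` would serve for dates), the student's date of birth and address are each a single attribute, yet their components can still be used on their own. All sample values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Date:          # composite: day, month, year
    day: int
    month: int
    year: int


@dataclass(frozen=True)
class Address:       # composite: city, pin code (other parts could be added)
    city: str
    pin_code: str


@dataclass
class Student:
    roll_number: str
    name: str
    date_of_birth: Date   # one attribute with a composite value
    address: Address      # one attribute with a composite value


s = Student("CS1001", "Ganesh", Date(14, 8, 2001), Address("Pune", "411001"))

# The whole value can be treated as one attribute, or decomposed into its parts.
print(s.address.city)        # Pune
print(s.date_of_birth.year)  # 2001
```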

The next concept is that of a domain. A domain defines the set of permitted values for an attribute, that is, what kind of values the attribute can take. This may be defined by listing the values or by specifying a type. For example, the date of joining of an employee can be of type date, and the marks obtained by a student can be of type integer, whereas an attribute like grade may be defined by listing the values it can take. Grades may take the values A, B, C, D, E and F; if these are the only six values permitted, we list them.

So domain is another important concept in defining our application: it lists the set of permitted values for a given attribute. This is part of defining what is called an integrity constraint or validation constraint, which is important for validating the values in the database. Suppose we have a domain for the marks attribute which says that marks must be an integer; then any value which is not an integer cannot be accepted. So a domain defined like this is actually an integrity constraint, or validation requirement, on the data.
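
Domains used as validation constraints can be sketched as small predicate functions, one per attribute. In this Python sketch the function names and the `violations` helper are hypothetical. The grade domain is defined by listing its values, while the marks and date-of-joining domains are defined by type, as in the examples above.

```python
from datetime import date

# Domain defined by listing the permitted values.
GRADE_DOMAIN = frozenset({"A", "B", "C", "D", "E", "F"})


def in_grade_domain(value):
    return value in GRADE_DOMAIN


def in_marks_domain(value):
    # Domain defined by type: integers only. bool is a subclass of int in
    # Python, so it is rejected explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def in_joining_date_domain(value):
    # Domain defined by type: a date.
    return isinstance(value, date)


DOMAINS = {"marks": in_marks_domain, "grade": in_grade_domain}


def violations(record):
    """Return the attributes of a record whose values fall outside their domain."""
    return [attr for attr, check in DOMAINS.items()
            if attr in record and not check(record[attr])]


print(violations({"marks": 78, "grade": "B"}))    # []
print(violations({"marks": 78.5, "grade": "G"}))  # ['marks', 'grade']
```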

Next, let's come to the concept of primary keys. We have just defined the concept of a candidate key, or key in general. The primary key is a related concept, a slight extension of it, and its purpose is the same: we want to distinguish occurrences of entities in a given entity set. Suppose we have an entity set called student, and there are 4000 students in this institute. How do I distinguish one student from another? This is done using the concept of a key. The distinction is therefore always made using the value of some attribute or attributes.

If a set of one or more attributes uniquely identifies the entity, this set of attributes is called a candidate key. Here are some examples. Roll number is a candidate key attribute for the student entity. Similarly, the accession number of a book in a library is its candidate key. Note that the title of the book is not a candidate key, because a library may have many copies of a book with the same title. That is why libraries give an accession number to every book, and every book in the library has a different accession number. Similarly, every student has a unique roll number by which we can clearly identify the student we are talking about. So these are candidate keys.

No proper subset of a candidate key is itself a key. If a key contains more than one attribute, then all those attributes must be required; we cannot drop any attribute from such a composite key. So a candidate key may consist of a single attribute or of multiple attributes, but when it contains more than one attribute it must not contain any redundant attribute: all of them must be required to identify an entity.
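
The two conditions on a candidate key, uniqueness and no redundant attribute, can be checked mechanically against a sample of data. A caution on this Python sketch: a check over sample rows only shows that the data does not contradict a proposed key. Whether something really is a key is a statement about all permissible data in the application, and that decision belongs to the designer. The function names and the sample rows are hypothetical.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows share the same combination of values for attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, attrs):
    """Unique, and minimal: no proper non-empty subset of attrs is itself unique."""
    if not is_unique(rows, attrs):
        return False
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if is_unique(rows, subset):
                return False  # some attribute in attrs is redundant
    return True


books = [
    {"accession_number": 101, "title": "Database Systems"},
    {"accession_number": 102, "title": "Database Systems"},  # a second copy
    {"accession_number": 103, "title": "Software Engineering"},
]

print(is_candidate_key(books, ("accession_number",)))          # True
print(is_candidate_key(books, ("title",)))                     # False: copies share a title
print(is_candidate_key(books, ("accession_number", "title")))  # False: title is redundant
```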

An entity may have multiple candidate keys, which means that we may identify people, books or whatever entity we are talking about in more than one way. Consider, for example, the employees of a small organization. We may have employee numbers to identify them, but names may also be adequate if every name is unique. In that case we say that employee has two candidate keys: the employee number and the employee name.

Now how do we define a primary key? A primary key is a candidate key chosen by the designer as the principal means of identifying an entity. So you may say that although both employee number and employee name are candidate keys, you prefer employee number as the key, and therefore employee number is designated as the primary key. Loosely, all candidate keys are sometimes called primary keys, but as we have now defined it, the primary key is the one candidate key designated as the main way of identifying the entities, and there is only one primary key for a given entity set.
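
Choosing a primary key is a design decision among the candidate keys, and the chosen key then serves as the principal way of looking up an entity. This Python sketch assumes the small-organization example above, with hypothetical employee data, and uses a dictionary index as a stand-in for "identify an entity by its primary-key value".

```python
# Hypothetical employees of a small organization where names happen to be unique.
employees = [
    {"employee_number": 1, "name": "Anita"},
    {"employee_number": 2, "name": "Ravi"},
]

CANDIDATE_KEYS = [("employee_number",), ("name",)]
PRIMARY_KEY = ("employee_number",)  # the designer's choice among the candidates

# The primary key must be one of the candidate keys.
assert PRIMARY_KEY in CANDIDATE_KEYS

# Index the entities by their primary-key value.
by_primary_key = {tuple(e[a] for a in PRIMARY_KEY): e for e in employees}

print(by_primary_key[(2,)]["name"])  # Ravi
```

Employee number is the more natural choice here because names that happen to be unique today may not stay unique as the organization grows.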

Let's look at an example taken from a university or college situation, where we'll identify a few entities and their attributes. Many of these are obvious and you will be able to understand them quite easily. Here is a student entity with the attributes roll number, name, hostel number and date of birth. Here is a course entity, which has the attributes course number, name and credits. Here is the teacher entity, which has employee number, name, rank, room number and telephone as its attributes. Finally, we have department as an entity, with only two attributes, name and telephone. As a small exercise, you can try to identify primary keys for these entities.

Let's note some points about this example. We will refine it further as we proceed, and we'll also take many other simple examples from the university environment, because it is familiar to all of us. But there are some very important points to be noted about the four entities we just introduced.

The first point is that a different focus could have indicated more entities. In the college environment there can be additional entities: hostel could be an entity, and semester could be an abstract entity. Or, instead of making teacher an entity, we could have made teacher an attribute of course, so that we say a course has a teacher. How do we decide, in a given modeling situation, whether something is an entity or not? This depends on the focus of our design, or the focus of our application.

It also depends on how we perceive reality. A hostel in the real world is obviously an entity; it's a physical entity. It has rooms, an address, a warden and so on, and many interesting attributes. But in the application we are developing, the purpose of the hostel may only be to act as an attribute of a student, telling us in which room or in which hostel the student is staying. In that case we are not interested in the hostel directly, but only through the student entity. So we have to decide what the focus of our application is. Do I need to know who the warden of the hostel is? Do I need to know other details of the hostel? In that case, hostel will be an entity.

But if the role of the hostel is only to tell me where a student stays, then I can treat it as an attribute of student. This important point must be kept in mind: what is our perception, and what is the focus of our application development? Based on that, you will identify the entities and the attributes. We must not lose this perspective at any point in the modeling exercise.
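
The two ways of modeling the hostel can be contrasted in a short Python sketch. The class names `StudentV1`, `StudentV2` and `Hostel`, and the warden's name, are hypothetical. In option 1 the hostel is only an attribute value of a student; in option 2 it is an entity with its own attributes. In ER terms, the link between a student and a hostel entity would be expressed as a relationship; the object reference below is only a programming stand-in for that link.

```python
from dataclasses import dataclass


# Option 1: the application only needs to know where a student stays,
# so the hostel number is just an attribute of Student.
@dataclass
class StudentV1:
    roll_number: str
    name: str
    hostel_number: int


# Option 2: the application also needs details of the hostel itself,
# such as its warden, so Hostel becomes an entity in its own right.
@dataclass
class Hostel:
    hostel_number: int
    warden: str


@dataclass
class StudentV2:
    roll_number: str
    name: str
    hostel: Hostel  # stand-in for an association with a Hostel entity


s1 = StudentV1("CS1001", "Ganesh", 4)
h4 = Hostel(hostel_number=4, warden="Dr. Kulkarni")
s2 = StudentV2("CS1001", "Ganesh", h4)

print(s1.hostel_number)   # 4: where the student stays, and nothing more
print(s2.hostel.warden)   # Dr. Kulkarni: answerable only in option 2
```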

Here is an exercise for you. Consider a hospital environment, which most of us are quite familiar with, and identify a few entities and attributes.

Let's go to the next concept: the concept of a relationship. A relationship represents some association among entities. In the real world, entities exist and interact with each other; they become associated with each other. We want to capture that association through the concept of a relationship.

Let's take a few examples. A particular book can be prescribed as a text for a particular course. Remember that book is an entity and course is an entity, but now we are relating a book and a course. We are saying that the book called Database Systems by author CJ Date is a textbook for the course identified as CS 644. These two are related through the concept of a textbook for a course, so textbook actually defines a relationship between book on one hand and course on the other.

Similarly, another example would be student Ganesh, who has enrolled for the course CS 644. Ganesh is an entity and the course is an entity; the fact that this student has enrolled for this course is captured through the concept of a relationship. So you can see that the purpose of a relationship is quite distinct from the purpose of an entity. Entities identify independent objects in the real world, and these objects interact with each other: a student enrolls for a course, and Ganesh enrolls for course CS 644. This is captured through the concept of a relationship.

Again, we don't talk of specific relationship instances. We want to capture relationships of a similar type, and this we do through the concept of a relationship set. This is the same as before: we talked about entity and entity set, and now we are talking about relationship and relationship set. Whenever we talk of a relationship in general, we imply a relationship set, because in modeling we are interested in sets of similar relationships and not in specific relationships. Therefore the two terms, relationship and relationship set, are often used interchangeably.
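
A relationship set can be pictured as a set of tuples, one per relationship instance, where each tuple pairs entities drawn from the participating entity sets. This Python sketch identifies entities by key values; "Meera" and her enrollments are hypothetical additions to the Ganesh and CS 644 example, and the helper `courses_of` is likewise made up.

```python
# Entity sets, identified here by key values.
students = {"Ganesh", "Meera"}
courses = {"CS 644", "CS 101"}

# Relationship set "enrolls": a set of (student, course) pairs.
# Each pair is one relationship instance.
enrolls = {("Ganesh", "CS 644"), ("Meera", "CS 644"), ("Meera", "CS 101")}

# Every instance must relate entities that exist in the participating sets.
assert all(s in students and c in courses for s, c in enrolls)


def courses_of(student):
    """All courses a given student is related to through 'enrolls'."""
    return sorted(c for s, c in enrolls if s == student)


print(courses_of("Ganesh"))  # ['CS 644']
print(courses_of("Meera"))   # ['CS 101', 'CS 644']
```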

Relationships exist between entity sets. A binary relationship exists between two entity sets, a ternary relationship exists among three entity sets, and an n-ary relationship in general exists among n entity sets. So when we define a relationship, we have to identify which entities are involved in it.

A binary relationship, as we just said, exists between two entities. Here is an example: a relationship named study exists between student and course. It is a binary relationship between two entities, the student entity and the course entity.

Of course, we may view it differently depending on the context or the application, and we may find that the relationship study is actually a ternary relationship among student, course and teacher. So what is the difference? Are the two statements we have just made the same, or is there an important difference between them? Note that even in the first case, the teacher entity may be present in the domain we are modeling. So assume that the domain we are modeling contains students, courses and teachers.

Is this study relationship a binary relationship between student and course, or a ternary relationship among student, course and teacher? Is there a difference, or does it not matter? In fact, there should be a difference; otherwise we have a modeling situation where people may arbitrarily model it as a binary or a ternary relationship. The advantage of the ER model is that it precisely defines the difference between these two situations. Which one is correct depends on the application environment, that is, on the university you're modeling.

So let us see the difference. The first one means that, as far as student and course are concerned, I do not need to record which teacher is teaching that course. Given that a student is studying a course, I can always find the teacher of that course independently. In the second case, we are saying that it is not enough to say that student Ganesh is studying course CS 101; you also have to say under which teacher he is studying it. So the difference lies in what the basic, indivisible fact is.

What this really means is that in the second case we may have a university where the same course is taught by several teachers concurrently, and it is not enough for Ganesh to say, “I am studying CS 101.” He has to say, “I am studying CS 101 under teacher Deepak,” or whoever it is. This is the important difference. In the first case, it is not necessary to mention the teacher, because we can find the teacher from the course. In the second case, the course is not enough: given only the course, I cannot find out which teacher student Ganesh is studying with. This is the strength of the ER model: it can convey such finer aspects of the semantics and distinguish between these two situations. So, depending on the situation we are modeling, we have to define binary or ternary relationships as appropriate, so that the associations among entities are clearly captured.
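The binary/ternary distinction can be illustrated with a small Python sketch. It assumes a hypothetical course-to-teacher lookup for the binary case and uses hypothetical placeholders "Student B" and "Teacher B" in the ternary case; only Ganesh, CS 101 and Deepak come from the lecture. The helper `course_determines_teacher` (also hypothetical) asks whether every course appears with a single teacher. If it does, the teacher can be recovered from the course, and the binary study relationship plus the lookup would lose nothing.

```python
# Binary view: study(student, course). The teacher is NOT part of the fact;
# it is looked up from the course alone. This course -> teacher map is hypothetical.
teaches = {"CS 101": "Deepak"}
study = {("Ganesh", "CS 101")}


def teacher_of(student, course):
    """Binary model: the teacher follows from the course, independently of the student."""
    if (student, course) not in study:
        return None
    return teaches[course]


# Ternary view: study(student, course, teacher). Here the same course is taught
# by two teachers concurrently, so the teacher must be recorded in each fact.
# "Student B" and "Teacher B" are hypothetical placeholders.
study3 = {("Ganesh", "CS 101", "Deepak"), ("Student B", "CS 101", "Teacher B")}


def course_determines_teacher(triples):
    """True if every course appears with only one teacher across all triples."""
    teacher_for = {}
    for _student, course, teacher in triples:
        if teacher_for.setdefault(course, teacher) != teacher:
            return False
    return True


print(teacher_of("Ganesh", "CS 101"))     # Deepak
print(course_determines_teacher(study3))  # False: the teacher must be part of the fact
```

When the check returns False, as it does here, dropping the teacher from the study fact would make it impossible to say under whom Ganesh is studying CS 101, which is exactly the situation that calls for the ternary relationship.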

Another important point is that a relationship itself may have attributes. The study relationship has the attributes grade and semester. The grade tells us what grade the student has received in that course. Suppose student Ganesh is studying course CS 101 and gets an A grade. This grade A must be treated as an attribute of study. It is not an attribute of student Ganesh, because just saying that Ganesh got an A is not enough; you have to say in which course he got that grade. Therefore it is neither a property of the student nor a property of the course, but a property of the relationship between the student and the course. This is an important point.

So grade cannot be defined as an attribute of student, nor can it be defined as an attribute of course; it must be defined as an attribute of the study relationship. Similarly, the semester in which the student registered for that course is an attribute of study. So we define such attributes on relationships as we find them in the application domain.
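A minimal Python sketch of where grade and semester live, assuming plain dataclasses as a stand-in for the ER constructs. The roll number "R1", the course "CS 102", the course titles and the semester label are hypothetical values for illustration only. Because Ganesh has a different grade in each course, the grade can only be read off a (student, course) pair, not off the student alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    roll_no: str
    name: str


@dataclass(frozen=True)
class Course:
    course_no: str
    title: str


@dataclass(frozen=True)
class Study:
    # The two participating entities...
    student: Student
    course: Course
    # ...and attributes that describe their association, not either entity alone.
    grade: str
    semester: str


ganesh = Student("R1", "Ganesh")  # roll number "R1" is hypothetical
enrolments = [
    # Course titles and the semester label are hypothetical placeholders.
    Study(ganesh, Course("CS 101", "Course A"), grade="A", semester="Sem 1"),
    Study(ganesh, Course("CS 102", "Course B"), grade="B", semester="Sem 1"),
]


def grade_of(records, roll_no, course_no):
    """A grade is only meaningful for a (student, course) pair."""
    for r in records:
        if r.student.roll_no == roll_no and r.course.course_no == course_no:
            return r.grade
    return None


print(grade_of(enrolments, "R1", "CS 101"))  # A
print(grade_of(enrolments, "R1", "CS 102"))  # B
```

Note that `Student` has no grade field at all: asking "what grade did Ganesh get?" has no single answer until a course is named.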

Relationships are named using verbs or nouns. Here are a few examples: study, as we have seen; enroll; and order. A customer may place an order for a part, so order will be a relationship: customer is an already existing, defined entity, part is an entity, and order relates customer and part. These names should be chosen very carefully so that the meaning of the relationship is conveyed to the user.

Continuing the exercise mentioned earlier, you should now identify the different relationships that may exist in the hospital example for which you identified entities earlier. This will show you whether you have properly understood the concepts we have discussed. So try to work out this exercise and identify the entities and relationships in that hospital example.

Let us next see how we can depict a relationship. How do we visualize it? This is a very important concept, and mistakes are often made in understanding and modeling relationships. So first we will see how we can show a relationship in a simple diagramming form.

We will show an entity set as a collection, and entity instances as small circles within that collection. We will show a relationship by connecting the entities involved in it. In other words, we will use a simple diagramming notation to understand the important concept of a relationship.

Here is an example that shows the study relationship. You will see an entity set here: this is the student entity set, and four students are shown in it. These are the instances: this is the entity called Ram, this is the entity called Sita, and so on. This is the course entity set, where we have listed a few courses, such as a course on database management systems and a course on data structures. So these are the instances of course, those are the instances of student, and now we want to capture the study relationship between them.

The study relationships are shown by small rectangles connecting the two entities that are related. So this small rectangle here indicates that student Ram is studying the course COBOL. He is also studying the course DBMS. In this pictorial representation, we show a relationship as something that connects an entity from student with an entity from course. In fact, we can think of relationships as navigation paths that allow us to go from one entity to another. Suppose I want to find out which courses Vinod is studying. I start at Vinod, follow the path, and find that student Vinod has registered for one course. Sita has registered for three courses, and so on.
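Reading relationships as navigation paths can be sketched in Python by indexing the pairs in both directions. The instance data below is hypothetical: it is chosen to match what the lecture says (Ram studies COBOL and DBMS, Vinod one course, Sita three courses), but the actual links in the lecture's diagram are not known here. The helper `build_paths` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical instance data consistent with the lecture's description.
study = {
    ("Ram", "COBOL"), ("Ram", "DBMS"),
    ("Sita", "DBMS"), ("Sita", "Data Structures"), ("Sita", "COBOL"),
    ("Vinod", "Data Structures"),
}


def build_paths(pairs):
    """Index the relationship so it can be navigated in either direction."""
    courses_of = defaultdict(set)   # student -> courses reached by following links
    students_in = defaultdict(set)  # course -> students reached by following links
    for student, course in pairs:
        courses_of[student].add(course)
        students_in[course].add(student)
    return courses_of, students_in


courses_of, students_in = build_paths(study)
print(sorted(courses_of["Vinod"]))  # ['Data Structures']
print(len(courses_of["Sita"]))      # 3
print(sorted(students_in["DBMS"]))  # ['Ram', 'Sita']
```

The same relationship answers both "which courses is Vinod studying?" and "who is studying DBMS?"; the two dictionaries are just the two directions of travel along the same links.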

So this is a very useful way of visualizing entities and relationships. Such diagrams are called instance diagrams. They show entity sets and individual example entities, such as the students Sita or Ram or the course on data structures, and they visualize the relationships among them. One important point is that one rectangle can connect only one student and one course. The same rectangle cannot also connect student Ram with another course, DBMS; that must be shown by another rectangle. This is because the relationship is a set, and every element in it connects exactly one student with one course. So this is how we can represent entities and relationships in a simple diagramming fashion, to convey our understanding of the entities and how they interact in the real world.
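The rule that one rectangle connects exactly one student and one course, and that the relationship is a set, can be mirrored in Python with an immutable two-field tuple stored in a set. This is a sketch under that modeling assumption; `StudyLink` is a hypothetical name.

```python
from typing import NamedTuple


class StudyLink(NamedTuple):
    """One 'rectangle' in the instance diagram: exactly one student and one course."""
    student: str
    course: str


links = {StudyLink("Ram", "COBOL"), StudyLink("Ram", "DBMS")}
# Ram's second course is a separate link, not a second course on the same link.
print(len(links))  # 2

# Because the relationship is a set, recording the same fact again adds nothing.
links.add(StudyLink("Ram", "DBMS"))
print(len(links))  # 2
```

A `StudyLink` has room for exactly one course, so connecting Ram to a second course necessarily produces a second element of the set, just as the diagram needs a second rectangle.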

What is the primary key of a relationship? Just as entities have key attributes, can we define the concept of a key for a relationship? It is defined in a very simple way: the primary key of a relationship is made up of the primary keys of the participating entities. So a relationship does not have a key of its own; its key is made up of the primary keys of the participating entities.

So if you look at the study relationship, what is the key of study? The key of the study relationship is a composite consisting of two attributes, roll number and course number, taken together. Remember that roll number is the key of student and course number is the key of course. Together, these two attributes form the key of the study relationship. Besides this key, of course, the relationship can have other attributes, as we mentioned earlier, such as grade and semester.
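A minimal Python sketch of the composite key, assuming the study relationship is stored in a dictionary keyed by `(roll_no, course_no)`, following the lecture's definition of the key. The class `StudyTable`, the roll numbers "R1"/"R2", the course "CS 102" and the semester label are hypothetical. Grade and semester are stored as non-key attributes of each entry.

```python
class StudyTable:
    """Relationship 'study' stored under its composite primary key (roll_no, course_no)."""

    def __init__(self):
        self._rows = {}  # (roll_no, course_no) -> non-key relationship attributes

    def add(self, roll_no, course_no, grade=None, semester=None):
        key = (roll_no, course_no)  # key of student + key of course
        if key in self._rows:
            raise ValueError(f"duplicate key {key}")
        self._rows[key] = {"grade": grade, "semester": semester}

    def get(self, roll_no, course_no):
        return self._rows.get((roll_no, course_no))


table = StudyTable()
table.add("R1", "CS 101", grade="A", semester="Sem 1")
table.add("R1", "CS 102")  # same student, different course: allowed
table.add("R2", "CS 101")  # different student, same course: allowed
try:
    table.add("R1", "CS 101", grade="B")  # same pair again: rejected
except ValueError as e:
    print(e)  # duplicate key ('R1', 'CS 101')
```

Neither roll number nor course number alone is unique in this table; only the pair is, which is why both participating keys are needed.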

We will next consider the important concept of relationship cardinality, which expresses a constraint on the relationship. The constraint specifies how the relationship connects the entities of the entity sets among which it is defined: it characterizes the relationship further by indicating how many entities of one entity set may participate in the relationship with an entity of the other. We will take many examples of this. It is a very important concept, closely tied to the concept of a relationship itself.

Just to mention an example, take the study relationship. If you take one student, how many courses may he be studying? He may be studying more than one course. So we say that, as far as courses are concerned, there may be many courses related to one student entity. This is how we indicate the occurrences of different entities in a given relationship, and this is captured through the concept of cardinality.
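As a rough illustration, the following Python sketch classifies the cardinality that one particular set of binary relationship instances exhibits. Two caveats apply. First, cardinality in the ER model is a constraint on every valid state of the relationship, so a single sample can only show that a constraint is violated or consistent, never prove it. Second, the "1:1 / 1:N / N:1 / M:N" labels and the function name `observed_cardinality` are assumptions for this sketch, not notation from the lecture.

```python
from collections import Counter


def observed_cardinality(pairs):
    """Classify a set of (a, b) instances of a binary relationship between
    entity sets A and B as '1:1', '1:N', 'N:1' or 'M:N'."""
    pairs = set(pairs)                    # a relationship is a set: ignore duplicates
    per_a = Counter(a for a, _ in pairs)  # how many b's each a is related to
    per_b = Counter(b for _, b in pairs)  # how many a's each b is related to
    a_to_many = max(per_a.values(), default=0) > 1
    b_to_many = max(per_b.values(), default=0) > 1
    if a_to_many and b_to_many:
        return "M:N"
    if a_to_many:
        return "1:N"  # some a has many b's, every b has at most one a
    if b_to_many:
        return "N:1"  # some b has many a's, every a has at most one b
    return "1:1"


study = {("Ram", "COBOL"), ("Ram", "DBMS"), ("Sita", "DBMS")}
print(observed_cardinality(study))  # M:N
```

Here Ram is related to two courses and DBMS to two students, so this sample is consistent only with a many-to-many study relationship.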
