In this module we will talk about data modeling. You will recall that when we introduced software engineering and software development methodology, we talked about requirements analysis. We mentioned there that we perform both data modeling and process modeling as we go through the requirements analysis and come to understand the application environment of the user.
So let us start by asking again why we model. We build models of complex systems because it’s difficult to understand any such system in its completeness in one shot. We therefore have to build a model and try to understand the system in terms of its components and how those components relate to each other. So one of the important reasons for modeling complex systems is to improve our understanding of such systems, because we cannot understand them entirely all at once.
We need to develop a common understanding of the problem so that we can proceed toward the solution. This common understanding is between all the people involved and also the users for whom we are trying to propose a solution. We cannot afford a trial and error approach. In fact, a model will clearly establish that we are proceeding along the correct direction and that our understanding of the user’s environment is correct. And this will be reflected in the model. This will remove the trial and error kind of approach. And it will also reduce the risk in the overall development.
A model is also extremely useful for communicating the required structure and behavior of our system. We try to capture that in the model and then put it in a form which can be understood and verified by others. So these are the reasons why we model.
Let us see how we model. We choose an appropriate modeling concept or an appropriate modeling paradigm. This should be such that our solution can be properly expressed. So this choice of the right model is extremely important, and it has considerable influence on shaping the solution we propose for the problem. So we choose a model for the kind of purpose we have at hand. This may be modeling of data, or this may be modeling of the processing defined in a given application.
No single model is sufficient. In fact, this is an important point: in most analysis phases, we try to build different types of models which represent different perspectives of the same environment or the same application. It is important to approach a complex system from different points of view, and each of these might be best represented by a different modeling technique. So a single model may not be sufficient.
In fact, we have been talking about two independent models already. One is for the modeling of data and one is for the modeling of processing. Even in the object-oriented model that we had mentioned earlier, different perspectives are taken. One could be the object model, which is a static kind of model and reflects the different objects which are present in the user’s environment. And then there is a dynamic model which defines the interactions among these objects. So we do take different perspectives and we try to use an appropriate modeling concept or an appropriate modeling paradigm to represent these perspectives.
The best models are connected to reality. In fact, the purpose of the model is to abstract important aspects of the reality and represent them very clearly. So naturally they must meet the requirements of the real world situation that we want to analyze. So these are the different issues that we must keep in mind when we define our modeling exercise and decide what model we should choose.
So in this particular module, we are talking about data modeling. We will define the notion of data modeling. We are going to build these models in terms of the important concepts of entity and relationship. In fact, the model that we are going to discuss in detail is the entity-relationship model, or the ER model in short. We will look at the diagramming concepts which are available in this modeling technique. We will talk about the related concepts of keys and weak entities. Then we will also talk about extensions to the ER model. So these are the different topics we’ll cover in this particular module.
Let’s begin by seeing the purpose. The purpose of the data model is to represent the operational data in the real world. This operational data describes the various events, entities and activities which take place in the business environment for which we are proposing the solution. So remember that we are trying to represent the operational data, and there may be a lot of this data describing different entities, different activities which happen in the business environment, and different types of events which take place. All this data needs to be captured in solving that application problem. So the objective of the data model is to represent this operational data.
The model may be described at various levels. The model may be at the logical level or the physical level. The physical level naturally will address not only what data we have but also how that data is stored, retrieved and updated. So this would be the way in which the data would actually be handled.
Very often we first try to understand the data at the logical level. The model may also be at the external level, the conceptual level or the internal level. What we really mean here is that when we say the model is at the external level, it defines the data as seen by a particular user of the application. Naturally that user’s view of the data may be a subset of the overall data content in the application, whereas the conceptual model represents the data in its totality, at a level which represents the important concepts in the application.
The internal data model is more of a physical representation of the data. Data may be stored in terms of files and so on. So this would be an internal model. An internal model generally takes into account efficient processing of data, whereas the conceptual model concentrates purely on the concepts and how those concepts are interrelated, without being concerned about efficiency issues. So we may model the data at various levels.
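To make these three levels a little more concrete, here is a small sketch in Python. It is only an illustration under assumed names: the `Student` concept, its attributes, the `build_store` function and the librarian's view are all hypothetical and are not part of any particular application discussed here.

```python
from dataclasses import dataclass

# Conceptual level: the complete concept, with no concern for storage
# or efficiency. "Student" and its attributes are hypothetical.
@dataclass(frozen=True)
class Student:
    roll_no: int
    name: str
    department: str
    fee_balance: float

# Internal level: one possible physical representation, here a
# dictionary keyed by roll number so that lookups by key are fast.
def build_store(students):
    return {s.roll_no: (s.name, s.department, s.fee_balance) for s in students}

# External level: the view of one particular user (say, a librarian)
# who sees only a subset of the overall data.
def librarian_view(student):
    return {"roll_no": student.roll_no, "name": student.name}

students = [Student(1, "Asha", "CSE", 1200.0), Student(2, "Ravi", "EE", 0.0)]
store = build_store(students)
view = librarian_view(students[0])
```

Notice that the conceptual description stays the same if we later change how the data is stored, or if we add another user view.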
In this particular module, we are going to focus on the conceptual data model. In fact, this is the beginning, where we try to understand a given application domain in terms of all its important concepts and try to interrelate them. Subsequently, when we proceed to the design, we will consider the physical representation of this conceptual model, the internal representation, how to make user processing more efficient, and also how to restrict the views of different users so that they access only the data which is relevant to them. So these external, internal or physical aspects of the model are generally worked out when we proceed into design and implementation.
Our first objective in the development of the solution is to build a conceptual data model. A good data model should meet some important requirements. And these are quite straightforward. A good model naturally should be easy to understand, because we will expect users to validate the model we are building to represent their application domain. Users can understand it if it is a simple model, if it has only a few concepts, and if it can be specified in top-down fashion, so that you don’t give all the details together but give them in step-by-step fashion.
In fact, top-down specification is very important in any modeling because we cannot comprehend a complex system all at once. We have to see it at different levels of detail, and that is permitted by top-down specification. So these are the characteristics of a good model.
Let’s also define what we mean by a model. What does a model consist of? A model consists of a few important concepts. It will also give us some form by which those concepts can be represented. We call these constructs. So a construct is a representation for a concept. A model primarily offers a set of concepts, and it also offers a few operations on those concepts. So this is what a model is made up of.
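As a rough sketch of this idea, we can write a tiny model in Python with two concepts, a textual construct for each, and a couple of operations. Everything here is an assumption made for illustration: the class names, the bracket notation produced by `render`, and the `add_entity` and `relate` operations are hypothetical and are not the ER diagramming notation itself.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:                      # concept: a thing of interest
    name: str

    def render(self):              # construct: one way to represent the concept
        return f"[{self.name}]"

@dataclass
class Relationship:                # concept: an association between two entities
    name: str
    left: Entity
    right: Entity

    def render(self):              # construct for the relationship
        return f"{self.left.render()} --<{self.name}>-- {self.right.render()}"

@dataclass
class Model:
    entities: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_entity(self, name):    # operation: introduce (or reuse) an entity
        return self.entities.setdefault(name, Entity(name))

    def relate(self, name, left, right):   # operation: connect two entities
        rel = Relationship(name, self.add_entity(left), self.add_entity(right))
        self.relationships.append(rel)
        return rel

model = Model()
rel = model.relate("enrolls_in", "Student", "Course")
diagram_line = rel.render()
```

Here `diagram_line` holds the text `[Student] --<enrolls_in>-- [Course]`. The concepts stay the same if we choose a different construct, for example a drawing instead of text.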
Another requirement for a model is that it should capture the meaning of the data. Naturally we are trying to understand the real world. We are trying to understand the data which is present in that real world. And then we are going to put it in the form of a suitable conceptual model. This model should capture the real world meaning of the data. Otherwise it will be difficult to interpret the data. So we have a notion of what we call data semantics. Data semantics is concerned with the meaning of the data. How do we convey the meaning of data? When we prepare a data model, it should not only give us some techniques for representing the data but also give us a way of conveying the meaning of the data. So both of these, the representation of the data and its meaning, which we generally call syntax and semantics, are important aspects of a data model.
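A very small sketch may help to show the difference between a bare value and a value that carries its meaning. The names `Attribute` and `Measurement`, and the patient weight example, are hypothetical and are used only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attribute:
    name: str          # what the value refers to
    unit: str          # how the value is to be read
    description: str   # the real world meaning of the value

@dataclass(frozen=True)
class Measurement:
    attribute: Attribute
    value: float

    def describe(self):
        return f"{self.attribute.name} = {self.value} {self.attribute.unit}"

# A bare value has a representation but no meaning on its own.
raw_value = 72.0

# Attaching the value to a described attribute conveys its meaning.
weight = Attribute("weight", "kg", "body weight of a patient at admission")
reading = Measurement(weight, raw_value)
text = reading.describe()
```

The number `72.0` by itself could be a weight, a temperature or a mark in an examination. Only the description attached to it tells us how to interpret it.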