Information Utilization in Theory
Rainer von Königslöw
University of British Columbia
Presented at the Annual Meeting,
Canadian Sociology and Anthropology Association
Ottawa, June 1967
This paper arises out of the recent efforts in the field to formalize and axiomatize theory and to examine the philosophical justification of measurement, and the introduction of mathematical models, both at the measurement and at the theoretical level.
The basic assumption of the argument we want to develop is that theories express relationships among concepts which in turn directly or indirectly refer to attributes of objects or events. Thus the theory is expressed in terms of a series of assertions specifying how the given concepts or variables are related for a given set of objects or events, i.e. a series of propositions asserting what attributes of the objects or events are expected to be associated by concurrence or consecutive occurrence. Conceptualization of the theory, of the relationship between the given concepts, thus depends on the conceptualization of the concepts or variables used, by specification of what attributes or characteristics these variables refer to, and on conceptualization of the relationship between the concepts, by specification of how the given attributes are expected to be associated.
Another assumption is, that to develop and express precise and unambiguous relationships among variables we have to develop theoretical calculi that are formalized and expressed in terms of the techniques of logic and mathematics. Use of such logical and mathematical calculi can be seen clearly in the development of theory in the physical sciences. This is consistent also with the recent emphasis on axiomatic theory and mathematical modelling in the social sciences. The problem, however, is that we do not as yet have many concepts that lend themselves easily to formalization in terms of logic and mathematics.
In taking the argument a step further, we shall contend that the meaning of a concept is bounded and prescribed by its operationalization, that what we can mean by the use of a given variable is limited by what we can measure. The problem of developing more precise concepts thus becomes a problem of measurement.
In terms of this analysis, then, the difficulty in the development of more precise theories in the social sciences is that we do not, as yet, have many concepts and related operationalizations and measurement techniques that yield good mathematical scales or logical categories. That means that the variables we do use are far from ideal in logical coherence of definition and operationalization so that we cannot identify and specify precisely just what attribute or characteristics of the given set of objects or events we are dealing with. Also the measuring techniques we do have lack in precision and reliability so that we cannot be certain of our measurement of the objects or events assigning them a definite attribute or characteristic, represented by a specific category or numerical assignation, with only a negligible error.
Also, most of the concepts and variables in the social sciences are conceptualized and operationalized in terms of nominal or ordinal scales that cannot be mapped simply into unique numerical representation, so that relations between the concepts or variables they represent could be handled in terms of equations and other mathematical calculi. Thus we are limited in what theoretical calculi we can employ by the limitations in the conceptualization and operationalization of the concepts or variables. The problem therefore is to develop and apply precise logical and mathematical calculi that can handle relationships between variables yielding only nominal or ordinal orderings between the attributes and that therefore do not easily yield unique numerical representations that can be handled in terms of standard mathematical operations and transformations.
Another related problem is the test of the usefulness or significance of the theory, that is, of the relationships between the variables posited by the theory. This presumably would be a test of what the theory does accomplish in comparison to what it is expected to accomplish. Thus before we can devise tests of usefulness or significance we have to conceptualize what the purpose or goal of a theory is supposed to be.
One such goal is that the hypothetical relationship between the variables posited by the theory is supposed to express a ‘true’ relationship between the attributes of the objects or events referred to. Thus we have to be able to make a decision about the truth value of the hypothetical relationship. This decision commonly is made by positing and testing a second proposition, namely that there is no such relationship between the given variables and that any associations observed between the attributes of the objects or events occur by chance. This second proposition is commonly called the null hypothesis. The theory then is tested by evaluating how likely the null hypothesis is to be false for the associations between the variables observed. This approach has been developed quite highly with the construction of increasingly flexible and sophisticated statistical techniques.
This test of the truth value of the theory by estimating the probability of the occurrence of the associations of the states of the variable by chance, and comparing it to the observed occurrences of the associations, is not an exhaustive test of the ‘truth’ of the theory, in that it still allows for exceptions and differences between the associations between variables posited by the theory and those observed. If, for instance, the propositions about associations between the variables, as given by the theory, are taken as logical implications of the form: ‘if any event belonging to the class E has attribute a then it also has attribute b’ or, ‘if and only if an event belonging to class E has attribute a then it will have attribute b’, then any single observed instance of a given other association of attributes, such as: ‘it was observed that event e belonging to class E had attribute a but did not have attribute b’ would disprove the proposition and thus the theory. Now while it may be assumed that these exceptions are errors due to measurement rather than due to the theory, this would not necessarily be a valid assumption and would also play havoc with the assumptions underlying any test of the theory, in that we would not know whether to ascribe the difference between the expected and observed associations between variables to errors in the theory or errors in the measurement.
Some of these problems can be dealt with by changing conceptualization of the purpose or goals of a theory. We may say, for instance, that it is not necessary for a theory at this state to express the ‘truth’ about the relationship between the given variables, but rather that it is supposed to give us some information, increase our knowledge about the relationships between the variables, where the underlying ‘real’ or ‘true’ relationships may be seen to be much more complex than those posited by the theory. In this case the theory would be evaluated in terms of how much it adds to our knowledge about the observed objects or events, how much it helps to predict what attributes a given set of objects or events will have. Tests of theories conceptualized in this way will be in terms of goodness of fit of the theoretical predictions to the observed occurrences, and thus in effect test how much information we have gained by use of the theory. However, in evaluating differences between expected and observed associations, we still have the problem of accounting how much of the error is due to the measurement and how much is due to the theory.
A further problem in the development of precise theories in the social sciences, therefore, is the conceptualization of error in measurement as well as error in the theoretical propositions.
To restate our argument, then, development of more precise theories in the social sciences requires solutions to the following conceptual and meta-theoretical problems:
1. Conceptualization of the concepts or variables in terms of a measurement operation yielding unique assignments of numerical or categorical representations of the states of the variable for any given object or event.
2. Conceptualization of the error in the measurement in terms of the uncertainty in the numerical or categorical assignment of an attribute to the given object or event.
3. Conceptualization of the theoretical relationship between the given concepts in terms of a logical or mathematical calculus, taking into account the categorical or numerical representation assigned in terms of the measurement operation, as well as the error or uncertainty in the assignment.
4. Development of an adequate test of the usefulness or significance of the given theory, taking into consideration the conceptualization of the relationship posited by the theory. Such a test would presumably take into consideration the gain of knowledge from the hypothetical relationship, evaluating it in terms of the goodness of fit of the expected associations to the observed associations between variables, in reference also to the uncertainty or error in the measurement of the variables.
Above we have raised the problem of theory construction in the social sciences in terms of a broad perspective, arguing that the development of more precise theories is dependent on finding adequate solutions to the four conceptual and operational problems outlined above. While we have to find solutions for each theoretical problem we might want to deal with, there are general meta-theoretical paradigms that can be used in attempting to find solutions to the above problems.
In this paper we shall attempt, in a very tentative and exploratory manner, to develop such a paradigm based on the notion of information. This notion has been developed in cybernetics and information theory, in terms of signal transmission and noise in electrical and other networks, but it has lately been applied also to the theory of measurement.
We shall use the notion in terms of the assumption that the purpose of both measurement and theory is to yield and increase information about the attributes of objects and events. In the second part of this paper we shall apply this paradigm to an attempt to find solutions for the above problems for measurement and theory for a particular substantive issue, the socio-emotional involvement of persons in small group interactions.
Again, before proceeding to the main body of the paper, we would like to emphasize that this discussion is meant to be tentative and exploratory, and that therefore many of the issues are not adequately dealt with or are left unsolved.
Information, as we shall use it in this paper, is defined as the knowledge gained by an assertion about something, an assertion that a proposition of the nature: event e has attribute a, is true. As the logical positivists have pointed out, to be meaningful this assertion has to imply the negation of the complementary proposition: event e does not have attribute a. Thus, in logical positivist terms, meaningful information has the nature of a choice between two or more mutually exclusive propositions. In terms of the information content then it is just as important to know what propositions have been negated as to know the propositions that have been asserted to be true for the given reference event.
Let us assume for the moment that we are interested in which one of a set A of attributes (a1, a2, a3, a4) is applicable to our given event ‘e’. Let us also assume that the attributes forming the elements in the set are mutually exclusive and complementary such that one and only one of the attributes is applicable to event e. Such a set is called a ‘measure’ in set theory. In common scientific nomenclature set A is called a ‘variable’ on reference event e with a1, a2, a3 and a4 constituting the ‘states of nature’ or ‘values’ of the variable. One of these attributes might be a null element, the attribute that none of the other alternative attributes specified applies to the given event. We could also express this as a set of four propositions: event e has attribute a1, event e has attribute a2, etc., in which case one and only one of the propositions can be true and the others must be false. The problem of selecting the attribute applicable to the given event thus becomes the decision about which of the propositions shall be asserted to be true.
In terms of the notion of information as the choice between mutually exclusive propositions, to have no information about the attribute applicable to event e means therefore to be unable to make the choice between these propositions. In other terms it implies that as far as we know each of the propositions is equally likely to be true. To the extent then that we have more knowledge, we would judge one of the propositions to be more likely to be true than the others. It can be seen how we can express the state of uncertainty through assignations of probability estimates to the assertions that a given proposition be true. Since we are certain that one of the propositions must be true, the sum of the probability estimates must equal 1.00. The state of maximum uncertainty would correspond to equiprobability for the assertions, while the state of certainty, or minimum uncertainty would correspond to one assertion having the probability estimate 1.00 and all the others the probability 0.00. The state of uncertainty thus is given by the probability distribution over the alternative assertions.
A single estimate of the state of uncertainty can be obtained by use of the convention developed in information theory, where $H = -\sum_i p_i \log p_i$, summed over the probability estimates $p_i$ for the alternative assertions, and where H stands for uncertainty. H = 0.00 in the case of certainty, and H is maximal for the state of maximal uncertainty, for any given set of alternative propositions.
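The uncertainty measure can be sketched in a few lines of code. The function name `uncertainty` is our own, and we assume logarithms to the base 10, since that base reproduces figures such as log 4 = 0.60 used in this paper. Terms with zero probability are skipped, following the usual convention that they contribute nothing to the sum.

```python
import math

def uncertainty(probs):
    """Return H = -sum(p * log10(p)), skipping zero-probability terms."""
    return sum(-p * math.log10(p) for p in probs if p > 0)

# Certainty: one assertion has probability 1.00, all others 0.00.
certain = uncertainty([1.0, 0.0, 0.0, 0.0])

# Maximal uncertainty: all four assertions are equally likely.
maximal = uncertainty([0.25, 0.25, 0.25, 0.25])

print(round(certain, 2), round(maximal, 2))
```

Under these assumptions the first value is zero and the second equals log 4, the largest value H can take for four alternatives.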
We thus have a method for estimating the amount of uncertainty we have in our knowledge about e. As we shall see however, this is not the only factor determining how much information we have about e. Let us use an example. Let us assume that we are certain that attributes a1 and a2 do not apply to e but that a3 is three times as likely to apply to e as a4. We thus have the probability distributions:
attributes a1 a2 a3 a4
probability estimates 0.00 0.00 0.75 0.25
maximal uncertainty 0.25 0.25 0.25 0.25
He = 0.25    Hm = 0.60
This, however, is not the only way we could treat our problem. We may want to reformulate it by grouping our attributes so that we would have fewer alternative propositions to choose between. Taking (a1 or a2) and (a3 or a4) as the two exclusive alternative attribute subsets, we can reduce our problem to a choice between the two propositions: e has attribute (a1 or a2), and, e has attribute (a3 or a4). According to our case above we would now be certain that the second proposition is true, so that we would obtain the distribution:
attributes (a1 or a2) (a3 or a4)
probability estimates 0.00 1.00
maximal uncertainty 0.50 0.50
He = 0.00    Hm = 0.30
As we can see above, it is possible to reduce uncertainty by grouping the attributes. That this does not occur necessarily we can see through alternatively grouping (a1 or a3) and (a2 or a4), obtaining the distribution:
attributes (a1 or a3) (a2 or a4)
estimated probability 0.75 0.25
maximum uncertainty 0.50 0.50
He = 0.25    Hm = 0.30
These examples then point out the significance of how the attributes are grouped. The relative uncertainty about the attribute applicable to event e is seen to be a function of how we group the attributes. As we discussed above, however, grouping the attributes is essentially a question of redefining the attributes and determining the differentiations we make in the dimension or attribute set A. How much information we can obtain about a given event e is therefore dependent on how we define and how we differentiate among the alternative values of the given variable.
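The effect of grouping can be checked with a short sketch. The helper names `uncertainty` and `group` are hypothetical, base-10 logarithms are assumed, and attributes are referred to by position (0 for a1, 1 for a2, and so on).

```python
import math

def uncertainty(probs):
    """H = -sum(p * log10(p)), skipping zero-probability terms."""
    return sum(-p * math.log10(p) for p in probs if p > 0)

def group(probs, partition):
    """Add up the probabilities of the attributes merged into each group."""
    return [sum(probs[i] for i in block) for block in partition]

estimates = [0.00, 0.00, 0.75, 0.25]          # a1, a2, a3, a4

first = group(estimates, [(0, 1), (2, 3)])    # (a1 or a2), (a3 or a4)
second = group(estimates, [(0, 2), (1, 3)])   # (a1 or a3), (a2 or a4)

for name, dist in [("ungrouped", estimates), ("first", first), ("second", second)]:
    print(name, dist, round(uncertainty(dist), 2))
```

The first grouping removes all uncertainty, while the second leaves it exactly where it was before grouping, which is the point made in the examples above.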
Let us go on now to discuss in more general terms the information we can obtain about a given event from a given variable or set of attributes. We may conceptualize dimension A as a continuous variable, i.e. as a set containing an infinite number of alternative attribute elements: A = (a1, a2, a3, ..., an, ...). Since in practice, however, it is impossible to differentiate and discriminate between an infinity of alternative attributes, we have to group or partition this infinite set. Thus our first formulation above with the four alternative attributes can be seen as a partition on the infinite set: (a1 ... an), (an+1 ... ap) and so on. Each one of the alternative attributes discussed above refers therefore to a proper subset of A, where the intersection of any two subsets has to be empty to give us our condition of mutual exclusiveness, and the union of all the subsets has to be the universal set, i.e. set A, to fulfill the condition that at least one of the subsets be applicable to event e.
Now besides the importance of how we partition the set, discussed above, how much information we can gain about the given event also depends on how many alternative attribute sets we have in our partition. Another way of conceptualizing this factor is by looking at information in terms of how many statements we can make about the event. For instance, in our first formulation, the assertion ‘event e has attribute a1’ not only asserts that the proposition ‘event e has attribute a1’ is true, but, by implication, it also asserts that the propositions ‘event e has attribute a2’, ‘event e has attribute a3’ and ‘event e has attribute a4’ are false. We thus by a single assertion have gained knowledge about the truth value of four propositions. For the other two formulations, by similar reasoning we obtain knowledge about the truth value of only two propositions. Thus the first formulation, differentiating between four attributes of the dimension, allows us to make twice as many statements about event e as we can on the basis of one of the other formulations that differentiates between only two alternatives. We can therefore define the maximum amount of information obtainable about a given event with a given attribute set as a function of the number of differentiated elements in that set. According to a convention developed in information theory, the maximum amount of information a variable can generate about a given event is given by the logarithm of the number of states of nature of the variable, $I_A = \log n_A$.
Now as far as the preceding discussion goes, any function that increases monotonically with the number of alternatives could have been used as a measure of the maximum amount of information obtainable with a given variable. The usefulness of this convention, however, lies in its additivity. Thus if we have two variables with four states each, (a1, a2, a3, a4) and (b1, b2, b3, b4), that are independently applicable to the given event, i.e. if they refer to different dimensions and thus could vary independently, we could formulate sixteen propositions of the form ‘event e has attributes a1 and b1’ for all the combinations of the states of the two variables. Thus we could define a new variable or attribute set made up of the sixteen pairs (ai, bj). Now the maximum amount of information that can be gained independently from the two variables is $I_A = \log 4 = 0.60$ and $I_B = 0.60$, while the combined variable yields $I_{AB} = \log 16 = 1.20$, which is equal to the sum of the other two.
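The additivity of the logarithmic measure can be illustrated by enumerating the sixteen combined states. This is a minimal sketch assuming base-10 logarithms; the variable names are ours.

```python
import math
from itertools import product

states_a = ["a1", "a2", "a3", "a4"]
states_b = ["b1", "b2", "b3", "b4"]

# Every pairing of a state of A with a state of B is a state of the combined variable.
combined = list(product(states_a, states_b))

i_a = math.log10(len(states_a))
i_b = math.log10(len(states_b))
i_ab = math.log10(len(combined))

print(len(combined), round(i_a, 2), round(i_b, 2), round(i_ab, 2))
```

Because the number of combined states is the product of the two state counts, the logarithm of the combined count is the sum of the two separate logarithms.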
Let us now examine how the maximum amount of information we can gain from a given variable relates to the uncertainty we have about which of the attributes is applicable. Again the maximum uncertainty we can have about which of a set of attributes is applicable to a given event is a function of the number of attributes we differentiate among, in the sense that if we have only two attributes to choose between then we can only be uncertain about which of two propositions is true, while for a variable with four attributes we may be uncertain about which of four propositions is true. Now the measure of maximal uncertainty developed above (based on the notion of propositions being judged as equally likely to be true in the case of maximum uncertainty) for a variable with n alternative states would be $H_m = -\sum_{i=1}^{n} \frac{1}{n} \log \frac{1}{n}$, which yields $H_m = -\log \frac{1}{n}$. But since $\log \frac{1}{n} = -\log n$, we obtain $H_m = \log n$, which of course is identical to the measure for the maximum amount of information obtainable from a variable with n differentiated states of nature, $I_m = \log n$. So in essence our measure of the maximum amount of information obtainable from a given variable is a measure of the uncertainty we can have about which of the propositions is true, i.e. a measure of how likely a given proposition is to be true in the absence of any knowledge about it ($H_m = -\log \frac{1}{n}$). The reasoning behind this is that we can gain information about something only to the extent that we might be uncertain about it. Again in terms of the logical positivists, we can gain information about something if it is possible that we could be wrong. Or, in terms of the examples used above, we can gain information about the attribute applicable to a given event only if there is more than one attribute that could be alternatively applicable to the event.
The more attributes are alternatively applicable to the given event, the more unlikely is it that a given attribute is applicable, in the absence of knowledge about it, and the more information do we have if we happen to know that the given event does have that particular attribute. The maximum information obtainable from a given variable about a given event is therefore an estimate of how much more information we would have if we knew which state of the variable applied to the given event, in contrast to the state of total uncertainty.
Let us now return to our example to consider how we can apply our measure of the amount of information gain. As discussed above, we want to know how much more information the knowledge that we do have about the attributes applicable to event e gives us compared to total uncertainty. The measure we shall use then for information gain is the difference between the maximum uncertainty and the estimated uncertainty for our knowledge about event e, $I_g = H_m - H_e$. As a measure of the efficiency with which the given partitioning was used, we use the ratio of the actual information gain obtained to the maximum information gain possible with the given variable, $I_e = I_g / I_m$. Applying these measures to our examples above, we obtain:
I     attributes               a1      a2      a3      a4
      probability estimates    0.00    0.00    0.75    0.25
      maximal uncertainty      0.25    0.25    0.25    0.25
      He = 0.25    Im = Hm = 0.60    Ig = 0.35    Ie = 0.58
II    attributes               (a1 or a2)    (a3 or a4)
      probability estimates    0.00          1.00
      maximal uncertainty      0.50          0.50
      He = 0.00    Im = Hm = 0.30    Ig = 0.30    Ie = 1.00
III   attributes               (a1 or a3)    (a2 or a4)
      estimated probabilities  0.75          0.25
      maximal uncertainty      0.50          0.50
      He = 0.25    Im = Hm = 0.30    Ig = 0.05    Ie = 0.17
As we would expect, even though formulation II has the least uncertainty about its attributes, it has less information gain than formulation I. Because formulation II has least uncertainty, however, it is the most efficient use of the possible information gain from the variable.
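The comparison of the three formulations can be reproduced with a short sketch. The function names are hypothetical and base-10 logarithms are assumed. Since the code works with unrounded values, its results may differ in the second decimal from the figures in the tables, which appear to have been computed from rounded intermediate values.

```python
import math

def uncertainty(probs):
    """H = -sum(p * log10(p)), skipping zero-probability terms."""
    return sum(-p * math.log10(p) for p in probs if p > 0)

def gain_and_efficiency(probs):
    """Return (He, Hm, Ig, Ie) for one probability distribution."""
    h_m = math.log10(len(probs))   # maximal uncertainty, equal to Im
    h_e = uncertainty(probs)       # estimated (residual) uncertainty
    i_g = h_m - h_e                # information gain
    return h_e, h_m, i_g, i_g / h_m

formulations = {
    "I": [0.00, 0.00, 0.75, 0.25],
    "II": [0.00, 1.00],
    "III": [0.75, 0.25],
}

results = {name: gain_and_efficiency(p) for name, p in formulations.items()}
for name, values in results.items():
    print(name, [round(v, 2) for v in values])
```

The ordering is the same as in the text: formulation I yields the largest gain, formulation II the highest efficiency, and formulation III the least of both.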
So far we have attempted to get a measure of the information we may have about something in terms of the estimated probability of some alternative propositions being true, or some alternative states of nature of a variable being applicable to a given event. Before going on to substantive propositions let us briefly discuss possible sources of information. On what basis, then, can we make the choices between propositions, i.e. assign probabilities estimating their likelihood of being true? The three alternative sources of information we shall discuss here briefly are: measurement, manipulation and prediction. There is a rather extensive literature on the theory and justification of measurement. Rather than getting entangled in the rather complicated issues involved, let us just point out some of the differences between the three sources of information.
Measurement involves some operation on the event, such as observation and comparison, as well as some set of decision rules, on the basis of which we can make some judgment about the attributes applicable to the event. An essential restriction involved in measurement is the assumption that the relevant attributes of the event do not change under the operation of measurement so that the judgment about the attribute applicable to the event is not determined in part by the operation of measurement.
Manipulation and experimental control, on the other hand, involve some operation on the event in question that is designed to determine the attribute applicable to the event. (Such as heating an object to determine its temperature). In this case, therefore, the essential restriction is the assumption that the operation on the event has actually produced the attribute desired. Both measurement and manipulation then are operations on the event itself, in the presence of the object or event in question.
Prediction, however, does not necessarily involve the presence of the event in question or any operations on it. Rather it is the utilization of other sources of information to make the judgment about the attributes of the event. Most prediction, however, is conditional in the sense that the judgment about a given set of attributes in reference to the specified event is made in terms of and dependent on information about other attributes of the given event. The restriction in this case is that the judgment about the attributes applicable to the event is made independent of and without information due to measurement or manipulation of the particular attributes under consideration.
Predictive theory, then, is the set of rules, statements and propositions that specify how to make the given prediction, how to make judgments assigning probabilities to the truth value of the alternative propositions. Since most prediction is conditional in the sense that the judgment about a given set of attributes is made in terms of and dependent on information about other attributes either of the given event or of related events, one of the functions of predictive theory is to specify what attributes shall be used. Another of the functions of the theory is to provide the rules or lists that specify how to convert the information about these attributes into judgments about the given set of attributes.
Let us go a bit further into this notion of the predictive or information transducing theory. Let us assume that we want information about event e in terms of attribute dimension B with states (b1, b2, b3). Let us also assume that the theory specifies that information about attribute dimension A, (a1, a2, a3, a4), can give us information about variable B. (The attribute dimension we want information about is usually called the ‘dependent’ variable, while the dimension which provides the knowledge to make the prediction is called the ‘independent’ variable.) Let us also assume that both attribute dimensions apply to event e. Having some knowledge about which of the attributes (a1 ... a4) is applicable to event e should now help us select, with use of the specifications provided by the theory, which of the attributes (b1 ... b3) is also applicable to event e. Let us assume for the moment that the proposition ‘event e has attribute a1’ is true. Our theory, then, has to specify which of the propositions ‘event e has attribute b1’, ‘event e has attribute b2’ or ‘event e has attribute b3’ is true. Again, the simplest case would be if the theory gives certain knowledge about which of the propositions is true, such as ‘event e has attribute b1’. In this case the selection rules of the theory could be expressed in terms of the conditional proposition: ‘if event e has attribute a1 then event e also has attribute b1’. Applying the same reasoning to each of the states of the independent variable, we would get one such conditional proposition for each state of the independent variable, associating it with one of the states of the dependent variable.
Now this list between states of the independent variable and states of the dependent variable could also be represented as a transformation from the states of the independent variable to the states of the dependent variable:
independent variable a1 a2 a3 a4
dependent variable b1 b2 b2 b3
The theory then would represent the operator or ‘transducer’ that transforms states of the one variable into states of the other.
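In its simplest, deterministic form such a transducer amounts to a lookup table. The sketch below encodes the transformation listed above; the names `theory` and `predict` are ours.

```python
# Each state of the independent variable is mapped to exactly one
# state of the dependent variable, as in the transformation above.
theory = {"a1": "b1", "a2": "b2", "a3": "b2", "a4": "b3"}

def predict(state_of_a):
    """Apply the conditional proposition 'if e has a_i then e has b_j'."""
    return theory[state_of_a]

for state in theory:
    print(state, "->", predict(state))
```

Note that the mapping need not be one-to-one: a2 and a3 both lead to b2, so the state of the dependent variable does not determine the state of the independent variable.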
Still assuming that we are certain that event e has attribute a1, we may not be certain that it also has attribute b1. Rather, we may want to assign probabilities estimating the likelihood of occurrence for each of the states of the dependent variable, as discussed above.
Given that ‘event e has attribute a1’ is true:
dependent variable b1 b2 b3
probability estimates 0.80 0.20 0.00
maximal uncertainty 0.33 0.33 0.33
He = 0.22    Im = Hm = 0.48    Ig = 0.26    Ie = 0.54
In this situation Hm indicates the uncertainty if we had no knowledge about which of the attributes of the dependent variable event e would have. He is the uncertainty remaining even after the application of our theory, while Ig indicates the gain in information due to application of the theory.
For this sort of case, our theory would have the format of a matrix giving the transformation probabilities from the states of the independent variable to the states of the dependent variable:
                      dependent variable
                      b1      b2      b3
               a1     0.80    0.20    0.00
independent    a2     0.35    0.33    0.32
variable       a3     0.10    0.25    0.65
               a4     1.00    0.00    0.00
Now for each state of the independent variable we can calculate the residual uncertainty and thus the information gain by the theory, as well as its efficiency.
                      He      Hm      Ig      Ie
               a1     0.22    0.48    0.26    0.54
independent    a2     0.48    0.48    0.00    0.00
variable       a3     0.27    0.48    0.21    0.44
               a4     0.00    0.48    0.48    1.00
Average information gain:  Av. Ig = (Σ Ig)/4 = 0.24
Average efficiency:        Av. Ie = (Σ Ie)/4 = 0.50
From this we can calculate the average information gain for the theory, which is 0.24 and compare this to the maximum information gain possible for the given definition of the dependent variable, Im = 0.48. The 0.48 information gain would be obtained if the predictions did not include any uncertainty, i.e. if they were of the conditional proposition form as discussed above.
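The row-by-row calculation can be sketched as follows. Base-10 logarithms and an unweighted average over the four rows are assumed, and the names are ours. A direct recomputation of this kind gives a residual uncertainty of roughly 0.37 for row a3 rather than the 0.27 shown in the table, so the averages printed by the sketch need not coincide with the figures quoted in the text.

```python
import math

def uncertainty(probs):
    """H = -sum(p * log10(p)), skipping zero-probability terms."""
    return sum(-p * math.log10(p) for p in probs if p > 0)

# Transformation probabilities from each state of A to the states (b1, b2, b3).
theory = {
    "a1": [0.80, 0.20, 0.00],
    "a2": [0.35, 0.33, 0.32],
    "a3": [0.10, 0.25, 0.65],
    "a4": [1.00, 0.00, 0.00],
}

h_m = math.log10(3)   # maximal uncertainty for three states of B
gains = {}
for state, row in theory.items():
    h_e = uncertainty(row)        # residual uncertainty for this row
    gains[state] = h_m - h_e      # information gain for this row
    print(state, round(h_e, 2), round(h_m, 2),
          round(gains[state], 2), round(gains[state] / h_m, 2))

average_gain = sum(gains.values()) / len(gains)
print("average gain", round(average_gain, 2))
```

A row that puts all probability on one state, such as a4, recovers the full 0.48, while a nearly uniform row, such as a2, contributes almost nothing.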
For the above case we have assumed that we had definite knowledge about the state of nature of the independent variable applicable to the event considered. That assumption of course is not necessarily justified since that information itself has to be obtained by means of measurement, manipulation or prediction, each of which has some inherent uncertainty. Let us therefore deal with the case where we have uncertainty about the state of the independent variable applicable to our given event e. Let us assume we have been able to assign the following subjective probability estimates:
independent variable a1 a2 a3 a4
probability estimates 0.45 0.35 0.20 0.00
maximal uncertainty 0.25 0.25 0.25 0.25
He = 0.48    Im = Hm = 0.60    Ig = 0.12    Ie = 0.20
Applying these to our theory, i.e. to the matrix giving the transformation probabilities, we simply follow the rules for conditional probabilities, as the above probabilities give the probabilities for the occurrence of the respective states of the independent variable, while the transition probabilities are the probabilities of occurrence of respective states of the dependent variable for given states of the independent variable. We therefore multiply each transition (row) distribution by the probability of its occurrence, and then sum the probabilities for each state of the dependent variable.
                                dependent variable
                                b1            b2            b3
estimated         (0.45) a1     (.45)(.80)    (.45)(.20)    (.45)(.00)
probabilities     (0.35) a2     (.35)(.35)    (.35)(.33)    (.35)(.32)
for independent   (0.20) a3     (.20)(.10)    (.20)(.25)    (.20)(.65)
variable          (0.00) a4     (.00)(1.00)   (.00)(.00)    (.00)(.00)
Summing probabilities for each of the states of the dependent variable gives us the predicted probabilities for the states of the dependent variable for event e given the set of estimated probabilities for the states of A, the independent variable, and given the theory specifying transformation probabilities from A to B. This then gives us the predicted probabilities of B for event e:
dependent variable       b1     b2     b3
prob. predicted for e    0.50   0.26   0.24
max. uncertainty         0.33   0.33   0.33
H_e = 0.45   H_m = 0.48   I_g = 0.03   I_e = 0.06
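The multiplication and summation just described can be sketched in Python as follows. This is a minimal sketch under stated assumptions: base-10 logarithms are assumed because they reproduce the maximal uncertainty of 0.48 for three alternative states, the row for a4 is entered as printed (all zeros, which has no effect since a4 carries zero probability), and efficiency is taken as gain divided by maximal uncertainty. All names are illustrative.

```python
import math


def uncertainty(probs):
    """H = -sum(p_i * log10(p_i)), with 0 * log(0) taken as 0."""
    return -sum(p * math.log10(p) for p in probs if p > 0)


# Subjective probability estimates for the states a1..a4 of the independent variable.
p_a = [0.45, 0.35, 0.20, 0.00]

# Transition (row) distributions over b1, b2, b3 for each state of A.
theory = [
    [0.80, 0.20, 0.00],  # a1
    [0.35, 0.33, 0.32],  # a2
    [0.10, 0.25, 0.65],  # a3
    [0.00, 0.00, 0.00],  # a4, as printed; weighted by zero probability
]

# Weight each row by the probability of its state, then sum per column.
p_b = [sum(p_a[i] * theory[i][j] for i in range(len(p_a))) for j in range(3)]

h_e = uncertainty(p_b)   # residual uncertainty of the prediction
h_m = math.log10(3)      # maximal uncertainty for three states
i_g = h_m - h_e          # net information gain
i_e = i_g / h_m          # efficiency (assumed definition)

print([round(p, 2) for p in p_b])
print(round(h_e, 2), round(h_m, 2), round(i_g, 2), round(i_e, 2))
```

Rounded to two places, the sketch reproduces the predicted probabilities and the values of H_e, H_m, I_g and I_e given in the table.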
I_g = 0.03, then, represents the net information gain from an application of the theory with uncertainty in the measurement. It compares with an average information gain of 0.24 for the theory without uncertainty in the independent variable, and 0.26 for information gain in the independent variable.
It is clear, then, that the usefulness of the theory depends very much on the residual uncertainty in the independent variable. That means that a theory that would provide a substantial gain in information about a given dependent variable, on the condition that we could accurately and reliably measure the states of the independent variable, loses most of its usefulness if there is too much uncertainty in the assignations of the independent variable. This in turn would also imply that a theory of this nature could be tested reliably only to the extent that we can provide measures with sufficiently low residual uncertainty.
So far we have discussed theory as the problem of making predictions and decisions about the attributes of a single event. At the same time we have used probabilities as a measure of the likelihood that alternative propositions about the attributes of the event are true. A single event, however, can have only one of the alternative attributes, as they are mutually exclusive. So on the basis of the probabilities we still have to make a judgement about which of the attributes we will predict for the event. This judgement can be arrived at on the basis of a simple decision rule which selects the attribute that is most likely to apply, i.e. the proposition that is most likely to be true. In this case we would use our probability distribution as well as our information gain to tell us something about the likelihood of having made the wrong choice.
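The simple decision rule can be sketched as follows. The sketch assumes that the likelihood of a wrong choice is read directly from the distribution as one minus the probability of the selected attribute; the function name is illustrative, and the distribution is the one predicted for event e above.

```python
def most_likely(distribution):
    """Select the most probable attribute and return it together with
    the probability that this choice is wrong (1 - its probability)."""
    attribute = max(distribution, key=distribution.get)
    return attribute, 1.0 - distribution[attribute]


# Predicted probabilities of the states of B for event e.
predicted = {"b1": 0.50, "b2": 0.26, "b3": 0.24}

choice, p_wrong = most_likely(predicted)
print(choice, p_wrong)
```

Here b1 is selected, but the distribution indicates that the choice is as likely to be wrong as right, which is consistent with the small information gain obtained for this prediction.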
In most situations, however, we deal with classes of events rather than with a single, unique event. In this case the measure, i.e. the set of alternative attributes, serves as a partition function on the set of reference events, grouping or partitioning them into subsets with the same attribute. We still, however, have the problem of assigning the single event to one of the subsets. In this case the probability distribution can be used to estimate the uncertainty about the ‘true’ attributes of the events assigned to a given class.
In the case of assignment by measurement or control the probability distribution represents an estimate of the reliability and validity of the measuring or manipulation operation. The values of the probability distribution might then be assigned through tests of reliability of the operation or through comparison with other, independent measuring or control operations.
In the case of theory, the distribution is assigned through the specification of the theory and its application to the measurement. In this case the distribution can be used as an error function if we use the theory to make definite, determinate predictions, where, as discussed above, the predicted value is determined through maximization of the probability value. Both selection of the attribute and the residual uncertainty estimate can then be tested through independent measurement of the dependent variable.
On the other hand, the probability function generated by the theory and its application to the measure can also be used as the prediction. In this case the uncertainty is part of the prediction and can be compared with the uncertainty in the values generated by the measure. This comparison, however, is somewhat more complex since besides the uncertainty contained in the probabilistic prediction there also will be an uncertainty generated through application of the measuring operation on the dependent variable of the set of reference events. The test therefore has to take into consideration the uncertainty about the predicted probability distribution generated by the measurement.
As has become evident in the above discussion, there are quite a number of issues and problems, especially in regard to the assignation of the probability estimates that still have to be worked out for this model. For instance, we do not always start with total uncertainty about a given set of alternative propositions, but rather we may want to examine the information gain due to relation with a given other variable, to see whether the variable should be included in the theoretical model under consideration. But these issues go beyond the scope of the present paper.
Before going on to the substantive considerations, it is hoped that it has remained clear how the above model presents a method of conceptualizing variables and the uncertainty of their measurement assignations, as well as conceptualizing relationships among variables as might be posited by a given theory. It is also hoped that it has remained evident how the above model solves the problem of measurement and theoretical error by integrating the notion of error into the notion of measurement and prediction in terms of the uncertainty in assignation of attribute to the given event.
The problem of test of the theory has not been discussed adequately above, but it is hoped that the line of argument one would take, in terms of average information gain, as well as efficiency, is clear.
In this section of the paper we shall explore the possibilities of constructing theories about the socio-emotional behavior of persons in small group interactions. According to the initial argument we shall have to attempt to find solutions to the four problems, involving conceptualization of the variables in terms of the measuring operation and its attendant assignation of attributes and categorical or numerical representation. The measuring operation we shall consider is the Bales ‘Interaction Process Analysis’ technique for scoring communication in small group interactions. We shall conceptualize some alternative theoretical relationships of other concepts, derived from considerations of role-status notions as well as from considerations of sequential processes of interaction, with the variables expressing the socio-emotional behavior of persons in the interaction. To solve the conceptual and operational problems we shall employ the above meta-theoretical paradigm of measurement and theory as information gain. We shall thus be concerned to gain information about the supportive and rejective behavior of members in a small group interaction. Let us go on, therefore, to consider this variable in more detail.
Socio-emotional involvement refers to a cluster of dimensions or attributes. In the Bales scoring technique it may refer to an evaluation of what the other person has said or contributed to the group task, it may refer to a spontaneous emotional response such as in the show of tension or tension release, or it may refer to an attempt to bolster or reduce the other person's status in group integration. It is beyond the scope of this paper to differentiate between these various dimensions, so we shall cluster all communication in the socio-emotional area as defined by the Bales scoring technique, and differentiate only between positive (support) and negative (rejection) communications between the members of the group. We do not, for instance, differentiate whether the support is intended as support of the person himself or strictly as support for the ideas advanced. Also, we do not deal with felt evaluations or preferences for other members as in sociometry, but restrict ourselves to evaluations that are expressed in the course of the interaction, in the flow of communication.
In terms of the Bales system, the broad differentiation we shall make is between task-oriented and socio-emotional communication. In terms of the constraints of this context, then, our problem is to investigate the characteristics of the socio-emotional aspect of communication in group interaction. The question we shall be most directly concerned with is how different persons differ in their socio-emotional communication in the group interaction. In other words, how do people compare in their socio-emotional communication? Another restriction on the way in which we shall consider socio-emotional communication is that we shall deal only with the amount of the communication but not with its intensity. So our problem reduces to finding information about how persons compare in the amount of socio-emotional communication they originate and receive in the small group setting.
The amount of communication, again derived from the Bales scoring system, is conceptualized in terms of the definition of a unit of communication as the smallest discriminable, separately meaningful bit of communication. This unit is classified by speaker and addressee and, under the assumption that it has a single meaning, by content in one of 12 exclusive categories, which we shall group into three: task, support and rejection.
To restate the discussion above, we have redefined our notion of socio-emotional involvement in terms of the notion of the amount of supportive and rejective communication as generated by the Bales scoring technique. Now the reader may want to question the usefulness of redefining our theoretical concept or attribute in terms of the highly restrictive Bales system. Unfortunately, at the present state of methodology in the social sciences we do not have measuring devices that are not highly restrictive and thus lose much of the original content of the variable or attribute set. On the other hand, it is useless and meaningless to define an attribute unless we can make decisions about its application to events by some sort of measuring or observation operation.
Let us go on now to consider how we can further specify our communication dimension to compare the participation of the members in the group interaction. We shall begin by specifying how we can partition our attribute ‘amount of socio-emotional communication’ and then go on to discuss some of the restraints in the measure.
As discussed above, the basic event of reference is the single unit of communication, which defines the universal set of reference for the interaction. The interaction thus is conceptualized as a sequence of units of communication. These units are classified in terms of speaker, recipient of the communication, content in terms of the three categories of ‘task’, ‘support’ and ‘rejection’, and in terms of the sequential occurrence of the unit in the interaction. Thus while our original problem is defined with the group members as the basic set of reference, and socio-emotional communication as an attribute to describe the involvement of these persons in the group, the Bales system has the single unit of communication as the set of reference, with speaker, recipient, content and time order as measures on this set.
Thus we have:
a) the universal set, made up of all the units of communication in the interaction: U = (u1, u2, ..., un)
b) the measures, or attribute sets defined on this universal set:
1. the set of speakers or originators of the given unit:
SP = (0, A, B, C, ...)
where A,B,C etc., are group members, and 0 represents the group as a whole. The assumption is that a unit is either originated by one and only one of the group members or by the group as a whole. Thus the elements of this set are seen as alternative and inclusive, thus fulfilling our condition for a measure on the set.
2. the set of recipients of the given unit:
REC = (0, A, B, C, ...)
where again A, B etc. are group members and 0 the group as a whole. Again the assumption is that a given unit is addressed either to one and only one of the group members or to the group as a whole, so that this set also fulfills the conditions of a measure.
3. the set of alternative contents of the unit of communication:
CB = (1, 2, ..., 12), the set of 12 Bales categories, for which again we have the assumption that a given unit has one and only one type of content. We have partitioned as follows:
S = (1, 2, 3); T = (4, 5, 6, 7, 8, 9) and R = (10, 11, 12)
where S is support, T is task, and R is rejective communication. We therefore now have the partitioned content set:
CONT = (S, T, R)
4. the set of time-sequential ordering of the units in the interaction:
TS = (1, 2, 3, ..., n)
Here the assumption is that all the units are ordered over time so that for any two units we can establish that one came before or after the other. This yields an ordinal scale which we have represented (mapped) into the set of integers greater than zero, so that a greater integer implies a later occurrence, where later than is represented by ‘greater than’. This gives us a one to one function with the numbered set.
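The four attribute sets defined above can be made concrete with a small data structure. The following is a minimal sketch: the class name `Unit`, the function name `content` and the three example units are hypothetical and serve only as illustration, while the grouping of the 12 Bales categories into S, T and R follows the partition given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One unit of communication, i.e. one element of the universal set U."""
    order: int      # element of TS: position in the interaction (1, 2, 3, ...)
    speaker: str    # element of SP; "0" stands for the group as a whole
    recipient: str  # element of REC; "0" stands for the group as a whole
    category: int   # element of CB: Bales category 1..12


def content(category: int) -> str:
    """Map a Bales category onto the partitioned content set CONT = (S, T, R)."""
    if 1 <= category <= 3:
        return "S"  # support
    if 4 <= category <= 9:
        return "T"  # task
    if 10 <= category <= 12:
        return "R"  # rejection
    raise ValueError(f"not a Bales category: {category}")


# Hypothetical three-unit interaction, for illustration only.
interaction = [
    Unit(order=1, speaker="A", recipient="0", category=5),
    Unit(order=2, speaker="B", recipient="A", category=3),
    Unit(order=3, speaker="A", recipient="B", category=10),
]

contents = [content(u.category) for u in interaction]
print(contents)
```

Because every category from 1 to 12 maps to exactly one of S, T and R, the grouped content set satisfies the same one-and-only-one condition as the original 12 categories.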
Now the three attribute sets of speaker, recipient and content are each one to many functions on the universal set, i.e. they are partitioning functions on the set of units of communication, so that each attribute of the given attribute set defines a proper subset of the universal set.
partition by speaker, for SP = (0, A, B, C, D): the universal set U of units of communication is divided into the subsets Sp-0, Sp-A, Sp-B, Sp-C, Sp-D
where subset Sp-A is the subset of U, the universal set, whose units are characterized by being originated by A, an element of the set of speakers.
partition by recipient, for REC = (0, A, B, C, D): U is divided into the subsets Rec-0, Rec-A, Rec-B, Rec-C, Rec-D
where subset Rec-A is the subset whose units are characterized by being addressed to A, an element of the set of recipients.
partition by content, for CONT = (S, T, R): U is divided into the subsets Ct-S, Ct-T, Ct-R
where subset Ct-S is the subset whose elements are characterized by being of content type S.
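The three partitions can be sketched in Python as a single grouping function applied to different attributes. This is a minimal sketch: the function names and the four example units are hypothetical, and the check at the end simply verifies the defining property that the subsets are mutually exclusive and together exhaust U.

```python
from collections import defaultdict


def partition(units, attribute):
    """Group unit ids into subsets that share the same value of `attribute`."""
    subsets = defaultdict(set)
    for unit_id, attrs in units.items():
        subsets[attrs[attribute]].add(unit_id)
    return dict(subsets)


def is_partition(subsets, universe):
    """True if the subsets are pairwise disjoint and their union is the universe."""
    members = [u for subset in subsets.values() for u in subset]
    return len(members) == len(set(members)) and set(members) == set(universe)


# Hypothetical universal set U of four units, keyed by time order.
units = {
    1: {"speaker": "A", "recipient": "0", "content": "T"},
    2: {"speaker": "B", "recipient": "A", "content": "S"},
    3: {"speaker": "A", "recipient": "B", "content": "R"},
    4: {"speaker": "C", "recipient": "A", "content": "S"},
}

by_speaker = partition(units, "speaker")      # Sp-A, Sp-B, Sp-C
by_recipient = partition(units, "recipient")  # Rec-0, Rec-A, Rec-B
by_content = partition(units, "content")      # Ct-S, Ct-T, Ct-R

print(by_speaker["A"], by_recipient["A"], by_content["S"])
print(all(is_partition(p, units) for p in (by_speaker, by_recipient, by_content)))
```

Attributes that do not occur in a given interaction (speaker D, for example) simply yield no subset here; an empty subset could be added explicitly if the full attribute set is to be represented.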
Coming back to our problem, we wanted to investigate socio-emotional communicative behavior of persons in group interaction. Above, we have restated the problem in terms of gaining information about how persons compare in the amount of socio-emotional communication they originate and receive in the small group setting. For this problem, therefore, the group members form the set of reference, while the amount of socio-emotional communication originated and received is the attribute applicable to them, about which we want to gain information. Let us discuss therefore how we can transform information obtained through the Bales scoring technique about the attributes of units of communication into information about the amount of socio-emotional communication originated and received by persons in the group.
Let us start with the dimension: ‘amount of communication’. This particular variable can be defined as a measure on any given set containing units of communication as elements of the set. It yields a single, unique value by taking the cardinality of the set, i.e. by counting the number of units of communication contained in the given set. This measure, then, while not an attribute of a single unit of communication, is an attribute of a set of units of communication characterizing the quantity of communication contained in the set. This particular measure is somewhat different from the attribute sets discussed above in that it specifies the set of integers, an infinite set, as the set of alternate attributes. Since, however, it yields a single value, i.e. since one and only one of the alternate integers out of the set of integers can characterize the cardinality of the given set of units of communication, it satisfies the conditions for being a measure. The only difference of this particular measure, then, is that instead of having to make a decision about the truth value of a finite set of propositions, in this case we have an infinite set of propositions. To the extent that there is no uncertainty in the choice, this does not present special problems. It makes it very difficult, however, to arrange a probability distribution over the alternative values of the variable in cases of uncertainty. We shall come back to this later in discussing modifications and redefinitions of the variables to deal with this problem.
Now the universal set of units of communication for a given group interaction is partitioned by the three variables, speaker, recipient and content, as discussed above. Taking the partition by speaker for instance, and selecting the attribute ‘originated by person A’, the partition function specifies the proper subset of the universal set of units of communication whose units are characterized by being ‘originated by person A’. Applying our above measure of ‘amount of communication’ to this subset, we obtain a value for the ‘amount of communication’ that characterizes this subset of units that is also characterized by being ‘originated by person A’.
We thus have two attributes of this particular set of units of communication. On one hand we know the ‘amount of communication’ it constitutes, and, on the other hand, we know that the units of communication within it are all ‘originated by person A’. This information about the attributes of this set of units of communication, however, can be converted into information about an attribute of person A. Thus we can define the dimension: ‘amount of communication originated’, as an attribute dimension characterizing person A, the relative participation of that person in the given interaction. The cardinality of the set of units of communication characterized by its elements having been originated by person A then is a measure of that attribute dimension. Since, as discussed above, the cardinality of a set yields a single value, it fulfills the conditions for a measure.
We can now take the intersection of the two sets ‘units originated by speaker A’ and ‘units whose content is supportive’. By applying our measure of ‘amount of communication’ to this intersect, we get a value for the amount of communication that is both ‘supportive’ and ‘originated by speaker A’. By reasoning similar to that above we can use this as a measure of the attribute ‘amount of supportive communication originated’ as applied to person A. By similar procedure and similar reasoning we can also generate information about the attribute dimensions: ‘amount of rejective communication originated’, ‘amount of supportive communication received’ and ‘amount of rejective communication received’, which gives us four measures or attribute dimensions specifying something about the socio-emotional involvement of the persons in the group interaction in terms of their communicative behavior.
It should be evident that by similar reasoning we can define and measure the attributes: ‘amount of task communication originated’ and ‘amount of task communication received’. Since the attribute ‘amount of communication originated’ applies to all the units of communication, regardless of content, originated by a given speaker, we shall call this attribute the ‘total amount of communication originated’. Similarly we can define the attribute ‘total amount of communication received’.
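The derivation of person attributes from subsets and intersections can be sketched as follows. This is a minimal sketch with a hypothetical five-unit interaction; the helper names `subset` and `amount` are illustrative, with `amount` standing for the cardinality measure discussed above.

```python
def subset(units, attribute, value):
    """Subset of the universal set whose units have the given attribute value."""
    return {i for i, attrs in units.items() if attrs[attribute] == value}


def amount(unit_ids):
    """'Amount of communication': the cardinality of a set of units."""
    return len(unit_ids)


# Hypothetical universal set of five units, keyed by time order.
units = {
    1: {"speaker": "A", "recipient": "0", "content": "T"},
    2: {"speaker": "B", "recipient": "A", "content": "S"},
    3: {"speaker": "A", "recipient": "B", "content": "S"},
    4: {"speaker": "C", "recipient": "A", "content": "R"},
    5: {"speaker": "A", "recipient": "C", "content": "T"},
}

sp_a = subset(units, "speaker", "A")     # units originated by A
rec_a = subset(units, "recipient", "A")  # units addressed to A
ct_s = subset(units, "content", "S")     # supportive units
ct_r = subset(units, "content", "R")     # rejective units

# Attributes of person A, obtained as cardinalities of intersections.
total_originated = amount(sp_a)
supportive_originated = amount(sp_a & ct_s)
rejective_originated = amount(sp_a & ct_r)
supportive_received = amount(rec_a & ct_s)
rejective_received = amount(rec_a & ct_r)

print(total_originated, supportive_originated, rejective_originated,
      supportive_received, rejective_received)
```

The same two helpers yield the task measures and the ‘total amount of communication received’ by substituting the corresponding subsets.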
This set of attributes derived from Bales Interaction Process Analysis is by no means exhaustive. So far, for instance we have not made use of the information about the temporal ordering of the units of communication in the interaction. But we shall be coming back to this later.
At this juncture, let us delve a little deeper into the notion ‘amount of communication’. The problem with this measure, as presently defined in terms of the cardinality of a set of units of communication, is that it specifies an infinite set as the set of alternative attributes. As discussed above, this only becomes problematic if there is uncertainty in the choice of the attribute applicable to the given set. For instance, for the case of total uncertainty, with an infinite set, we would have a probability of zero (1/∞) for each given alternative element.
We can, however, through partitioning and grouping the alternatives, transform the set containing an infinite number of alternatives into a set containing a finite number of collections of alternatives. We could then have a finite number of propositions about an event having an attribute contained in the given subset of attributes. We would thus have redefined our attribute set into one containing a finite number of alternative collections of attributes. As long as this set fulfilled the condition that any given attribute belonged to one and only one of the subsets, it would fulfill the conditions for a measure or attribute dimension.
The next problem would be how to define these collections of alternatives. That means that we have to specify rules of assignment determining which attributes belong to which of the alternative sets. Essentially this can be done by listing the attribute elements for each of the sets. In the case of an ordered or continuous variable, such as our ‘amount of communication’, this can be done more conveniently by specifying bounds for each of the sets. By bound we mean the specification of minimal and maximal elements such that the elements contained in the given set have to be greater than or equal to the minimal element and similarly less than or equal to the maximal element. Determination whether a given attribute value belongs to a given set then occurs by determining whether it is greater than or equal to the minimum and less than or equal to the maximum specified for the set.
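Assignment by bounds can be sketched as follows. This is a minimal sketch: the particular bounds are hypothetical and chosen only for illustration, and the function name `assign` is not from the original.

```python
def assign(value, bounds):
    """Return the index of the subset whose (minimum, maximum) bounds contain
    `value`; both bounds are inclusive. Raises if no subset contains it."""
    for index, (minimum, maximum) in enumerate(bounds):
        if minimum <= value <= maximum:
            return index
    raise ValueError(f"value {value} is not contained in any subset")


# Hypothetical bounded subsets of an ordered attribute dimension.
bounds = [(0, 0), (1, 2), (3, 4), (5, 6)]

print([assign(v, bounds) for v in range(7)])
```

Because the bounds do not overlap and leave no gaps between 0 and 6, every value in that range is assigned to one and only one subset. A value above the largest maximum, such as 7, is contained in no subset and raises an error.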
The problem with our attribute set ‘amount of communication’ as defined above is that it has only a lower bound, namely zero, but no upper bound. This makes it impossible to set up subsets that are all bounded: there would be values in the attribute dimension that would not be contained in its subsets, i.e. those values that are greater than the upper bound of the subset containing the largest values.
Let us discuss for a moment, then, why it is important that the subsets be bounded. First, as discussed above, the bounds of the given subset are crucial in determining whether a given attribute belongs to the particular subset or not. If the subset is bounded then we can say that the given value belongs to the set if it is greater than or equal to the lower bound of the set and less than or equal to the upper bound of the set. Now this consideration by itself does not eliminate the possibility for a given collection of attribute subsets to contain one or two subsets that are bounded only at one end. Thus the subset specifying values that are larger than values specified in the other subsets would not have to have an upper bound in that the decision rules for membership in that particular subset might be simply that the value of the attribute must be equal to or greater than the value of the lower bound of the given subset. Similar reasoning might apply to the attribute subset containing the smallest values of the attribute.
With an attribute set containing unbounded, i.e., infinite, subsets a problem arises in the attempt to assign probabilities, especially for maximal uncertainty. As stated above, the rules for assignment of probabilities for maximal uncertainty are that each alternative attribute is equally likely to occur, and that the sum of probabilities for all the alternative attributes of the given dimension equal one. This implies that the probability that the attribute of the given event belongs to any of the subsets specified is equal to the proportion of all attributes in the dimension that are contained in the subset. This would fulfill the condition that the sum of the probabilities assigned to the subsets equal one. The problem then reduces to determining the proportion of the attributes belonging to the given subset. While this is comparatively easy for finite sets, it presents difficulties in the case of infinite sets. In the case of an attribute set containing both bounded and finite, as well as unbounded and therefore infinite subsets, this proportion would be zero for the finite sets, regardless of how the boundaries were determined.
But this is precisely the case for all the attribute sets derived from and dependent on the measure ‘amount of communication’, since this measure maps the cardinality of the given set into the set of integers, a set which has a lower but no upper bound. Let us go on to discuss some ways of avoiding this dilemma by using some information about the particular interaction under consideration and by redefining our attribute dimensions.
Let us consider, for example, the attribute dimension ‘amount of supportive communication’. This refers to the cardinality of a set containing units of communication whose content is supportive. This set also is a subset of the universal set containing all the units of communication for the interaction of the specific group session under consideration. The cardinality of this universal set, i.e. the ‘amount of communication’ referring to the whole interaction, must therefore be an upper bound for the dimension ‘amount of supportive communication’ referring to the same interaction. So if we consider as an attribute dimension the ‘amount of supportive communication’ relative to the ‘total amount of communication’ for a given interaction, we do obtain a set that is bounded at both ends.
We now have a finite set of alternative ‘amounts of supportive communication’, specifically the set of integers bounded by zero at the lower end and the ‘total amount of communication’ for the whole interaction at the upper end. We can therefore assign uncertainties to the individual estimates, as well as partition the set. For convenience, we can also express the new relative attribute scale as the proportion of the total amount of communication that is supportive. This measure maps into the set of rational numbers between zero and one, with the cardinality of the universal set as the common denominator.
This new attribute dimension, however, is different in nature from those discussed above, in that it is not a set of attributes of a single dimension applicable to a single event, or set of events. Rather, its relative nature implies that it specifies some characteristics of the nature of the relationship between one or more events or set of events. So in this case we no longer deal with the characteristics of the event or set of events under consideration, but rather with the characteristics of the relationship between the given events or set of events.
Since this type of relational attribute dimension deals with the characteristics of the relationship between two or more objects or events, it is of course not meaningful to apply it to a single object or event. A more common sense example of such an attribute dimension might be the spatial arrangement of two objects A and B. Thus we might have two alternative attributes: ‘to the right of’ and ‘to the left of’. Assuming that the two objects cannot occupy the same space and that we are considering only one spatial dimension, horizontal and at right angles to the direction of observation, these two alternative attributes are not only mutually exclusive but also are complementary, i.e. exhaust the possible relationships. Thus they fulfill the conditions for an attribute set as discussed above. That means that we can set up alternative propositions about the objects, which have to be true or false. As discussed above, however, it is not meaningful to assert the attributes of a single object, (e.g. A is to the right of ), but only of the two objects in combination: A is to the right of B. Also, again in contrast to the usual form of alternative attributes, these two alternative attributes yield not two, but four propositions about the given objects: ‘A is to the right of B’, ‘A is to the left of B’, ‘B is to the right of A’, and ‘B is to the left of A’. Since, however, the meaning of the proposition ‘A is to the right of B’ is identical to the meaning of the proposition ‘B is to the left of A’, and similarly for the other two propositions, the two alternative attributes still give us the choice between only two alternative characteristics of the relationship between A and B. The information content of the relational attribute dimension, then, is determined by the number of alternative propositions we have to choose between, by the number of mutually exclusive alternative states of the relationship between the given objects or events it specifies.
It is this set of alternative relationships that determines the set of alternative attributes applicable to the set of objects or events of reference. Uncertainty, again is expressed in terms of the likelihood of the given specified relationship as being applicable to the given set of objects or events, with maximal uncertainty if all the alternative states of the relationship are equally likely to apply.
Let us return now to our consideration of the ‘amount of supportive communication’ in relation to the ‘total amount of communication’. In terms of our discussion above, we first have to locate the sets of events which are specified in the relationship. In this case we are talking, then, about the relationships among the cardinality of a collection of subsets making up the universal set. In this case the subsets are defined in terms of the partition function of content on the units of communication making up a given interaction.
If we let $X_A$ stand for the cardinality of any given set A, then we know that $X_{\text{Ct-S}} + X_{\text{Ct-T}} + X_{\text{Ct-R}} = X_U$, or, in terms of the proportional measure discussed above, $X_{\text{Ct-S}}/X_U + X_{\text{Ct-T}}/X_U + X_{\text{Ct-R}}/X_U = 1$. The possible combinations of the alternative states of the relational attribute dimensions therefore are given by the triples $(X_{\text{Ct-S}}/X_U,\ X_{\text{Ct-T}}/X_U,\ X_{\text{Ct-R}}/X_U)$ that satisfy the restraint given by the equation above. We also know that each of $X_{\text{Ct-S}}$, $X_{\text{Ct-T}}$ and $X_{\text{Ct-R}}$ refers to an attribute dimension of the content of the interaction, where $X_{\text{Ct-S}}$, for instance, refers to the ‘amount of supportive communication’ in the interaction as a whole. We also know that each of these attribute dimensions is mapped into the set of integers from zero to $X_U$ as lower and upper bounds, so that there are $X_U + 1$ possible alternative states of nature for each of the attribute dimensions.
Even though the relational attribute dimensions are restrained in terms of the states of the triple that fulfill the equation above, we do want to use the variables separately, and therefore have to calculate the uncertainty associated with each of the possible states of the given variable in terms of the combinations of the triple, as proportion of all combinations possible, that correspond to the given state of the variable. We are not, however, differentiating among the variables at this point, since we are not interested in the possible orderings among the specific variables but rather in the different relationships possible, so that we can derive an expression for uncertainty that is applicable identically to all three variables. That means we want to calculate the possible combinations of values, regardless of their ordering within the triple, that satisfy the restraints. These combinations of values give us the different relations possible among the three variables. The probability of one of the variables assuming one of the specified values is given by the relative frequency of occurrence of that particular value or state.
Let us assume for illustrative purposes that $X_U = 6$. The different triples we could obtain are: (0,0,6), (0,1,5), (0,2,4), (0,3,3), (1,1,4), (1,2,3) and (2,2,2), so that the alternative values yield the following relative frequencies: 0 = 5/21, 1 = 4/21, 2 = 5/21, 3 = 3/21, 4 = 2/21, 5 = 1/21 and 6 = 1/21. This then would give us the probability estimates for maximum uncertainty about the states of the relationship among the three variables. The higher probabilities are at the lower end of the scale, as one would expect. To generate the actual probability distribution for a particular interaction, and thus for a particular value of $X_U$, we shall use simple numerical methods on the computer.
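The enumeration for a total of 6 units can be reproduced with a short Python sketch, which also indicates how the distribution might be generated numerically for other totals. The function names are illustrative, and the counting convention (each unordered triple counted once, with a value's relative frequency taken over all positions of all triples) is inferred from the worked example.

```python
from collections import Counter
from fractions import Fraction


def unordered_triples(total):
    """All triples (a, b, c) of non-negative integers with a <= b <= c
    and a + b + c == total, i.e. ignoring ordering within the triple."""
    return [
        (a, b, total - a - b)
        for a in range(total + 1)
        for b in range(a, total + 1)
        if total - a - b >= b
    ]


def value_frequencies(total):
    """Relative frequency of each value 0..total over all positions
    of all unordered triples."""
    triples = unordered_triples(total)
    counts = Counter(value for triple in triples for value in triple)
    positions = 3 * len(triples)
    return {v: Fraction(counts[v], positions) for v in range(total + 1)}


triples = unordered_triples(6)
freq = value_frequencies(6)

print(triples)
for value, p in freq.items():
    print(value, p)
```

Note that `Fraction` reduces to lowest terms, so 3/21 is printed as 1/7; the values are otherwise those listed in the text. Calling `value_frequencies` with another total gives the corresponding maximum-uncertainty distribution for that interaction.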
The particular probability distribution for maximum uncertainty of course applies to all the variables involved in the particular relationship involved. It would change considerably, however, if we started with or assumed some knowledge about, i.e. some added constraints on the relationship between the variables considered. It should be clear as well that the maximum uncertainty probability distribution remains the same for the proportional relational measure.
The next problem is the partitioning of the particular relational attribute dimension we want to use, such as the ‘amount of supportive communication in relation to the total amount for the interaction’. There are two simple methods. One is to divide the attribute dimension into exclusive sets containing equal or nearly equal numbers of alternative states. For the example above, we might partition in the following manner:
attribute dimension              0       (1/6, 2/6)   (3/6, 4/6)   (5/6, 1)
prob. for maximal uncertainty    5/21    9/21         5/21         2/21
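The subset probabilities in this table are simply sums of the state probabilities within each subset. A small sketch of that bookkeeping, with a hypothetical helper name and the states indexed by their numerators 0 to 6:

```python
from fractions import Fraction

# Maximum-uncertainty probabilities for the states 0/6 .. 6/6, copied from the text.
P_MAX = [Fraction(n, 21) for n in (5, 4, 5, 3, 2, 1, 1)]

def subset_probabilities(p_max, partition):
    """Sum the state probabilities within each subset of the partition."""
    return [sum(p_max[state] for state in subset) for subset in partition]

# Partition from the table: 0 | 1/6, 2/6 | 3/6, 4/6 | 5/6, 1.
partition = [(0,), (1, 2), (3, 4), (5, 6)]
probs = subset_probabilities(P_MAX, partition)
print(probs)
```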
In cases where we have many more alternative states, we can partition the attribute dimension into subsets containing more nearly equal numbers of alternative states. For specific problems we can, of course, find other, more useful ways of partitioning.
The other method is to partition the variable into subsets of equal maximum uncertainty, so that under maximal uncertainty each attribute subset is equiprobable. Because of the small number of alternative values, this is difficult to do for our example above, but for illustration we could group the above attribute set as follows:
attribute set                    0       1/6     2/6     3/6     (4/6, 5/6, 1)
prob. for maximal uncert.        5/21    4/21    5/21    3/21    4/21
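One way to look for such a grouping mechanically is to search all partitions into adjacent subsets and keep the one whose subset probabilities lie closest to equiprobability. The sketch below is an illustration only: the least-squares criterion and the function name are assumptions, not part of the method described here.

```python
from fractions import Fraction
from itertools import combinations

# Maximum-uncertainty probabilities for the states 0/6 .. 6/6, copied from the text.
P_MAX = [Fraction(n, 21) for n in (5, 4, 5, 3, 2, 1, 1)]

def most_even_partition(p_max, n_subsets):
    """Partition into adjacent subsets whose probabilities are closest to 1/n_subsets."""
    n_states = len(p_max)
    target = Fraction(1, n_subsets)
    best, best_cost = None, None
    # Choose the cut points between adjacent states.
    for cuts in combinations(range(1, n_states), n_subsets - 1):
        bounds = (0,) + cuts + (n_states,)
        groups = [tuple(range(lo, hi)) for lo, hi in zip(bounds, bounds[1:])]
        # Assumed criterion: sum of squared deviations from equiprobability.
        cost = sum((sum(p_max[s] for s in g) - target) ** 2 for g in groups)
        if best_cost is None or cost < best_cost:
            best, best_cost = groups, cost
    return best

groups = most_even_partition(P_MAX, 5)
print(groups)
```

For five subsets over the seven states of the example, this search selects the same grouping as the table above. The exhaustive search is only practical for small numbers of states.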
We can calculate the information gain as follows. Since information gain equals the difference between estimated and maximal uncertainty, $I_g = H_m - H_e$, and $H_m$ as well as $H_e$ is based on $H = -\sum_i p_i \log p_i$, we can calculate the information gain as $I_g = \sum_i p_{ie} \log p_{ie} - \sum_i p_{im} \log p_{im}$.
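As a worked illustration, the information gain can be computed as maximal uncertainty minus estimated uncertainty, with uncertainty taken as the entropy $-\sum_i p_i \log p_i$. The sketch below assumes logarithms to base 2, which the text does not specify, and uses hypothetical function names and a hypothetical estimated distribution.

```python
import math

def uncertainty(probs):
    """H = -sum p_i log2 p_i; states with zero probability contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(p_max, p_est):
    """Maximal uncertainty minus estimated uncertainty."""
    return uncertainty(p_max) - uncertainty(p_est)

# Maximal-uncertainty probabilities for the four-subset partition in the text.
p_max = [5 / 21, 9 / 21, 5 / 21, 2 / 21]
# Hypothetical estimate: the measurement assigns the second subset with certainty.
p_est = [0.0, 1.0, 0.0, 0.0]

gain = information_gain(p_max, p_est)
print(round(gain, 3))
```

When the estimated distribution is certain, its uncertainty is zero and the gain equals the maximal uncertainty of the measure.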
We have spent so much space on the derivation of an attribute dimension from the measuring operation since, as discussed above, we have assumed that the meaning of the concept or variable is limited by the measuring operation in terms of which it is defined and in terms of which attribute assignations are made to the given set of objects or events. Similarly any gain in information is based on the specific nature and meaning of the attribute set and its numerical representation for the knowledge it can yield about the set of objects or events under consideration.
Having established the essential format of an attribute dimension and measure based on conceptualization underlying the Bales scoring system, let us return to our substantive research problem. We were concerned to gain information about the supportive and rejective behavior of members in small group interaction. We found that we had to restrict ourselves to measures based on the amount rather than the intensity of communicative behavior. Also we found that we could not use any direct measure of the amount of supportive or rejective communication, but rather that we had to derive relational, i.e. relative measures. Let us now go on to consider some such alternative variables and what information we can gain in terms of them, both in terms of measurement and prediction.
With relational measures, as we cannot gain information about a single attribute dimension but only about the relationship among two or more attribute dimensions of the given set of objects or events, it is clearly quite critical which attributes we consider in relation to each other. In the remainder of this paper we shall consider two sets of relational measures to give us information about the socio-emotional involvement of a given person in the group. It should be clear that these two do not at all exhaust the possibilities of alternative relational measures derivable from the Bales scoring system.
The first of these considers the amount of socio-emotional communication originated by a given person in relation to the total amount of communication originated by that person. This attribute dimension parallels the one developed above, except that in this case the sample set is the total amount of communication originated by the given person, i.e. a subset of the universal set obtained from the speaker partition. (A similar measure, of course, could be derived based on the given person as the recipient of the communication, i.e. using the recipient partitioning to define the sample set.) This sample set then is partitioned by the content measure into the subsets containing units of communication originated by the given person that are supportive, task oriented or rejective in content. By using the cardinality of these four sets we can define four attribute dimensions: ‘amount of total communication originated’, ‘amount of supportive communication originated’, ‘amount of task communication originated’ and ‘amount of rejective communication originated’. The referents for these attribute dimensions are the persons participating in the group discussion. The relationship, or restraint, among these attribute dimensions is that the sum of the content-defined amounts of communication equals the total amount of communication originated by the given individual. By taking the proportions of the content-defined amounts of communication in relation to the total amount of communication originated, we have three relational attribute dimensions such that knowledge of the states of any two of them gives us knowledge of the state of the third. The two variables ‘support as proportion of total amount of communication originated’ and ‘rejection as proportion of total amount of communication originated’ thus give us information about the relative amount of communication originated by the individual that was socio-emotional in content.
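The first set of measures can be sketched as follows, assuming that each scored unit is recorded as a (speaker, recipient, content) tuple. That record layout, the sample data and the function name are illustrative assumptions, not part of the Bales scoring technique itself.

```python
from collections import Counter
from fractions import Fraction

CONTENTS = ("supportive", "task", "rejective")

def content_proportions(units, speaker):
    """Proportion of each content type among the units originated by `speaker`."""
    own = [content for spk, _recipient, content in units if spk == speaker]
    if not own:
        raise ValueError(f"{speaker!r} originated no units")
    counts = Counter(own)
    return {c: Fraction(counts[c], len(own)) for c in CONTENTS}

# Hypothetical scored units: (speaker, recipient, content).
units = [
    ("A", "B", "task"), ("A", "B", "supportive"), ("B", "A", "task"),
    ("A", "C", "rejective"), ("A", "C", "task"), ("C", "A", "supportive"),
]
props = content_proportions(units, "A")
print(props)
```

Because the three proportions sum to one, any two of them determine the third.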
The second set of relational measures is concerned with the comparative amounts of socio-emotional communication originated by the different persons in the group. It uses as basic sample set the collection of units of a given content, say supportive. (Thus, unlike the above measures, which were applicable to different individuals, this measure is applicable to the different types of content.) This set now is partitioned in terms of the alternative speakers originating the given units. We thus have the sample set containing all the supportive (or rejective) units of communication, as well as one subset for each of the group members. Again we use the cardinality of these sets to define the attribute dimensions: ‘amount of support originated by person A’, ‘amount of support originated by person B’, etc., and ‘total amount of support’. Also, by using proportions of the total, we obtain the relational attribute dimensions ‘proportion of total amount of supportive communication originated by person A’, etc., which give us the comparative origination of support by various group members. Similar reasoning gives us the measure for comparative origination of rejective or task communication. (A similar set of measures can of course be obtained about the reception of communication by the various group members.)
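The second set of measures reverses the roles of speaker and content: the sample set is fixed by content and partitioned by speaker. A minimal sketch, again assuming hypothetical (speaker, recipient, content) records and a hypothetical function name:

```python
from collections import Counter
from fractions import Fraction

def speaker_shares(units, content):
    """Each speaker's proportion of all units that have the given content.

    Speakers who originated no unit of that content are omitted from the result.
    """
    counts = Counter(spk for spk, _recipient, c in units if c == content)
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no units with content {content!r}")
    return {spk: Fraction(n, total) for spk, n in counts.items()}

# Hypothetical scored units: (speaker, recipient, content).
units = [
    ("A", "B", "supportive"), ("B", "A", "supportive"),
    ("A", "C", "supportive"), ("C", "A", "task"),
]
shares = speaker_shares(units, "supportive")
print(shares)
```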
We can calculate uncertainty probabilities for the alternative relational states for both the sets of measures by the method illustrated above. Similarly, we can partition the variables by either of the two methods illustrated above, for equal number of alternate states, or for equal probabilities for maximal uncertainty. We can also calculate and compare information gain for the various measures by the method illustrated above.
Before we go on to consider measurement or prediction in more detail, and how to assign estimated probabilities to the various relational states of the given variables, let us discuss another way of generating alternative sets of measures by making transformations on the universal set on which the measures are based.
So far we have derived and applied all the attribute dimensions or measures to the universal set or subsets containing as elements the ‘units of communication’ obtained through the Bales scoring technique. It should be clear, however, that as long as we can partition the elements of the various sets by speaker, recipient and content, and as long as the sets are mutually exclusive and the cardinality of the sets meaningful, that it does not matter how we define or transform the elements of the sets, i.e. the universal set.
Let us consider one such transformation that consists of a partitioning of the universal set in terms of the temporal sequence of the elements as well as the speaker and recipient measures. Thus we may define as new element of the universal set “the statement”: the sequence of temporally (sequentially) contiguous units of communication having the same speaker and recipient. This new element of ‘statement’ thus includes all the units of communication that one person addresses to a given other in continuous sequence, or, in other terms, the sequence of units that is bounded by when the given person starts addressing the specific other person and when he stops, either by addressing someone else or by someone else starting to communicate. Any ‘statement’ therefore contains one or more units of communication. Thus the collection of statements in the interaction corresponds to a sequential partitioning of the units of the universal set.
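The transformation from units to statements amounts to grouping consecutive units that share speaker and recipient. A minimal sketch, assuming the units are held in temporal order as hypothetical (speaker, recipient, content) tuples:

```python
from itertools import groupby

def to_statements(units):
    """Group temporally contiguous units with the same speaker and recipient.

    Returns a list of (speaker, recipient, [contents of the units in the statement]).
    """
    statements = []
    for (speaker, recipient), run in groupby(units, key=lambda u: (u[0], u[1])):
        statements.append((speaker, recipient, [content for _s, _r, content in run]))
    return statements

# Hypothetical scored units in temporal order: (speaker, recipient, content).
units = [
    ("A", "B", "task"), ("A", "B", "supportive"),
    ("B", "A", "task"),
    ("A", "B", "task"),
    ("A", "C", "rejective"),
]
statements = to_statements(units)
for statement in statements:
    print(statement)
```

`itertools.groupby` only merges adjacent items, which is what the definition requires: a later run by the same speaker to the same recipient counts as a new statement.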
The measures for the statement, however, are somewhat different from the measures for the single unit. The speaker and recipient measures are still the same as they still specify alternative, mutually exclusive attributes of the statement. The content measure, however, no longer specifies exclusive alternative contents as a given statement may contain several units each of which could have a different type of content. One thus has to devise a new set of rules to separate different statements according to content. Now since each statement is a set of units of communication, one could use the above measures for the proportional amounts of content for a given set, but for most purposes it is more convenient to simply posit some rules that specify alternative and mutually exclusive content categories.
The rules we shall use are as follows:
1. a statement is ‘supportive’ if it contains at least one unit of supportive communication and no unit of rejective communication.
2. a statement is ‘task oriented’ if it does not contain any units of supportive or rejective communication.
3. a statement is ‘rejective’ if it contains at least one unit of rejective communication and no unit of supportive communication.
4. a statement is ‘socio-emotional, mixed’ if it contains at least one unit of both supportive and rejective communication.
It should be clear that this set of rules specifies four mutually exclusive categories of content that include all the possible alternatives, and that therefore the attribute dimension based on them fulfills the conditions of a measure.
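The four content categories can be written as a single decision procedure. The sketch below reads the ‘supportive’ and ‘rejective’ rules as excluding statements that contain both kinds of unit, which is what mutual exclusivity with the ‘mixed’ category requires; the function name is hypothetical.

```python
def classify_statement(unit_contents):
    """Assign one of four mutually exclusive content categories to a statement.

    `unit_contents` holds the content labels of the statement's units.
    """
    has_support = "supportive" in unit_contents
    has_reject = "rejective" in unit_contents
    if has_support and has_reject:
        return "socio-emotional, mixed"
    if has_support:
        return "supportive"
    if has_reject:
        return "rejective"
    return "task oriented"

print(classify_statement(["task", "supportive"]))
print(classify_statement(["supportive", "task", "rejective"]))
```

Every statement falls into exactly one branch, so the categories are exhaustive as well as exclusive.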
We can now apply the two sets of measures developed above, except for the change due to the content dimension applicable to the statement having four alternate states as compared to the three applicable to the single unit.
Finally, let us discuss how we gain information about the persons and the group process in terms of the variables specified above.
First, in terms of measurement. The use of the Bales scoring technique is adequately described elsewhere. The problem we face at this juncture is how to assign estimated probabilities to the alternative states of the variables as applying to an individual for an interaction. Now for any given individual, for any given group session, we obtain a single attribute alternative for each of the given measures. Thus for a given individual in a given group session, for a single set of Bales scores for the interaction, there is no uncertainty about what state of the given variable applies to him. Thus on the assumption that the Bales scoring results give us correct information, and assuming that we have not made any predictions, we can calculate the information gain from the measurement by taking the difference between the certain assignation made by the measurement and the maximal uncertainty inherent in the measure.
Also, in case we have used the variable partitioning with equal numbers of states for each subset, we can compare the state obtained from the measurement with the state predicted by the probabilities of total uncertainty, i.e. the state with the highest probability.
One way to test the uncertainty inherent in the measurement assignation for a specific individual in a given group is by doing a reliability test on our measurement system. For the Bales technique, this can be done by having more than one observer score the particular group interaction under consideration. One can then use the relative frequency of assignation of the given individual to the different states of the variable as estimated probabilities for which state of the given variable applies to the given individual. Again, we obtain the gain in information by comparing these probabilities to those obtained for maximal uncertainty.
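The reliability procedure can be sketched as follows. The observer scores are invented for illustration, the state labels and function names are hypothetical, and logarithms to base 2 are assumed; the maximal-uncertainty probabilities are those of the four-subset partition given earlier.

```python
from collections import Counter
import math

def estimated_probabilities(assignments, states):
    """Relative frequency with which observers assigned each state."""
    counts = Counter(assignments)
    return [counts[s] / len(assignments) for s in states]

def uncertainty(probs):
    """H = -sum p_i log2 p_i; zero-probability states contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

states = ["0", "1/6-2/6", "3/6-4/6", "5/6-1"]
p_max = [5 / 21, 9 / 21, 5 / 21, 2 / 21]

# Hypothetical: four observers score the same individual in the same session.
observer_scores = ["1/6-2/6", "1/6-2/6", "3/6-4/6", "1/6-2/6"]
p_est = estimated_probabilities(observer_scores, states)

gain = uncertainty(p_max) - uncertainty(p_est)
print(p_est, round(gain, 3))
```

Disagreement among observers leaves some estimated uncertainty, so the gain is smaller than it would be for a single, unquestioned assignation.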
Often, however, we are not concerned with obtaining information about a particular individual. We may be interested, on the other hand, in the characteristics of any individual in the given interaction, or the characteristics of an individual in the given type of group session. Thus we might want to generalize either over individuals or over group sessions. In this case we can use the information about the different individuals or group sessions included in the generalization to provide the replication to assign uncertainty estimates based on relative frequency. One has to be sure, however, that the measures for the different persons, or for the different group sessions, are defined identically, i.e. that they specify the same relative states, so that we can be certain that the information about the different individuals on different occasions is comparable. It is here that the proportional form of the attribute dimension becomes important, as well as the basis on which the dimension was partitioned.
Let us go on now to consider the gain of information through prediction. As discussed above, the basic notion of predictive theory as developed in this paper, is the use of knowledge about the state of one variable to predict the state of another variable, on the condition, of course, that the variables are logically or definitionally, as well as operationally independent. This particular condition can become quite problematic for relational variables, as there are a number of attributes involved for each of the relational variables.
As discussed above, there are two assumptions involved in the assertion that knowledge of the state of one variable can help in predicting the applicable state of the other variable. The first of these is that for each of the states of the independent variable we can assign a predicted probability distribution for the states of the dependent variable that differs from the probability distribution obtained for maximal uncertainty. The other assumption is that the probability distribution assigned on the basis of knowledge of the object or event having one state of the independent variable is different from the probability distribution for other states of the independent variable. That means we have to make the assumption that it makes a difference what state of the independent variable is associated with the given object or event. In traditional terms this is expressed as a ‘causal relationship’ between the two variables, by which we mean that we assume the existence of some sort of restraint so that the object or event with a given state of the independent variable can only assume a given state or states of the dependent variable.
It is this second assumption that is easily testable for the two predictive variables that we are considering, derived respectively from the notions of ‘role’ and from notions based on a ‘dynamic’ or ‘process’ analysis of the interaction as in dissonance or balance theory.
In terms of a variable derived from role notions, the second assumption could be restated as the assertion: individuals who have different role-status in the group will exhibit different socio-emotional participation in the group.
To test this assertion we would separate the individuals in the group or groups under consideration into different roles and test whether individuals in one role originate (or receive) different relative amounts of socio-emotional communication than individuals in another role. In terms of the psychotherapy groups under consideration for this paper, this might correspond to the role differentiations male-female and psychiatrist-patient. Since both of these attribute dimensions yield exclusive and complementary alternatives, they satisfy the conditions of a measure on the individuals in the group. Since also the assignation of this measure is made independently of the Bales technique, this satisfies the condition for independence of the variables. To test the usefulness of this assertion for predictive purposes, we would compare the information gain for the given variables for the group as a whole, i.e. by finding the probable states for any individual in any of the groups, with the average information gain for, say, any female and any male in any of the groups under consideration. As discussed above, therefore, we use the relative frequency with which any male has a given state of, for instance, the variable ‘proportion of total amount of supportive communication originated’, in any of the groups under consideration.
This method then gives us a comparison in information gain for each of the relational variables defined above.
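A minimal sketch of this comparison follows. It assumes that each individual is recorded as a hypothetical (role, state) pair, that logarithms are taken to base 2, and that the average over roles is weighted by the number of individuals in each role; the weighting is an assumption, as the text does not specify it.

```python
from collections import Counter, defaultdict
import math

def uncertainty(probs):
    """H = -sum p_i log2 p_i; zero-probability states contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def relative_frequencies(observed, states):
    counts = Counter(observed)
    return [counts[s] / len(observed) for s in states]

def gain_pooled_and_by_role(records, states, p_max):
    """Return (gain ignoring role, size-weighted average gain within roles)."""
    h_max = uncertainty(p_max)
    pooled = h_max - uncertainty(relative_frequencies([s for _r, s in records], states))
    by_role = defaultdict(list)
    for role, state in records:
        by_role[role].append(state)
    within = sum(
        len(obs) / len(records) * (h_max - uncertainty(relative_frequencies(obs, states)))
        for obs in by_role.values()
    )
    return pooled, within

# Hypothetical data: two states, roles that separate the states perfectly.
states = ["low", "high"]
records = [("male", "low"), ("male", "low"), ("female", "high"), ("female", "high")]
pooled, within = gain_pooled_and_by_role(records, states, [0.5, 0.5])
print(pooled, within)
```

With this weighting the within-role average can never fall below the pooled gain, so it is the size of the difference, not its sign, that indicates predictive usefulness.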
The case for the assertion about the predictive usefulness of notions about ‘dynamic or process analysis’, such as in balance theory, dissonance theory, or the system by Homans, is somewhat more complicated since it depends on information about the history of the interaction. This information, however, can be generated by putting some restraints on the selection of elements in our sample sets. The basic notion behind this approach is simply that knowledge about the history of the affective or socio-emotional interaction between any two given persons gives information about present socio-emotional interaction between the two persons. Thus for instance in Homans’, Heider’s and Newcomb’s models there is the assertion that knowledge about how a given person A evaluates person B or his opinions or contribution gives us information about how B will evaluate A. A part of this dynamic or process analysis rests on the assertion that most of the communication from B to A is in response to communication from A to B. At this stage we shall explore the usefulness of the dynamic approach only in terms of the immediate history of the interaction, by considering the interaction between two consecutive statements. Thus the basic assertion of this simplified approach is that knowledge of the statement immediately preceding a given statement gives us information about that statement. We shall test this assertion in terms of two propositions:
1. That knowledge whether a given statement is in response to a previous statement or whether it represents initiation of communication from the speaker to the given person gives us information about the nature of that communication.
To test this proposition we have to devise a new relational measure or attribute dimension relating the given statement with the statement immediately preceding, to determine whether the given statement is ‘communication response’ or whether it represents ‘initiation’ of communication. The set this measure applies to is the set of all ‘pairs of consecutive statements in the interaction’. But each of the pairs of statements constituting the elements in this sample set can be partitioned again into the ‘preceding statement’ and the ‘succeeding statement’ in the given pair. The relational measure, then, is applicable to the set containing all the ‘succeeding statements’ in the interaction. It should be evident that this sample set includes all but the initial statement in the given interaction.
The selection rules defining the ‘response-initiation’ attribute dimension applicable to the set of succeeding statements as defined above, are as follows:
1. A statement is ‘in response’ if it is originated by the recipient of the preceding statement, and if it is addressed to the speaker of the preceding statement.
2. A statement ‘initiates communication’ if it is not ‘in response’.
As these selection rules specify two complementary and mutually exclusive attributes, they fulfill the conditions for a measure.
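The two selection rules can be applied mechanically to each pair of consecutive statements. A minimal sketch, assuming statements are held in temporal order as hypothetical (speaker, recipient, content) tuples:

```python
def response_or_initiation(statements):
    """Label every statement except the first as 'in response' or 'initiation'.

    A statement is 'in response' if its speaker is the recipient of the
    preceding statement and its recipient is the speaker of that statement.
    """
    labels = []
    for prev, cur in zip(statements, statements[1:]):
        in_response = cur[0] == prev[1] and cur[1] == prev[0]
        labels.append("in response" if in_response else "initiation")
    return labels

# Hypothetical statements in temporal order: (speaker, recipient, content).
statements = [
    ("A", "B", "task oriented"),
    ("B", "A", "supportive"),
    ("B", "C", "task oriented"),
    ("C", "B", "rejective"),
]
labels = response_or_initiation(statements)
print(labels)
```

The list of labels is one shorter than the list of statements, since the initial statement has no predecessor.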
Since the set of succeeding statements is essentially similar to the set of all statements in the interaction, we can use the two sets of measures discussed above to test whether partitioning the set by the ‘response-initiation’ independent variable yields a gain in information for the dependent variables, and thus whether it is predictively useful.
2. The second proposition based on dynamic or process analysis we shall explore is the proposition: given that the statement is ‘in response’ to the preceding statement, knowledge about the content of the preceding statement gives us predictive information about the content of the succeeding statement.
We are now no longer interested in all the communication in the given group interaction, but only in the communication that is characterized by being ‘in response’ to the preceding statement. Our sample set thus is a partition on the set of pairs of statements in the interaction, specifying the subset of pairs of statements for which the succeeding statement is characterized by being ‘in response’. We thus have put a conditional restraint on the selection of elements for our sample set.
We now partition this set of pairs of statements in terms of the independent variable, the attribute dimension characterizing the content of the preceding statements, thus obtaining four exclusive subsets of the sample set containing as elements pairs of consecutive statements. Again, each of these elements can be partitioned into the ‘preceding statement’ and the ‘succeeding statement’. We can thus again derive a set of ‘succeeding statements’ for each of the above subsets. It should be clear of course that the cardinality of a given set of pairs of statements is identical to that of the set containing the corresponding ‘succeeding statements’, since we have obtained the set of succeeding statements by a one-to-one transformation on the set of pairs, i.e. one ‘succeeding statement’ identical to the succeeding statement in the corresponding pair of consecutive statements.
To each of these sets of ‘succeeding statements’ corresponding to the content partition on the basis of the preceding statement, we can now apply our two sets of measures as discussed above. We can thus again test whether the differentiation by content of the preceding statement leads to an increase in average information gain about the content of the succeeding statement, as obtained by measurement.
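The partition by content of the preceding statement can be tabulated as a set of conditional relative frequencies, one distribution of succeeding content for each preceding content. The sketch below is illustrative only: the (speaker, recipient, content) record layout, the sample data and the function name are assumptions.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def response_content_distributions(statements):
    """For pairs whose second statement is 'in response', give the relative
    frequency of the succeeding content for each content of the preceding statement."""
    table = defaultdict(Counter)
    for prev, cur in zip(statements, statements[1:]):
        if cur[0] == prev[1] and cur[1] == prev[0]:  # 'in response'
            table[prev[2]][cur[2]] += 1
    return {
        preceding: {c: Fraction(n, sum(counts.values())) for c, n in counts.items()}
        for preceding, counts in table.items()
    }

# Hypothetical statements in temporal order: (speaker, recipient, content).
statements = [
    ("A", "B", "supportive"), ("B", "A", "supportive"),
    ("A", "B", "supportive"), ("B", "A", "rejective"),
    ("C", "A", "task oriented"),
]
dists = response_content_distributions(statements)
print(dists)
```

Each conditional distribution can then be compared with the maximal-uncertainty distribution to obtain the information gain for that subset.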
The central concern of this paper was the development and application of a paradigmatic model of theory and measurement in terms of the notion of information gain about the attributes of a given set of objects or events. The purpose of this model is to explore one alternative method for solving some conceptual problems arising out of attempts to construct precise and axiomatic theories in the social sciences. The usefulness of this particular paradigm is that it provides a method for conceptualizing measurement of, and theoretical relationships between variables yielding nominal or simple ordinal numerical representations that do not yield unique transformations for more traditional mathematical operations. It also provides a conceptualization of error in measurement and theory in terms of the uncertainty associated with the given measurement or prediction, and that therefore integrates this notion into the conceptualization of the purpose of measurement or theory as gains in information about a given set of objects or events.
The model also provides a method for quantifying the notion of uncertainty and information gain, and thus allows us to calculate and compare information gain for different measurements and theories.
In the second part of the paper devoted to the substantive issue of gaining information about the socio-emotional involvement of persons in small group interactions, the model enables us to construct some alternative relational measures of supportive and rejective behavior of persons in group interactions, based on measurement in terms of the Bales scoring technique. It should be clear that the particular measures or variables developed are far from exhaustive, in that other relations among the categories could have been used.
A small selection of possible alternative variables was explored, by developing the concepts ‘amount of support originated relative to other types of communicative contents’ and ‘amount of support originated by a given person relative to others’. These attribute dimensions then were applied to different elements of communication, conceptualized as ‘single units of communication’ as well as ‘whole statements’. Again, although far from complete or exhaustive, this argument shows how the precise meaning of the given variable can be altered by applying it to different sets of reference events.
Finally, some alternative predictive theories were constructed, developing not only their conceptualization and formalization but also sketching out briefly the relevant tests of usefulness in gaining information about the given attribute under consideration. Again, the purpose of this argument was primarily exploratory, to point out not only the range of possibilities of alternative theories, but also the increase in logical coherence and precision gained by use of the model.
As this paper is primarily exploratory in intent, no data was supplied, even though the substantive propositions were developed in reference to actual studies of interaction presently under way at U.B.C. under Dr. R.A.H. Robson.