==Example with one url==
[[File:SchemaNew.png|800x600px|center|frame|Figure 1: Complete scheme]]
In this example we examine which technologies were used, and how often, according to one of the students of the course Sciences et Technologies de l’Information et de la Communication I (STIC I), Master of Science in Learning and Teaching Technologies, University of Geneva, in 2003. <br />
First, we have to import the URL into Orange Textable. To do so, we select the URLs widget and paste the above URL into the URL field. Then we specify the encoding as shown in the following picture.
[[File:One.png|center|frame|Figure 2: Interface of the URLs widget]]
Using the Display widget, we can view the XML tags as well as the text enclosed in them. The tags we are interested in are “course” and “exercise”.
Now we have to isolate the exercises that are part of the course Stic I. To do that, we first extract the content of the “course” tag. We use the Extract XML widget and type “course” in the XML element field, as shown in the following picture.
[[File:Three.png|center|frame|Figure 3: Interface of the Extract XML widget]]
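The effect of this extraction can be sketched in Python. This is only an illustration, not how the widget is implemented: the function name <code>extract_xml</code> and the sample text are hypothetical, and the sketch assumes that elements of the same name are not nested.

```python
import re

def extract_xml(text, element):
    """Return the content of every <element>...</element> pair as one segment.

    Hypothetical helper; assumes elements of the same name are not nested.
    """
    name = re.escape(element)
    pattern = re.compile(r"<{0}\b[^>]*>(.*?)</{0}>".format(name), re.DOTALL)
    return [match.group(1) for match in pattern.finditer(text)]

# Made-up sample standing in for the imported page.
sample = (
    "<course><title>Stic I</title><exercise>ex 1</exercise></course>"
    "<course><title>Stic II</title><exercise>ex 2</exercise></course>"
    "<course><title>Other</title></course>"
)

courses = extract_xml(sample, "course")  # one segment per course
```

Calling the same function again on one of the resulting segments, with “exercise” as the element, gives one segment per exercise.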
“Course” is the XML tag of our page that encloses all the information associated with a given course. As we can see in the above picture, this creates three segments, each corresponding to the content of one course.
To isolate the content of the course Stic, we link the Select widget to the Extract XML widget and type the regular expression shown in the following picture.
[[File:Four.png|center|frame|Figure 4: Interface of the Select widget]]
Note that this selects each whole segment that contains the string “Stic”, not just the string itself.
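The difference between selecting segments and matching strings can be sketched in Python. The function name <code>select</code>, the sample segments and the pattern <code>Stic</code> are assumptions made for the illustration; the regular expression actually used is the one in the picture.

```python
import re

def select(segments, regex):
    """Keep every segment in which the regex matches somewhere (hypothetical helper)."""
    pattern = re.compile(regex)
    return [segment for segment in segments if pattern.search(segment)]

# Made-up course segments.
segments = [
    "<title>Stic I</title> exercises of the first course",
    "<title>Stic II</title> exercises of the second course",
    "<title>Other course</title> unrelated content",
]

selected = select(segments, r"Stic")
```

Each selected item is a complete segment, so everything that belongs to the matching course is carried over to the next step.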
So far we have created two segments, which correspond to the content of the courses Stic I and Stic II. Finally, we have to isolate each exercise of the course Stic I. To do so, we segment the above segmentation further, using the Extract XML widget with the tag “exercise”.
[[File:Five.png|center|frame|Figure 5: Interface of the Extract XML widget]]
Now we have 25 segments, each corresponding to an exercise of either Stic I or Stic II. To isolate the desired exercises, we use the Select widget six times, each time specifying the string that identifies the desired segment, that is <exercise-number>1</exercise-number>, <exercise-number>2</exercise-number>, <exercise-number>3</exercise-number>, etc.
[[File:Six.png|center|frame|Figure 6: Interface of the Select widget]]
Note that at this point it is important to specify a label for each selection, as the labels will allow us to distinguish the segments during the merging step that follows.
Our next goal is to identify the technologies associated with these exercises. First, we need one segmentation consisting of the six segments that we created in the previous step (2). To do so, we merge these segments using the Merge widget. In the Merge widget window we leave the default options and check the box “import labels with key”. We now have one segmentation, under a single label, that contains all the segments of step 2 (all exercises for Stic I).
[[File:Seven.png|center|frame|Figure 7: Interface of the Merge widget]]
'''Step 4'''
At this point we need to remove the URLs included in the text; otherwise a technology would be counted twice whenever it is cited next to a link whose address contains its name. To do so, we use the Recode widget with the regular expression shown below.
[[File:PictureUrl.png|center|frame|Figure 8: Interface of the Recode widget]]
'''Step 5'''
Now we are ready to identify the technologies cited in our segmentation, that is, in the six merged segments. To do so, we use the Segment widget, specifying the strings we are searching for (the technology names) as well as the annotation value of each resulting segment.
[[File:Eight.png|center|frame|Figure 9: Interface of the Segment widget]]
To show how the regular expressions (regexes) are specified, we take the css string as an example. We select the mode Tokenize (t) and type \b(css)\b in the Regex field. Then we type “type” in the Annotation key field and css in the Annotation value field. Finally, we check the boxes Ignore case (i) and Unicode dependent (u). Note that the annotation key is important, as we will use it to refer to these segments when building the table.
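The behaviour of such a rule can be approximated with Python's <code>re</code> module. This is a sketch rather than the widget's actual implementation: the function name <code>tokenize</code>, the second rule for html and the sample sentence are assumptions. The flags <code>re.IGNORECASE</code> and <code>re.UNICODE</code> play the role of the Ignore case (i) and Unicode dependent (u) boxes.

```python
import re

def tokenize(text, rules):
    """Create one annotated segment per regex match (hypothetical helper).

    rules is a list of (regex, annotation_key, annotation_value) triples.
    """
    segments = []
    for regex, key, value in rules:
        pattern = re.compile(regex, re.IGNORECASE | re.UNICODE)
        for match in pattern.finditer(text):
            segments.append({"start": match.start(), "text": match.group(0), key: value})
    # Return the segments in the order in which they occur in the text.
    segments.sort(key=lambda segment: segment["start"])
    return segments

rules = [
    (r"\b(css)\b", "type", "css"),
    (r"\b(html)\b", "type", "html"),  # assumed second rule, for illustration only
]

text = "We styled the HTML page with CSS; the css file is external."
tokens = tokenize(text, rules)
```

Because of the word boundaries (\b), a longer word that merely contains the letters css does not produce a segment, and because case is ignored, CSS and css receive the same annotation value.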
To count the technologies associated with the given exercises, as well as the number of times each technology is cited in a given exercise, we use the Count widget. In the Units field we select the annotation key “type” (labels of the technologies). In the Contexts field we select Mode: Containing segmentation and Annotation key: component_labels (labels of the exercises), then click the “Compute” button as shown in the following picture.
[[File:Nine.png|center|frame|Figure 10: Interface of the Count widget]]
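Conceptually, this counting step builds a table with one row per context (exercise) and one column per unit type (technology). A minimal Python sketch of that idea follows; the function name <code>count_units</code> and the sample pairs are hypothetical and do not come from the data analysed above.

```python
from collections import Counter, defaultdict

def count_units(units):
    """Build a context-by-type frequency table (hypothetical helper).

    units is an iterable of (context_label, unit_type) pairs.
    """
    table = defaultdict(Counter)
    for context, unit_type in units:
        table[context][unit_type] += 1
    return table

# Made-up annotated segments: (label of the exercise, technology).
units = [
    ("exercise 1", "html"),
    ("exercise 1", "css"),
    ("exercise 1", "css"),
    ("exercise 2", "html"),
]

table = count_units(units)
```

A technology that never occurs in an exercise simply has a count of zero in that row.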
'''Step 7'''
==Example with more urls==
In this mini-research we study whether the students of the course Jeux vidéos pédagogiques (VIP) focus more on the player of a game or on the game itself in their analyses of various games, and how this focus changes over time.
For the purposes of our research we focus on the game analyses written by the students of the course VIP during the years 2012-2014. More specifically, we examine how many times in total the words “player” and “game” appear in the analyses of each year. These analyses can be found in EduTechWiki at the following URLs:
Analyses 2012: http://edutechwiki.unige.ch/fr/Cat%C3%A9gorie:Maltt_VIP_Stella<br />
Analyses 2013: http://edutechwiki.unige.ch/fr/Cat%C3%A9gorie:Maltt_VIP_Tetris<br />
Analyses 2014: http://edutechwiki.unige.ch/fr/Cat%C3%A9gorie:Maltt_VIP_Utopia
'''Step 1'''
First, we import the URLs into Orange Textable, defining for each URL the encoding, the annotation key and the annotation value. Note that the annotation value must correspond to the year in which each analysis was written, so that we can later present our results by year. Once all the desired URLs are imported, we define the output segmentation label.
'''Step 2'''
Next, we strip from our HTML pages all the unwanted style and script tags along with their contents, as well as all the remaining HTML tags, so that only the plain text of the analyses is left and is easier to study. We do that using the ''Recode widget'' and the regular expressions shown in the picture below. The regular expression ''<script[^>]*>[\s\S]*?</script>'' removes the script tags along with their contents, the regular expression ''<style[^>]*>[\s\S]*?</style>'' removes the style tags with their contents, and finally the regular expression ''<.*?>'' removes the remaining HTML tags.
[[File:Picture1Textable.png|center|frame|Figure 1: Interface of the Recode widget]]
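The three substitutions can be tried outside Orange Textable with Python's <code>re.sub</code>. The sketch below rests on a few assumptions: the expressions are written without inner spaces, each match is replaced by an empty string, and the function name <code>recode</code> and the sample page are made up for the illustration.

```python
import re

# (regex, replacement) pairs, applied in this order.
RULES = [
    (r"<script[^>]*>[\s\S]*?</script>", ""),  # script tags and their contents
    (r"<style[^>]*>[\s\S]*?</style>", ""),    # style tags and their contents
    (r"<.*?>", ""),                           # any remaining tag
]

def recode(text, rules=RULES):
    """Apply each substitution in turn (hypothetical helper)."""
    for regex, replacement in rules:
        text = re.sub(regex, replacement, text)
    return text

# Made-up page standing in for one of the imported URLs.
page = (
    '<html><head><style type="text/css">p {color: red}</style>'
    '<script type="text/javascript">var x = 1;</script></head>'
    "<body><p>The player wins the game.</p></body></html>"
)

clean = recode(page)
```

The order matters: if the generic tag rule ran first, the script and style tags would be removed but their contents would remain in the text.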
'''Step 3'''
Once the texts are free of the unwanted elements, we search them for the two words we are interested in. To do so, we use the ''Segment widget'' and the regular expressions shown below.
[[File:Picture2Textable.png|center|frame|Figure 2: Interface of the Segment widget]]
'''Step 4'''
We then count how many times the words “player” and “game” are cited in the analyses of each year, using the ''Count widget''. In the Units field we select the annotation key “type” (labels of the words “player” and “game”). In the Contexts field we select Mode: Containing segmentation and Annotation key: years (labels of the URLs), then click the “Compute” button as shown in the following picture.
[[File:Picture3Textable.png|center|frame|Figure 3: Interface of the Count widget]]
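Steps 3 and 4 together amount to counting word matches per year. A compact Python sketch of that combination is given below. The function name <code>count_words_by_year</code> and the sample texts are invented for the illustration, and the English words are placeholders: the regular expressions actually used appear only in the picture of step 3.

```python
import re
from collections import Counter

def count_words_by_year(texts_by_year, words):
    """Count whole-word, case-insensitive occurrences of each word per year."""
    alternatives = "|".join(re.escape(word) for word in words)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return {
        year: Counter(match.group(1).lower() for match in pattern.finditer(text))
        for year, text in texts_by_year.items()
    }

# Made-up texts, not the analyses studied in this example.
texts = {
    "2012": "The game is fun. The game rewards the player.",
    "2014": "The player decides; the player learns from the game.",
}

counts = count_words_by_year(texts, ["player", "game"])
```

The resulting dictionary has one entry per year, which corresponds to one row per annotation value of the key “years” in the table produced by the Count widget.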
'''Step 5'''
Finally, to visualize the result, that is, the table we have created, we use the ''Convert widget'' and the ''Data Table widget'' (from the Data window), leaving their default options.
Our final scheme is shown in the following picture.
[[File:PictureSchemaTextable.png|center|frame|Figure 6: Complete scheme]]
'''Results'''
As we can see from the data table, in the first year we examine (2012) the students use the word "game" more than "player". In the second year (2013), "game" is still more frequent than "player", but the latter starts gaining ground. In the third year (2014), "player" is used more than "game". This mini-research suggests that, over the years, the students' tendency to refer more to the game than to the player shifts in favour of the latter.
[[File:PictureResultsTextable.png|center|frame|Figure 4: Results]]
You can download the example by clicking [http://tecfaetu.unige.ch/etu-maltt/tetris/karanis0/SticIII/ here].
==Conclusion==