Orange Textable
</gallery>
*In general, the first section differs according to the widget that you are using.
**Preprocessing (''Preprocess widget''): This widget takes a segmentation as input and outputs a segmentation covering the modified text. The possible modifications are replacing accented characters with their non-accented equivalents and converting lower case characters to upper case or vice versa. Note that ''Preprocess'' creates a copy of each modified segment, which increases the program’s memory footprint. Finally, because it creates new strings and not only new segmentations, it won’t work if combined with segmentations that refer to different strings. In the sequence depicted in the image below, the frequency table will remain empty.[[File:PreprocesEx.png|Image taken from Orange Textable documentation]]
**Substitutions (''Recode widget''): This widget takes as input a segmentation covering the text that should be recoded and outputs a segmentation covering the recoded text. It “captures” parts of the input text with regular expressions and substitutes them. The text to be “captured” is described in the “Regex” field and the text that replaces it in the “Replacement string” field. If the “Replacement string” field is empty, the “captured” text is deleted. Note that this widget creates new strings and not only new segmentations, so it is subject to the same limitations as the ''Preprocess widget''.
**Ordering (''Merge widget''): This widget takes two or more segmentations as input and outputs a merged segmentation. You can reorder the input segmentations by selecting them and then clicking the move up/move down buttons.
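The preprocessing and substitution operations described above can be illustrated outside Orange Textable with a minimal Python sketch. It uses only the standard library, not Textable's own code; the function names <code>strip_accents</code> and <code>recode</code> are hypothetical and chosen for this illustration.

```python
import re
import unicodedata


def strip_accents(text):
    """Replace accented characters with their non-accented equivalents."""
    # NFD splits "é" into "e" plus a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def recode(text, regex, replacement=""):
    """Substitute every match of `regex`; an empty replacement deletes it."""
    return re.sub(regex, replacement, text)


# Preprocess-style changes: remove accents, then switch to upper case.
print(strip_accents("Élève à l'école").upper())

# Recode-style changes: "Regex" field plus "Replacement string" field.
print(recode("colour and flavour", r"our\b", "or"))

# An empty replacement string deletes the captured text.
print(recode("a (note) b", r"\s*\(.*?\)"))
```

Each call returns a new string rather than a view on the original one, which mirrors why the two widgets cannot be combined with segmentations that refer to different strings.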
'''Advanced interface'''
*The first section of some widgets allows more configuration of your segmentation.
<gallery>
File:PreprocessAdv.png|Preprocess widget
File:SegmentAdv.png|Segment widget
</gallery>
**Depending on the widget that you are using, you can create or import a list of substitutions (''Recode widget'') or regular expressions (''Segment widget''). You can reorder the list, which defines the order in which its items are applied to the input segmentation, and delete some or all of its items. You can also export the list that you’ve created. These widgets also allow you to control how the regular expressions are applied by using the '''Ignore case (i)''', '''Unicode dependent (u)''', '''Multiline (m)''' and '''Dot matches all (s)''' checkboxes. For more information on regular expressions see [https://docs.python.org/2/library/re.html Python documentation].
**The ''Segment widget'' additionally allows you to specify an '''Annotation Key''' and '''Annotation Value''' as well as “to specify if a given regular expression describes the form of the targeted segments ('''Tokenize''' mode) or rather the form of the separators in-between these segments ('''Split''' mode). In '''Split''' mode, empty segments that might occur between two consecutive occurrences of separators are automatically removed” [https://orange-textable.readthedocs.org/en/latest/segment.html].
**The ''Select widget'' offers three values in the Method field.
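The difference between '''Tokenize''' and '''Split''' mode, and the effect of the four checkboxes, can be sketched with Python's <code>re</code> module. This is an illustration under assumptions, not Textable's implementation: the helper names <code>tokenize</code> and <code>split</code> are hypothetical, and the checkboxes are assumed to correspond to the flags <code>re.IGNORECASE</code>, <code>re.UNICODE</code>, <code>re.MULTILINE</code> and <code>re.DOTALL</code>.

```python
import re


def tokenize(text, regex, flags=0):
    """Tokenize mode: the regex describes the segments themselves."""
    return [m.group(0) for m in re.finditer(regex, text, flags)]


def split(text, regex, flags=0):
    """Split mode: the regex describes the separators between segments.

    Empty segments between consecutive separators are dropped.
    The regex is assumed to contain no capturing groups, because
    re.split would otherwise return the captured text as well.
    """
    return [piece for piece in re.split(regex, text, flags=flags) if piece]


text = "one,,two, three"
print(tokenize(text, r"\w+"))   # segments described directly
print(split(text, r",\s*"))     # separators described instead

# Ignore case (i): one pattern matches every capitalisation.
print(tokenize("The the THE", r"the", re.IGNORECASE))

# Multiline (m): "^" matches at the start of every line.
print(tokenize("a\nb", r"^\w", re.MULTILINE))

# Dot matches all (s): "." also matches a newline.
print(tokenize("<a\nb>", r"<.*>", re.DOTALL))
```

Both modes give the same three segments for the sample text; which one is more convenient depends on whether the segments or the separators are easier to describe.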
<gallery>
File:SelectReg.png|Method Regex
File:SelectSamp.png|Method Sample
File:SelectThres.png|Method Threshold
</gallery>
***Method Regex: This works as in the basic interface. The difference is that you can control how the regular expressions are applied by using the '''Ignore case (i)''', '''Unicode dependent (u)''', '''Multiline (m)''' and '''Dot matches all (s)''' checkboxes.
***Method Sample: This method allows you to randomly select the segments sent to output. You can express the size of the sample either as a number of segments ('''Count''') or as a percentage of the input segments ('''Proportion''').
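The two ways of sizing a random sample can be sketched in plain Python. The function <code>sample_segments</code> is hypothetical and is not part of Orange Textable; rounding the proportion to the nearest whole segment and keeping the sampled segments in their original order are assumptions made for this illustration.

```python
import random


def sample_segments(segments, count=None, proportion=None, seed=None):
    """Randomly select segments, sized by Count or by Proportion (percent)."""
    if (count is None) == (proportion is None):
        raise ValueError("give exactly one of count or proportion")
    if proportion is not None:
        # Assumption: the percentage is rounded to a whole number of segments.
        count = round(len(segments) * proportion / 100)
    # Never ask for more segments than the input contains.
    count = max(0, min(count, len(segments)))
    rng = random.Random(seed)
    # Sample positions, then sort them so the input order is preserved.
    chosen = sorted(rng.sample(range(len(segments)), count))
    return [segments[i] for i in chosen]


words = "the quick brown fox jumps over the lazy dog again".split()
print(sample_segments(words, count=3, seed=1))        # 3 segments
print(sample_segments(words, proportion=50, seed=1))  # half of 10 segments
```

Passing a <code>seed</code> makes the selection repeatable, which is convenient when comparing results across runs.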