RapidMiner Studio: Difference between revisions
No edit summary |
|||
Line 73: | Line 73: | ||
* RapidMiner allows you to work with different types and sizes of data sources | * RapidMiner allows you to work with different types and sizes of data sources | ||
= Use | = Use examples = | ||
== Basic text mining == | |||
As described before, RapidMiner can be used as text mining software. Here I describe an example text mining process, in which we will: | As described before, RapidMiner can be used as text mining software. Here I describe an example text mining process, in which we will: | ||
Line 81: | Line 83: | ||
# View results | # View results | ||
== Launch RapidMiner Studio and load data == | === Launch RapidMiner Studio and load data === | ||
[[File:RapidMiner_Studio_Tutorial1_B.PNG|100px|thumb|left|Fig.1 : Text Mining Extension]] | [[File:RapidMiner_Studio_Tutorial1_B.PNG|100px|thumb|left|Fig.1 : Text Mining Extension]] | ||
Line 108: | Line 110: | ||
In the next section we will talk about operators, and we will come back to the '''Process Documents from Files parameters''' to choose which vector we want RapidMiner to create. | In the next section we will talk about operators, and we will come back to the '''Process Documents from Files parameters''' to choose which vector we want RapidMiner to create. | ||
== Tokenize & define StopWords == | === Tokenize & define StopWords === | ||
Now that we have our Process Documents from Files operator in our '''Main Process area''' and our text directories set up correctly, we need to connect our operator '''Process Documents from Files''' on the left (from inp to wor) and on the right (from exa to res, and wor to res). This will allow the data to be processed. | Now that we have our Process Documents from Files operator in our '''Main Process area''' and our text directories set up correctly, we need to connect our operator '''Process Documents from Files''' on the left (from inp to wor) and on the right (from exa to res, and wor to res). This will allow the data to be processed. | ||
Line 116: | Line 118: | ||
We will now define which steps (or processes) should be executed inside our '''Process Documents from Files''' operator. Double-clicking on it shows its inner process. We now add a '''Tokenize''' operator, which can be found in the operators area (under Tokenization) on the left. Tokenize separates the text into words, making each one an independent value. One of RapidMiner's big strengths is its graphical user interface, which lets you build processes quite naturally. We can also add '''Filter Stopwords (french)''' (because my text files are in French) to our main '''Process Documents from Files''' operator, again by dragging it. You should see something like Fig. 5 above. | We will now define which steps (or processes) should be executed inside our '''Process Documents from Files''' operator. Double-clicking on it shows its inner process. We now add a '''Tokenize''' operator, which can be found in the operators area (under Tokenization) on the left. Tokenize separates the text into words, making each one an independent value. One of RapidMiner's big strengths is its graphical user interface, which lets you build processes quite naturally. We can also add '''Filter Stopwords (french)''' (because my text files are in French) to our main '''Process Documents from Files''' operator, again by dragging it. You should see something like Fig. 5 above. | ||
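To make the two steps above concrete, here is a minimal Python sketch of what tokenizing and stopword filtering do to a piece of text. It is not RapidMiner's implementation: I assume that tokens are split on non-letter characters, and the stopword list `FRENCH_STOPWORDS` is a tiny hypothetical sample, far shorter than the list the real operator uses.

```python
import re

# Hypothetical, deliberately tiny French stopword list (illustration only).
FRENCH_STOPWORDS = {"le", "la", "les", "de", "des", "du", "un", "une", "et", "est", "dans"}


def tokenize(text):
    """Split text into runs of letters (assumption: split on non-letter characters)."""
    return re.findall(r"[^\W\d_]+", text)


def filter_stopwords(tokens, stopwords=FRENCH_STOPWORDS):
    """Drop every token whose lowercase form is in the stopword list."""
    return [token for token in tokens if token.lower() not in stopwords]


tokens = tokenize("Le chat est dans la maison de Marie.")
kept = filter_stopwords(tokens)
print(kept)  # ['chat', 'maison', 'Marie']
```

The order matters in the same way as in the graphical process: the text is first cut into tokens, and only then can individual tokens be compared against the stopword list.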
== View result == | === View result === | ||
If your main operator is connected (input - output) and, inside it, your Tokenize operator and your Stopwords operator are connected to each other and to the input and output, as the figure above suggests, you should be ready to launch the process, which should generate your results. | If your main operator is connected (input - output) and, inside it, your Tokenize operator and your Stopwords operator are connected to each other and to the input and output, as the figure above suggests, you should be ready to launch the process, which should generate your results. | ||
Line 124: | Line 126: | ||
If you launch the process leaving the default value (TF-IDF), RapidMiner will present the results in different ways. First, you have two tabs, '''WordList''' and '''ExampleSet'''. | If you launch the process leaving the default value (TF-IDF), RapidMiner will present the results in different ways. First, you have two tabs, '''WordList''' and '''ExampleSet'''. | ||
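For readers unfamiliar with TF-IDF, the following Python sketch computes one common textbook variant: term frequency (count divided by document length) multiplied by the natural logarithm of the inverse document frequency. This is an assumption on my part; RapidMiner's exact weighting and normalization may differ, and the function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Textbook variant (assumption): tf = count / document length,
    idf = ln(number of documents / number of documents containing the term).
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


docs = [["chat", "maison", "chat"], ["chien", "maison"]]
vectors = tf_idf(docs)
print(vectors[0])
```

In this sample, "maison" appears in every document, so its weight is zero, while "chat" and "chien" each appear in only one document and receive a positive weight. That is the point of the measure: words that are frequent in one document but rare across the collection stand out.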
=== WordList View === | ==== WordList View ==== | ||
[[File:RapidMiner_Studio_Tutorial1_H.PNG|300px|thumb|right|Fig. 6 : WordList View]] | [[File:RapidMiner_Studio_Tutorial1_H.PNG|300px|thumb|right|Fig. 6 : WordList View]] | ||
Line 134: | Line 136: | ||
* The fifth and sixth columns show '''Text Directory Occurrences''' (how many times we can find the word in each text directory) | * The fifth and sixth columns show '''Text Directory Occurrences''' (how many times we can find the word in each text directory) | ||
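As a rough illustration of how such per-word counts can be derived, here is a small Python sketch. It is only a sketch under my own assumptions: the function `word_list`, the directory labels `dir_a` and `dir_b`, and the sample tokens are hypothetical, and RapidMiner may compute its columns differently.

```python
from collections import Counter


def word_list(corpus):
    """corpus maps a text directory label to a list of tokenized documents.

    Returns three counters: total occurrences per word, number of documents
    containing each word, and occurrences per word within each directory.
    """
    total = Counter()
    doc_occurrences = Counter()
    per_directory = {label: Counter() for label in corpus}

    for label, documents in corpus.items():
        for tokens in documents:
            total.update(tokens)                 # every occurrence counts
            doc_occurrences.update(set(tokens))  # at most once per document
            per_directory[label].update(tokens)  # occurrences in this directory
    return total, doc_occurrences, per_directory


corpus = {
    "dir_a": [["chat", "maison", "chat"]],
    "dir_b": [["chien", "maison"], ["chat"]],
}
total, doc_occurrences, per_directory = word_list(corpus)
print(total["chat"], doc_occurrences["chat"], per_directory["dir_a"]["chat"])  # 3 2 2
```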
=== ExampleSet View === | ==== ExampleSet View ==== | ||
[[File:RapidMiner_Studio_Tutorial1_I.PNG|300px|thumb|right|Fig. 7 : ExampleSet View]] | [[File:RapidMiner_Studio_Tutorial1_I.PNG|300px|thumb|right|Fig. 7 : ExampleSet View]] | ||
Line 147: | Line 149: | ||
Note: Fig. 8 shows some of the chart types that RapidMiner offers. | Note: Fig. 8 shows some of the chart types that RapidMiner offers. | ||
== Export results == | === Export results === | ||
When it comes to exporting results in RapidMiner Studio, each extension and each RapidMiner Studio function offers different export options. | When it comes to exporting results in RapidMiner Studio, each extension and each RapidMiner Studio function offers different export options. | ||
Line 155: | Line 157: | ||
Note: The export-as-image function seems to export the software's entire main area (in the center), but not an individual image. | Note: The export-as-image function seems to export the software's entire main area (in the center), but not an individual image. | ||
== Tweets mining and analysis == | |||
=== Introduction === | |||
RapidMiner Studio allows you to extract, transform and analyse data from A to Z with its core functionalities and free plugins. Unfortunately, some Cloud extensions and functionalities are premium, and pricey. I will explain here how you can extract and analyse tweets using only the free version of RapidMiner Studio and a third-party service for the tweet extraction. | |||
=== Tweets extraction === | |||
First of all, you need to get the data that you want to feed into RapidMiner. In our case, we need the tweets that we want to process. As said before, some third-party services allow you to extract tweets automatically from Twitter: I will present [https://zapier.com Zapier], which "''connects the web apps you use to easily move your data and automate tedious tasks''". Its free version allows users to specify a | |||
=== Data transformation === | |||
=== Data analysis === | |||
= Links = | = Links = |