RapidMiner Studio: Difference between revisions

The educational technology and digital learning wiki

Revision as of 16:49, 10 November 2014

RapidMiner Studio 5.3.015 (2014/02/26)

Developed by: RapidMiner
License: Affero General Public License version 1
Web page : http://sourceforge.net/projects/rapidminer/#resources
Tool type : Framework/Library/API,

The last edition of this page was on: 2014/11/09

The Completion level of this page is : Medium


SHORT DESCRIPTION

RapidMiner is a world-leading open-source system for data mining. It is available as a stand-alone application for data analysis and as a data mining engine for the integration into own products. RapidMiner is now RapidMiner Studio and RapidAnalytics is now called RapidMiner Server.

In a few words, RapidMiner Studio is a "downloadable GUI for machine learning, data mining, text mining, predictive analytics and business analytics". It can also be used (for most purposes) in batch mode (command line mode).

Camacab0 (talk) 15:54, 10 November 2014 (CET)


TOOL CHARACTERISTICS

Usability

Authors of this page consider that this tool is rather easy to use.

Tool orientation

This tool is designed for general purpose analysis.

Data mining type

This tool is made for Structured data mining, Text mining, Image mining, Audio mining, Video mining, Data gathering, Social network analysis.

Manipulation type

This tool is designed for Data extraction, Data transformation, Data analysis, Data visualisation, Data conversion, Data cleaning.

IMPORT FORMAT : SQL, TXT, XLS, XML, a lot more

EXPORT FORMAT : CSV, XML, XSL, a lot more


Tool objective(s) in the field of Learning Sciences

Analysis & Visualisation of data
Predicting student performance
Student modelling
Social Network Analysis (SNA)
Constructing courseware

Providing feedback for supporting instructors
Recommendations for students
Grouping students
Developing concept maps
Planning/scheduling/monitoring
Experimentation/observation

Tool can perform:

  • Data extraction of type: Web crawler, Flat file database/Logfile extractor, Structured database extractor
  • Transformation of type: Simple data format conversion, Simple data transformation operations, Advanced data transformation operations, Mathematical transformation of data for analysis
  • Data analysis of type: Basic statistics and data summarization, Data mining methods and algorithms
  • Data visualisation of type: Sequential Graphic, Chart/Diagram (These visualisations can be interactive and updated in "real time")



ABOUT USERS

Tool is suitable for:

Students/Learners/Consumers
Teachers/Tutors/Managers
Researchers
Developers/Designers
Organisations/Institutions/Firms
Others

Required skills:

STATISTICS: Basic

PROGRAMMING: None

SYSTEM ADMINISTRATION: N/A

DATA MINING MODELS: Medium



FREE TEXT




Draft

Introduction

RapidMiner is available both as free open-source software and as a commercial product. It can be used for text mining (content analysis).

“RapidMiner provides data mining and machine learning procedures including: data loading and transformation (ETL), data preprocessing and visualization, modelling, evaluation, and deployment. The data mining processes can be made up of arbitrarily nestable operators, described in XML files and created in RapidMiner's graphical user interface (GUI). RapidMiner is written in the Java programming language. It also integrates learning schemes and attribute evaluators of the Weka machine learning environment and statistical modelling schemes of the R-Project.” (Wikipedia, retrieved 20:37, 13 March 2012 (CET))

Installation

  • Installing RapidMiner Studio is very easy on Windows (tested on Windows 7 and Windows 8.1, both 64-bit) when using the installer provided on your RapidMiner account page.
  • Installation can be difficult on Mac OS X, depending on the installed Java version. On 10.10, RapidMiner asks for Java 1.7 or above even if Java 1.8.x is installed.

Note: RapidMiner is now commercial software, so you can only use the product for 14 days, after requesting a trial license.

A complete set of tools

First of all, it is important to say that RapidMiner Studio and RapidMiner Server, which works with it, form a complete set of tools rather than a single-purpose piece of software. The RapidMiner website says that "RapidMiner lets you easily sort through and run more than 1500 operations".

Because of its complexity, I will only describe some of RapidMiner Studio's functions. However, I will show below an example of using RapidMiner Studio as a basic text miner. RapidMiner Studio's highlights are:

  • A visual, code-free environment, so no programming is needed
  • Available on all major operating systems and platforms
  • Main function : Design of analysis processes
  • Predictive analytics (with pre-made templates)
  • Data loading
  • Data transformation
  • Data modeling
  • Data visualization (with lots of visualizations)
  • Extension API
  • Lots of data sources : Excel, Access, Oracle, IBM DB2, Microsoft SQL, Sybase, Ingres, MySQL, Postgres, SPSS, dBase, Text files, and more
  • RapidMiner allows you to work with different types and sizes of data sources

Use example : text mining

As described before, RapidMiner can be used as text mining software. I will describe here an example of a text mining process, in which we will:

  1. Load and extract words from (the text files in) two directories
  2. Ignore some words that are not wanted (stoplist)
  3. Generate the results
  4. View results in two different ways

Launch RapidMiner Studio and load data

Fig.1 : Text Mining Extension
Fig.2 : The workspace

Once you have launched RapidMiner Studio (v. 6.1.1000), you will need to install the Text Mining extension. RapidMiner works with extensions that plug into the core system. The Text Mining extension can be found in the RapidMiner Marketplace, which can be accessed from Help > Updates and Extensions (Marketplace), as shown in Fig. 1.

After restarting the software, we can start working with it. First of all, create a New Process. You will now see the main window of RapidMiner Studio (Fig. 2). The main zones of the workspace are:

  • In blue, the main toolbar
  • In orange, all the operators that we can use in our processes
  • In green, the repositories
  • In purple, the main process window, where we will be able to see process results and progress
  • In black, the parameters of each element of our process, and the help

From here, we first find the operator Process Documents from Files (screenshot here) and drag it into the Process zone in the center. The operator is now in our process, and we need to set its parameters. Click on the operator in the main process area to see on the right side which parameters you can set. The first parameter is text directories, which we will set right away.

Fig.3 : Text Directories
  • In my case, I have a directory on my Desktop named "data"
  • In /data/, I have /litterature/text1.txt and /photographie/text2.txt
  • I set up my text directories as suggested in Fig. 3 and give each a different name, so that results can be shown per text directory
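
For readers who prefer to see the idea in code, here is a minimal Python sketch of what the text directories parameter describes: each named directory becomes a class, and every text file in it is loaded under that class name. This is not RapidMiner code. The function name load_labeled_texts and the sample sentences are made up for illustration, and the directory layout is recreated in a temporary folder so the sketch can run anywhere.

```python
from pathlib import Path
import tempfile


def load_labeled_texts(text_directories):
    """Return (class_name, file_name, text) for every .txt file in each directory."""
    documents = []
    for class_name, directory in text_directories.items():
        for path in sorted(Path(directory).glob("*.txt")):
            documents.append((class_name, path.name, path.read_text(encoding="utf-8")))
    return documents


# Recreate the layout described above in a throwaway folder (sample content is invented).
root = Path(tempfile.mkdtemp()) / "data"
samples = {
    "litterature": ("text1.txt", "Le roman est un genre littéraire."),
    "photographie": ("text2.txt", "La photographie capture la lumière."),
}
for folder, (file_name, content) in samples.items():
    (root / folder).mkdir(parents=True)
    (root / folder / file_name).write_text(content, encoding="utf-8")

# One entry per text directory: the class name on the left, the path on the right.
text_directories = {name: root / name for name in samples}
documents = load_labeled_texts(text_directories)
for class_name, file_name, text in documents:
    print(class_name, file_name, len(text))
```

Giving each directory its own name is what later makes it possible to compare results per directory.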

In the next section we will talk about operators, and we will come back to the Process Documents from Files parameters to choose which vector we want RapidMiner to create.
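
To give an idea of what "which vector" means, the following Python sketch computes four common word vectors for already tokenized documents: term occurrences, binary term occurrences, term frequency and TF-IDF. It uses textbook definitions and is only an illustration. The function name build_vectors and the sample tokens are hypothetical, and RapidMiner's own weighting and normalisation may differ from these formulas.

```python
import math
from collections import Counter


def build_vectors(tokenized_docs):
    """Build one vector per document for each weighting scheme (textbook formulas)."""
    vocabulary = sorted({token for doc in tokenized_docs for token in doc})
    n_docs = len(tokenized_docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(token for doc in tokenized_docs for token in set(doc))
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid dividing by zero for an empty document
        vectors.append({
            "term_occurrences": [counts[t] for t in vocabulary],
            "binary_term_occurrences": [int(counts[t] > 0) for t in vocabulary],
            "term_frequency": [counts[t] / total for t in vocabulary],
            "tf_idf": [counts[t] / total * math.log(n_docs / doc_freq[t]) for t in vocabulary],
        })
    return vocabulary, vectors


docs = [["roman", "auteur", "roman"], ["photo", "lumière", "auteur"]]
vocabulary, vectors = build_vectors(docs)
print(vocabulary)
print(vectors[0]["term_occurrences"])
print(vectors[0]["tf_idf"])
```

In this sketch a word that appears in every document gets a TF-IDF weight of zero, which is why that weighting helps to highlight the words that are specific to one directory.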

Tokenize & define StopWords

Now that we have our Process Documents from Files operator in our Main Process area and our text directories set up correctly, we need to connect the Process Documents from Files operator on the left (from inp to wor) and on the right (from exa to res, and wor to res). This will allow the data to be processed.

We will now define which steps (or processes) should be executed inside our Process Documents from Files operator. Double-clicking on it shows its inside. There we add a Tokenize operator, which can be found in the operators area on the left (in Tokenization). Tokenize separates the words, making them independent values.
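
As an illustration of what tokenizing does, here is a small Python sketch that cuts a text at every character that is not a letter. It assumes that the Tokenize operator is used in a mode that splits on non-letter characters; the function name tokenize and the sample sentence are only for this example.

```python
import re


def tokenize(text):
    """Split text into tokens at every run of non-letter characters."""
    # [^\W\d_]+ matches runs of letters only (accented letters included, no digits or underscores).
    return re.findall(r"[^\W\d_]+", text)


tokens = tokenize("L'appareil photo, c'est 35mm d'histoire !")
print(tokens)
# ['L', 'appareil', 'photo', 'c', 'est', 'mm', 'd', 'histoire']
```

Note that with this rule French elisions such as "L'" and "d'" produce one-letter tokens, which is one reason why a stopword filter is useful as the next step.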

  • Drag Filter Stopwords French (because my text files are in French) into the Vector Creation area, connect both operators on the left and right sides (input and output), and then click on the "up" arrow to return to the main level (see Fig. 4)
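
The effect of a stopword filter can be sketched in a few lines of Python. The stoplist below is a tiny hand-made sample, not the list that ships with RapidMiner, and comparing in lowercase is a choice made for this example only.

```python
# Tiny illustrative stoplist (assumption); a real French stoplist is much longer.
FRENCH_STOPWORDS = {"le", "la", "les", "un", "une", "de", "des", "du", "et", "est", "en"}


def filter_stopwords(tokens, stopwords=FRENCH_STOPWORDS):
    """Keep only the tokens whose lowercase form is not in the stoplist."""
    return [token for token in tokens if token.lower() not in stopwords]


kept = filter_stopwords(["La", "photographie", "est", "un", "art"])
print(kept)  # ['photographie', 'art']
```

The filter receives the tokens produced by the previous step, which is why the two operators are chained in that order.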

If you browse your Operators on your left, in Text Processing/Filtering you should see Filter

View result

Export results

Links

Official

Get RapidMiner

Documentation / Tutorials