MeTA



MeTA: ModErn Text Analysis


Developed by: ChengXiang Zhai et al., University of Illinois at Urbana-Champaign
License:
Web page : https://meta-toolkit.org/
Tool type : Multi purpose


This page was last edited on: 2016/11/18

The completion level of this page is: Low


SHORT DESCRIPTION

Features:

  • text tokenization, including deep semantic features like parse trees
  • inverted and forward indexes with compression and various caching strategies
  • a collection of ranking functions for searching the indexes
  • topic models
  • classification algorithms
  • graph algorithms
  • language models
  • CRF implementation (POS-tagging, shallow parsing)
  • wrappers for liblinear and libsvm (including libsvm dataset parsers)
  • UTF8 support for analysis on various languages
  • multithreaded algorithms
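
Two of the features listed above, an inverted index and a ranking function for searching it, can be illustrated with a minimal sketch. This is not MeTA's API (MeTA is a separate toolkit with its own interfaces); the function names `tokenize`, `build_inverted_index` and `bm25_search`, the whitespace tokenizer, the sample documents, and the choice of Okapi BM25 with parameters `k1=1.2` and `b=0.75` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to {doc_id: term_frequency}; also return document lengths."""
    index = defaultdict(dict)
    lengths = []
    for doc_id, text in enumerate(docs):
        tokens = tokenize(text)
        lengths.append(len(tokens))
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, lengths


def bm25_search(query, index, lengths, k1=1.2, b=0.75):
    """Score documents with Okapi BM25; return (doc_id, score) pairs, best first."""
    n_docs = len(lengths)
    avg_len = sum(lengths) / n_docs
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the collection
        # Smoothed inverse document frequency (always positive).
        idf = math.log(1 + (n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for doc_id, tf in postings.items():
            # Length-normalised term-frequency component.
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical three-document collection.
docs = [
    "topic models for text analysis",
    "search engines rank text documents",
    "graph algorithms and language models",
]
index, lengths = build_inverted_index(docs)
results = bm25_search("text search", index, lengths)
```

Here `results` lists only the documents that contain at least one query term, and the document matching both "text" and "search" is ranked first. A production index would add the compression and caching mentioned in the feature list, which this sketch omits.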


TOOL CHARACTERISTICS

Usability

Authors of this page consider that this tool is somewhat difficult to use.

Tool orientation

This tool is designed for general purpose analysis.

Data mining type

This tool is made for Text mining.

Manipulation type

This tool is designed for Data extraction, Data analysis, Data visualisation.

IMPORT FORMAT :

EXPORT FORMAT :


Tool objective(s) in the field of Learning Sciences

Analysis & Visualisation of data
Predicting student performance
Student modelling
Social Network Analysis (SNA)
Constructing courseware

Providing feedback for supporting instructors:
Recommendations for students
Grouping students:
Developing concept maps:
Planning/scheduling/monitoring
Experimentation/observation

Tool can perform:

  • Data extraction of type:
  • Transformation of type:
  • Data analysis of type: Data mining methods and algorithms
  • Data visualisation of type: (These visualisations can be updated in "real time")



ABOUT USERS

Tool is suitable for:

Students/Learners/Consumers
Teachers/Tutors/Managers
Researchers
Developers/Designers
Organisations/Institutions/Firms
Others

Required skills:

STATISTICS: N/A

PROGRAMMING: N/A

SYSTEM ADMINISTRATION: N/A

DATA MINING MODELS: N/A



FREE TEXT


Tool version : MeTA: ModErn Text Analysis

Developed by : ChengXiang Zhai et al., University of Illinois at Urbana-Champaign

Tool Web page : https://meta-toolkit.org/

Tool type : Multi purpose

2 TOOL CHARACTERISTICS


Tool orientation : This tool is designed for general purpose analysis.
Data mining type : This tool is designed for Text mining.
Usability : Authors of this page consider that this tool is somewhat difficult to use.
Data import format : not specified
Data export format : not specified

Tool objective(s) in the field of Learning Sciences

☐ Analysis & Visualisation of data
☐ Predicting student performance
☐ Student modelling
☐ Social Network Analysis (SNA)
☐ Constructing courseware

☐ Providing feedback for supporting instructors:
☐ Recommendations for students
☐ Grouping students:
☐ Developing concept maps:
☐ Planning/scheduling/monitoring
Experimentation/observation

Can perform data extraction of type:

Can perform data transformation of type:


Can perform data analysis of type:
Data mining methods and algorithms


Can perform data visualisation of type:
(These visualisations can be updated in "real time")


3 ABOUT USER


Tool is suitable for:
Students/Learners/Consumers: ☑
Teachers/Tutors/Managers: ☐
Researchers: ☑
Organisations/Institutions/Firms: ☐
Others: ☐

Required skills:
Statistics: N/A
Programming: N/A
System administration: N/A
Data mining models: N/A

4 OTHER TOOL INFORMATION



Quote from MeTA: ModErn Text Analysis (November 2016):

Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. This has led to an increasing demand for powerful software tools to help people manage and analyze vast amount of text data effectively and efficiently. Unlike data generated by a computer system or sensors, text data are usually generated directly by humans for humans.

This has two consequences. First, since text data are generated by people, they are especially valuable for discovering knowledge about human opinions and preferences, in addition to many other kinds of knowledge that we encode in text. Second, since text is written for consumption by humans, humans play a critical role in any text data application system, and a text management and analysis system must involve them in the loop of text analysis.

Existing toolkits supporting text management and analysis tend to fall into two categories. The first is search engine toolkits, which are especially suitable for building a search engine application, but tend to have limited support for text analysis/mining functions. Examples include Lucene, Terrier, and Indri/Lemur. The second is text mining or general data mining and machine learning toolkits, which tend to selectively support some text analysis functions, but generally do not support search capability.

However, seamless integration of search engine capabilities with various text analysis functions is necessary due to two reasons. First, while the raw data may be large for any particular problem, it is often a relatively small subset of the data that are relevant, and a search engine is an essential tool for quickly discovering a small subset of relevant text data in a large text collection. Second, search engines are needed to help analysts interpret any patterns discovered in the data by allowing them to examine the relevant original text data to make sense of any discovered pattern. A main design philosophy of MeTA, which also differentiates MeTA from all the existing toolkits, is its emphasis on the tight integration of search capabilities (indeed, text access capabilities in general) with text analysis functions, enabling it to provide full support for building a powerful text analysis application.

Another design philosophy of MeTA is to facilitate education and research experiments with various algorithms. In this direction, it is similar to Indri/Lemur in its emphasis on modularity and extensibility achieved through object-oriented design. It enables flexible configuration of a selected subset of modules so as to make it easy for designing course assignments or experimenting with a few selected algorithms as needed in focused research projects. For example, it has been successfully used in a MOOC on Text Retrieval and Search Engines where over one thousand Coursera learners have used the toolkit to finish a large programming assignment. It will be used again for supporting programming assignments for another upcoming MOOC on Text Mining and Analytics.
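
The workflow the quote argues for, using search to narrow a large collection to a relevant subset and then running analysis on that subset, can be sketched as follows. This is a toy illustration and not MeTA code: the helper names `search` and `top_terms`, the stopword set, the boolean AND retrieval, and the sample reviews are all hypothetical.

```python
from collections import Counter

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"the", "a", "is", "and", "of", "was", "but"}


def tokenize(text):
    """Split on whitespace, strip simple punctuation, and lowercase."""
    return [t.strip(".,!?").lower() for t in text.split()]


def search(docs, query):
    """Return ids of documents containing every query term (boolean AND)."""
    terms = set(tokenize(query))
    return [i for i, text in enumerate(docs) if terms <= set(tokenize(text))]


def top_terms(docs, doc_ids, n=2):
    """Count non-stopword terms over the selected documents only."""
    counts = Counter()
    for i in doc_ids:
        counts.update(t for t in tokenize(docs[i]) if t and t not in STOPWORDS)
    return counts.most_common(n)


reviews = [
    "The battery is great and the screen is great.",
    "Shipping was slow.",
    "Battery life is poor, but the battery charges fast.",
]
hits = search(reviews, "battery")   # step 1: search narrows the collection
summary = top_terms(reviews, hits)  # step 2: analysis runs on the subset
```

Because `hits` keeps the document ids, an analyst can go back from any term in `summary` to the original reviews, which is the second benefit of integration the quote mentions.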

5 Bibliography

Massung, S., Geigle, C., & Zhai, C. (2016). MeTA: A Unified Toolkit for Text Retrieval and Analysis. ACL 2016, 91. https://www.aclweb.org/anthology/P/P16/P16-4.pdf#page=103

Zhai, C. (2011). Beyond Search: Statistical Topic Models for Text Analysis (slides). https://meta-toolkit.org/sigir-keynote-zhai.pdf