Property:Has description

The educational technology and digital learning wiki

This is a property of type Text.

Showing 62 pages using this property.
K
Quote: <span style="background-color:#eeeeee" class="citation">“koRpus is an R package i originally wrote to measure similarities/differences between texts. over time it grew into what it is now, a hopefully versatile tool to analyze text material in various ways, with an emphasis on scientific research, including readability and lexical diversity features.”</span>  +
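Lexical diversity, one of the feature families koRpus covers, is usually introduced through the type-token ratio (TTR). The Python sketch below is not koRpus code (koRpus is an R package). It only illustrates plain TTR and a moving-average variant (MATTR), and it assumes a crude letters-only tokenizer.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of letters; digits, underscores and punctuation are dropped.
    return re.findall(r"[^\W\d_]+", text.lower())

def ttr(tokens: list[str]) -> float:
    """Type-token ratio: number of distinct word types / number of word tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def mattr(tokens: list[str], window: int = 50) -> float:
    """Moving-average TTR: mean TTR over every run of `window` consecutive tokens.
    Less sensitive to text length than plain TTR."""
    if len(tokens) <= window:
        return ttr(tokens)
    scores = [ttr(tokens[i:i + window]) for i in range(len(tokens) - window + 1)]
    return sum(scores) / len(scores)

tokens = tokenize("The cat sat on the mat. The mat was flat.")
print(len(tokens), ttr(tokens))  # prints: 10 0.7
```

Plain TTR falls as texts get longer (common words keep repeating), which is why windowed measures such as MATTR are often preferred when comparing texts of different lengths.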
L
LOCO-Analyst is an educational tool aimed at providing teachers with feedback on the relevant aspects of the learning process taking place in a web-based learning environment, and thus helps them improve the content and the structure of their web-based courses. LOCO-Analyst aims at providing teachers with feedback regarding: *all kinds of activities their students performed and/or took part in during the learning process, *the usage and the comprehensibility of the learning content they had prepared and deployed in the LCMS, *contextualized social interactions among students (i.e., social networking) in the virtual learning environment.  +
The Learning Analytics Enriched Rubric (LA e-Rubric) is an advanced grading method used for criteria-based assessment. As a rubric, it consists of a set of criteria, each with several descriptive levels, and a numerical grade is assigned to each of these levels. An enriched rubric contains some criteria and related grading levels that are associated with data from the analysis of learners’ interaction and learning behavior in a Moodle course, such as the number of posted messages, the number of times learning material was accessed, assignment grades and so on. Using learning analytics from log data on collaborative interactions, past grading performance and accesses to course resources, the LA e-Rubric can automatically calculate the score of the various levels per criterion. The total rubric score is the sum of the per-criterion scores.  +
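A minimal Python sketch of the scoring idea above. The criteria, the metric names (`forum_posts`, `resource_views`) and the thresholds are hypothetical, and the rule "a criterion earns the points of the highest level whose threshold the learner reaches" is an assumption made for illustration; this is not the actual LA e-Rubric implementation in Moodle.

```python
from dataclasses import dataclass

@dataclass
class Level:
    min_value: float  # hypothetical threshold on a learner-activity metric
    points: float     # numerical grade assigned to this level

@dataclass
class Criterion:
    name: str
    metric: str          # hypothetical key into the learner's activity metrics
    levels: list[Level]

def criterion_score(criterion: Criterion, metrics: dict[str, float]) -> float:
    """Points of the highest level whose threshold the learner reaches (0 if none)."""
    value = metrics.get(criterion.metric, 0)
    reached = [lv for lv in criterion.levels if value >= lv.min_value]
    return max(reached, key=lambda lv: lv.min_value).points if reached else 0.0

def rubric_score(criteria: list[Criterion], metrics: dict[str, float]) -> float:
    """Total rubric score: the sum of the per-criterion scores."""
    return sum(criterion_score(c, metrics) for c in criteria)

rubric = [
    Criterion("Forum participation", "forum_posts", [Level(0, 0), Level(3, 5), Level(10, 10)]),
    Criterion("Use of course resources", "resource_views", [Level(0, 0), Level(5, 3), Level(20, 6)]),
]
print(rubric_score(rubric, {"forum_posts": 4, "resource_views": 25}))  # prints: 11
```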
Quote from the [http://www.tal.univ-paris3.fr/lexico/index-gb.htm Home page]: <span style="background-color:#eeeeee" class="citation">“Lexico3 is the 2001 edition of the Lexico software, first published in 1990. Functions present from the first version (segmentation, concordances, breakdown in graphic form, characteristic elements and factorial analyses of repeated forms and segments) were maintained and for the most part significantly improved. The Lexico series is unique in that it allows the user to maintain control over the entire lexicometric process from initial segmentation to the publication of final results. Beyond identification of graphic forms, the software allows for study of the identification of more complex units composed of form sequences: repeated segments, pairs of forms in co-occurrences, etc which are less ambiguous than the graphic forms that make them up.”</span> A free version for "personal work" is available at the bottom of [http://www.tal.univ-paris3.fr/lexico/download.htm this page].  +
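Repeated segments, one of the lexicometric units mentioned above, can be approximated as word sequences that occur at least twice. This Python sketch is a simplification, not Lexico3's algorithm: it assumes that segments never cross sentence punctuation and that a "form" is simply a lowercase run of letters.

```python
import re
from collections import Counter

def repeated_segments(text: str, min_len: int = 2, max_len: int = 5,
                      min_freq: int = 2) -> dict[tuple[str, ...], int]:
    """Word sequences of min_len..max_len forms occurring at least min_freq times.
    Assumption: segments never cross sentence punctuation (. ! ? ; :)."""
    counts: Counter = Counter()
    for chunk in re.split(r"[.!?;:]", text.lower()):
        forms = re.findall(r"[^\W\d_]+", chunk)
        for n in range(min_len, max_len + 1):
            for i in range(len(forms) - n + 1):
                counts[tuple(forms[i:i + n])] += 1
    return {seg: freq for seg, freq in counts.items() if freq >= min_freq}

segments = repeated_segments("The state of the art is moving. We review the state of the art.")
print(max(segments, key=len))  # prints: ('the', 'state', 'of', 'the', 'art')
```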
Quote from the home page: <span style="background-color:#eeeeee" class="citation">“This web-based tool enables you to "scrub" (clean) your unicode text(s), cut a text(s) into various size chunks, manage chunks and chunk sets, tokenize with character- or word- Ngrams or TF-IDF weighting, and choose from a suite of analysis tools for investigating those texts. Functionality includes building dendrograms, making graphs of rolling averages of word frequencies or ratios of words or letters, and playing with visualizations of word frequencies including word clouds and bubble visualizations. To facilitate subsequent text mining analyses beyond the scope of this site, users can also transpose and download their matricies of word counts or relative proportions as comma- or tab-separated files (.csv, .tsv).”</span>  +
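The "rolling averages of word frequencies" mentioned in the quote can be sketched as the share of a target word inside a window that slides forward one token at a time. This is an illustrative Python version with a simple letters-only tokenizer; Lexos' own implementation and options may differ.

```python
import re

def rolling_word_share(text: str, word: str, window: int) -> list[float]:
    """For each run of `window` consecutive tokens, the proportion of tokens equal to `word`."""
    if window < 1:
        raise ValueError("window must be at least 1")
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    hits = [1 if token == word.lower() else 0 for token in tokens]
    if len(hits) < window:
        return []
    current = sum(hits[:window])  # occurrences in the first window
    shares = [current / window]
    for i in range(window, len(hits)):
        # Slide by one token: add the newly entering token, drop the one leaving.
        current += hits[i] - hits[i - window]
        shares.append(current / window)
    return shares

print(rolling_word_share("love war love peace war war love peace", "war", 4))
# prints: [0.25, 0.5, 0.5, 0.5, 0.5]
```

Plotting these values against the window position gives the kind of curve that shows where in a text a word is concentrated.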
<span style="background-color:#eeeeee" class="citation">“The open-source LightSide platform, including the machine-learning and feature-extraction core as well as the researcher's workbench UI, has been and continues to be funded in part through Carnegie Mellon University, in particular by grants from the National Science Foundation and the Office of Naval Research.”</span> ([http://ankara.lti.cs.cmu.edu/side/ LightSide home page], sept. 2014).  +
LingPipe is a toolkit for processing text using computational linguistics. LingPipe is used for tasks such as: * finding the names of people, organizations or locations in news * automatically classifying Twitter search results into categories * suggesting correct spellings of queries. The free and open-source version requires that the processed data and any linked software be made freely available. Other versions are available.  +
Log Parser is a flexible command-line utility that was initially written by Gabriele Giuseppini, a Microsoft employee, to automate tests for IIS logging. It was intended for use with the Windows operating system and was included with the IIS 6.0 Resource Kit Tools. By default, Log Parser works like a "data processing pipeline": it takes an SQL expression on the command line and outputs the lines containing matches for that expression (from Wikipedia). Microsoft describes Log Parser as a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows operating system such as the Event Log, the Registry, the file system, and Active Directory. The results of the input query can be custom-formatted in text-based output, or they can be persisted to more specialized targets like SQL, SYSLOG, or a chart.  +
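Log Parser's core idea, running SQL over text-based data, can be imitated with Python's built-in `sqlite3` module. The sketch below does not use Log Parser or its SQL dialect: it loads a small, hypothetical CSV log into an in-memory table named `log` (an assumed name) and runs an ordinary SQLite query on it.

```python
import csv
import io
import sqlite3

# Hypothetical, simplified web-server log in CSV form.
LOG = """date,client_ip,status,uri
2014-10-01,10.0.0.1,200,/index.html
2014-10-01,10.0.0.2,404,/missing.png
2014-10-02,10.0.0.1,200,/about.html
2014-10-02,10.0.0.3,404,/missing.png
"""

def query_log(log_text: str, sql: str) -> list[tuple]:
    """Load CSV records into an in-memory SQLite table called `log`, then run `sql` on it."""
    rows = list(csv.DictReader(io.StringIO(log_text)))
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE log (date TEXT, client_ip TEXT, status INTEGER, uri TEXT)")
        # Named placeholders are filled from each CSV row (a dict keyed by column name);
        # the INTEGER column converts the "404"-style strings to integers.
        con.executemany("INSERT INTO log VALUES (:date, :client_ip, :status, :uri)", rows)
        return con.execute(sql).fetchall()
    finally:
        con.close()

print(query_log(LOG, "SELECT uri, COUNT(*) FROM log WHERE status = 404 GROUP BY uri"))
# prints: [('/missing.png', 2)]
```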
Log Parser Studio (LPS) is a graphical user interface (GUI) that functions as a front-end to [[Log Parser|Log Parser 2.2]] and as a ‘Query Library’ for managing all the queries and scripts one builds up over time. LPS can house all queries in a central location and lets users edit, create and save queries. You can search for queries using free-text search, and export and import both libraries and queries in different formats, which allows for easy collaboration as well as storing multiple separate libraries for different protocols.  +
M
MAXQDA is a mixed methods research tool. There are two versions: * MAXQDA includes more classical QDA functionality (e.g. the functions found in Atlas or NVivo) plus data management/import tools * MAXQDAplus contains the quantitative MAXDictio tool. According to [http://en.wikipedia.org/wiki/MAXQDA Wikipedia] (Oct. 2013), <span style="background-color:#eeeeee" class="citation">“MAXQDA is a software program designed for computer-assisted qualitative and mixed methods data, text and multimedia analysis in academic, scientific, and business institutions. It is the successor of winMAX, which was first made available in 1989.”</span>  +
Maps is a MediaWiki extension that provides the ability to visualize geographic data with dynamic, JavaScript-based mapping APIs. It has built-in support for geocoding, displaying maps, displaying markers, adding pop-ups, and more.  +
Features: * text tokenization, including deep semantic features like parse trees * inverted and forward indexes with compression and various caching strategies * a collection of ranking functions for searching the indexes * topic models * classification algorithms * graph algorithms * language models * CRF implementation (POS-tagging, shallow parsing) * wrappers for liblinear and libsvm (including libsvm dataset parsers) * UTF8 support for analysis on various languages * multithreaded algorithms  +
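To illustrate the first two index types in the list, here is a toy forward index (document to term counts) and inverted index (term to postings) in Python, with a Boolean AND search. It leaves out the compression, caching and ranking functions a toolkit like MeTA provides, and it is not MeTA's API.

```python
import re
from collections import Counter, defaultdict

def build_indexes(docs: dict[str, str]) -> tuple[dict[str, Counter], dict[str, dict[str, int]]]:
    """forward:  doc_id -> Counter(term -> frequency in that document)
    inverted: term -> {doc_id: frequency}, i.e. one postings list per term"""
    forward: dict[str, Counter] = {}
    inverted: defaultdict = defaultdict(dict)
    for doc_id, text in docs.items():
        counts = Counter(re.findall(r"[^\W\d_]+", text.lower()))
        forward[doc_id] = counts
        for term, freq in counts.items():
            inverted[term][doc_id] = freq
    return forward, dict(inverted)

def search_all(inverted: dict[str, dict[str, int]], query: str) -> set[str]:
    """Boolean AND search: ids of documents that contain every query term."""
    postings = [set(inverted.get(term, {})) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

docs = {
    "d1": "Text mining with graphs",
    "d2": "Graph algorithms and text",
    "d3": "Topic models for text mining",
}
forward, inverted = build_indexes(docs)
print(sorted(search_all(inverted, "text mining")))  # prints: ['d1', 'd3']
```

The inverted index answers "which documents contain this term?" quickly; the forward index answers "which terms does this document contain?", which is what classifiers and topic models need.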
Quote from the home page: <span style="background-color:#eeeeee" class="citation">“Textalytics is a text analysis engine that extracts meaningful elements from any type of content and structures it, so that you can easily process and manage it. Textalytics features a set of high-level web services — adaptable to the characteristics of every type of business — which can be flexibly integrated into your processes and applications.”</span>  +
The first version of the software was deployed to serve the needs of the free content Wikipedia encyclopedia in 2002. It has since been deployed on tens of thousands of other websites for all sorts of purposes.  +
This extension makes it possible to collect a number of pages. Collections can be edited, persisted and optionally retrieved as PDF, ODF or DocBook (XML).  +
Commercial software for extracting specific information. Using a point-and-click interface, Mozenda lets users extract specific information and images from websites. Mozenda is composed of an "Agent Builder" and a web console. The Mozenda Web Console can run the agents created in the Agent Builder and lets users organize, manage, view, export and publish information. All agents run on highly optimized harvesting servers in Mozenda's data centers.  +
N
NaCTeM has developed a number of high-quality text mining tools for the UK academic community. However, at least some seem to be available to all for ''non-commercial purposes'' ([http://www.nactem.ac.uk/terms_conditions.php]). NaCTeM's tools and services offer benefits to a wide range of users, e.g. reduction in time and effort for finding and linking pertinent information from large-scale textual resources, and customised solutions in semantic data analysis ([http://www.nactem.ac.uk/aims.php Our Aims and Objectives], retrieved March 2014). NaCTeM tools are available in different ways: basic tools are offered as web services, while others require downloading and sometimes configuration/installation.  +
NetDraw is a free Windows program for visualizing social network data. NetDraw is also included in [https://sites.google.com/site/ucinetsoftware/home UCINET], a fairly cheap commercial SNA program developed by the same company.  +
NetMiner is application software for exploratory analysis and visualization of large network data based on social network analysis (SNA). It can be used for general research and teaching in social networks. This tool allows researchers to explore their network data visually and interactively and helps them detect underlying patterns and structures in the network. It features data transformation, network analysis, statistics, visualization of network data, charts, and a programming language based on the [[Python]] scripting language.  +
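Two basic measures reported by SNA tools such as NetMiner, NetDraw or Pajek are degree centrality and network density. This is a minimal Python sketch for an undirected network given as an edge list (an assumed input format); real tools offer many more measures and also handle isolated nodes, directed ties and weights.

```python
from collections import defaultdict

def degree_centrality(edges: list[tuple[str, str]]) -> dict[str, float]:
    """Normalized degree centrality in an undirected network: degree / (n - 1).
    Only nodes that appear in at least one non-loop edge are counted."""
    neighbours: defaultdict = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}

def density(edges: list[tuple[str, str]]) -> float:
    """Share of possible undirected ties that are present: 2m / (n(n - 1))."""
    ties = {frozenset(e) for e in edges if e[0] != e[1]}  # duplicates and reversed pairs collapse
    nodes = {v for tie in ties for v in tie}
    n = len(nodes)
    return 2 * len(ties) / (n * (n - 1)) if n > 1 else 0.0

edges = [("ann", "bob"), ("ann", "cid"), ("bob", "cid"), ("cid", "dan"), ("bob", "ann")]
print(degree_centrality(edges)["cid"], round(density(edges), 3))  # prints: 1.0 0.667
```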
Quote from the [https://netlytic.org/ home page] (11/2014): Netlytic is a cloud-based text and social networks analyzer that can automatically summarize and discover social networks from online conversations on social media sites.  +
Neural Designer is a data mining application intended for professional data scientists. It uses neural networks, which are mathematical models of brain function that can be trained to perform tasks such as function regression, pattern recognition, time series prediction or auto-association. The software provides a graphical user interface using a wizard approach consisting of a sequence of pages, which makes it easy to run tasks and obtain comprehensive results as a report. Neural Designer stands out in terms of performance: it is developed in C++, has been subjected to code optimization techniques and makes use of parallel processing, so it can analyze bigger data sets in less time.  +
O
Quote: OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; extending it with web services; and linking it to databases like Freebase. ([http://openrefine.org/], Oct. 2, 2014)  +
OpenSesame is a graphical, open-source experiment builder for the social sciences. It sports a modern and intuitive user interface that allows you to build complex experiments with a minimum of effort. With OpenSesame you can create a wide range of experiments. The plug-in framework and [[Python]] scripting allow you to incorporate external devices, such as eye trackers, response boxes, and parallel port devices, into your experiment. OpenSesame is freely available under the General Public Licence.  +
Open-source data visualization and analysis for novices and experts. Data mining through visual programming or Python scripting. Components for machine learning. Add-ons for bioinformatics and text mining. Packed with features for data analytics. Various add-ons, such as [[Orange Textable]], extend the functionality of this software.  +
Quote from the [http://langtech.ch/textable Textable] home page (Oct. 2, 2014): Orange Textable is an open-source software tool for building data tables on the basis of raw text sources. Orange Textable offers the following features: * text data import from keyboard, files, or urls * systematic recoding * segmentation and annotation of various text units * extract and exploit XML-encoded annotations * automatic, random, and arbitrary selection of unit subsets * unit context examination using concordance and collocation tables * frequency and complexity measures * recoded text data and table export  +
P
Semantic Forms is an extension to MediaWiki that allows users to add, edit and query data using forms. It is heavily tied in with the Semantic MediaWiki extension, and is meant to be used for structured data that has semantic markup.  +
<span style="background-color:#eeeeee" class="citation">“Pajek (Slovene word for Spider) is a program, for Windows, for analysis and visualization of large networks. It is freely available, for noncommercial use, at its download page. See also a reference manual for Pajek (in PDF). The development of Pajek is traced in its History. See also an overview of Pajek's background and development. ”</span> ([http://pajek.imfm.si/doku.php?id=pajek Pajek], Sept. 22, 2014) Pajek includes six data structures (e.g. network, permutation, cluster, ...) and about 15 algorithms using these structures (e.g. partitions, decompositions, paths, flows, ...).  +
Piwik is an open-source web analytics platform. Piwik displays reports on the geographic location of visits, the source of visits (i.e. whether they came from a website, directly, or from somewhere else), the technical capabilities of visitors (browser, screen size, operating system, etc.), what the visitors did (pages they viewed, actions they took, how they left), the time of visits and more. In addition to these reports, Piwik provides other features that help users analyze the data it accumulates, such as: *Annotations: the ability to save notes (such as one's analysis of data) and attach them to dates in the past. *Transitions: a feature similar to click-path features that shows how visitors navigate a website, but that only displays navigation information for one page at a time. *Goals: the ability to set goals for actions you want visitors to take (such as visiting a page or buying a product); Piwik tracks how many visits result in those actions being taken. *E-commerce: the ability to track whether and how much people spend on a website. *Page Overlay: a feature that displays analytics data overlaid on top of a website. *Row Evolution: a feature that displays how metrics change over time within a report. *Custom Variables: the ability to attach data, like a user name, to visit data.  +
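The Goals feature boils down to counting the visits in which a target action occurred and relating that count to all visits. A minimal Python sketch, assuming each visit is simply a list of visited URLs (a hypothetical layout, not Piwik's data model or API):

```python
def goal_conversions(visits: list[list[str]], goal_url: str) -> tuple[int, float]:
    """Number of visits in which `goal_url` was reached at least once, and the conversion rate."""
    converted = sum(1 for pages in visits if goal_url in pages)
    rate = converted / len(visits) if visits else 0.0
    return converted, rate

visits = [
    ["/home", "/products", "/checkout/thanks"],
    ["/home"],
    ["/blog", "/checkout/thanks", "/checkout/thanks"],  # still one converted visit
    ["/products"],
]
print(goal_conversions(visits, "/checkout/thanks"))  # prints: (2, 0.5)
```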
Q
QDA Miner is a qualitative "mixed methods" data analysis package. There are two versions: * A [http://provalisresearch.com/products/qualitative-data-analysis-software/freeware/ free QDA Miner Lite] version * An expensive commercial version. Quote from the official [http://provalisresearch.com/products/qualitative-data-analysis-software/ product page]: <span style="background-color:#eeeeee" class="citation">“QDA Miner is an easy-to-use qualitative data analysis software package for coding, annotating, retrieving and analyzing small and large collections of documents and images. QDA Miner qualitative data analysis tool may be used to analyze interview or focus group transcripts, legal documents, journal articles, speeches, even entire books, as well as drawings, photographs, paintings, and other types of visual documents. Its seamless integration with SimStat, a statistical data analysis tool, and [[WordStat]], a quantitative content analysis and text mining module, gives you unprecedented flexibility for analyzing text and relating its content to structured information including numerical and categorical data.”</span>  +
R
R +
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. R is available as Free Software for data manipulation, calculation and graphical display. It includes *an effective data handling and storage facility, *a suite of operators for calculations on arrays, in particular matrices, *a large, coherent, integrated collection of intermediate tools for data analysis, *graphical facilities for data analysis and display either on-screen or on hardcopy, and *a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities. R can be considered as an environment within which statistical techniques are implemented. R can be extended via packages. For example, try: * [http://rqda.r-forge.r-project.org/ RQDA] * [http://cran.at.r-project.org/web/views/NaturalLanguageProcessing.html CRAN Task View: Natural Language Processing]  +
RapidAnalytics is an open-source server for data mining and business analytics. It is based on the data mining solution RapidMiner and combines ETL, data mining, reporting and dashboards in a single server solution.  +
RapidMiner is a world-leading open-source system for data mining. It is available as a stand-alone application for data analysis and as a data mining engine for integration into one's own products. '''RapidMiner is now RapidMiner Studio''', and RapidAnalytics is now called RapidMiner Server. In a few words, RapidMiner Studio is a "downloadable GUI for machine learning, data mining, text mining, predictive analytics and business analytics". It can also be used (for most purposes) in batch (command-line) mode. [[User:Camacab0|Camacab0]] ([[User talk:Camacab0|talk]])  +
Quotes from the [https://redash.io/help/aboutrd/aboutrd.html#whats_redash FAQ]: Redash is an open source tool for teams to query, visualize and collaborate. Redash is quick to set up and works with any data source you might need so you can query from anywhere in no time. [..] Redash was built to allow fast and easy access to billions of records, that we process and collect using Amazon Redshift (“petabyte scale data warehouse” that “speaks” PostgreSQL). Today Redash has support for querying multiple databases, including: Redshift, Google BigQuery, Google Spreadsheets, PostgreSQL, MySQL, Graphite, Axibase Time Series Database and custom scripts. Main features: * Query editor - enjoy all the latest standards like auto-complete and snippets. Share both your results and queries to support an open and data driven approach within the organization. * Visualization - once you have your dataset, select one of our 9 types of visualizations for your query. You can also export or embed it anywhere. * Dashboard - combine several visualizations into a topic targeted dashboard. * Alerts - get notified via email, Slack, Hipchat or a webhook when your query's results need attention. * API - anything you can do with the UI, you can do with the API. Easily connect results to other systems or automate your workflows.  +
S
SAM includes a set of visualizations of learner activities to increase awareness and to support self-reflection. These are implemented as widgets in the ROLE project.  +
SATO is a multi-purpose text mining tool; e.g. it includes concordancing, lexical inventories, annotation and categorization. It allows users to mark up text with variables for further analysis. SATO is a web-based text analysis tool using a command-line language. So far, only a French interface exists. A commercial version exists, i.e. you can buy a license to install the same system on your own server.  +
SNAPP essentially serves as a diagnostic instrument, allowing teaching staff to evaluate student behavioral patterns against learning activity design objectives and intervene as required in a timely manner.  +
Run SQL queries on APIs, JSON / XML / RSS feeds, Web pages (tables), EVERYTHING!  +
Scrapy is a fast, high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing. Features: *Scrapy was designed with simplicity in mind, by providing the features you need without getting in your way *Just write the rules to extract the data from web pages and let Scrapy crawl the entire web site for you *Scrapy is used in production crawlers to completely scrape more than 500 retailer sites daily, all in one server *Scrapy was designed with extensibility in mind and so it provides several mechanisms to plug new code without having to touch the framework core *Scrapy is completely written in [[Python]] and runs on Linux, Windows, Mac and BSD *Scrapy comes with lots of functionality built in; see the documentation for a list *Scrapy is extensively documented and has a comprehensive test suite with very good code coverage  +
Semantic Drilldown is an extension to Semantic MediaWiki (SMW) that provides a page for drilling down through a site's data, using categories and filters on semantic properties. The list of pages in each top-level category can be viewed, and for each such category, filters can be created that cover a specific semantic property. If filters exist for a category, users can click on the different possible values for those filters, narrowing the set of results, and thus drill down through the data.  +
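The drill-down interaction described above amounts to faceted filtering: each selected filter value narrows the result set, and the remaining values of another property become the next choices. In this Python sketch the pages, the property names (`License`, `Platform`) and their values are hypothetical; it is not how Semantic Drilldown queries SMW internally.

```python
from collections import Counter

# Hypothetical pages of one category, each with semantic property values.
PAGES = {
    "Tool A": {"License": "free", "Platform": "Web"},
    "Tool B": {"License": "free", "Platform": "Windows"},
    "Tool C": {"License": "commercial", "Platform": "Web"},
    "Tool D": {"License": "free", "Platform": "Web"},
}

def drill_down(pages: dict[str, dict[str, str]], filters: dict[str, str]) -> list[str]:
    """Pages whose property values match every selected filter value."""
    return sorted(name for name, props in pages.items()
                  if all(props.get(prop) == value for prop, value in filters.items()))

def filter_values(pages: dict[str, dict[str, str]], names: list[str], prop: str) -> Counter:
    """Values of `prop` among the current results, with page counts:
    the options a user could click next to narrow the set further."""
    return Counter(pages[name][prop] for name in names if prop in pages[name])

free = drill_down(PAGES, {"License": "free"})
print(free, filter_values(PAGES, free, "Platform"))
# prints: ['Tool A', 'Tool B', 'Tool D'] Counter({'Web': 2, 'Windows': 1})
```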
Semantic Forms Inputs is an extension to MediaWiki that provides additional input types for Semantic MediaWikis that use the Semantic Forms extension.  +
Semantic Maps is an extension to Semantic MediaWiki (SMW) that adds semantic capabilities to the Maps extension and adds the datatype Geographic coordinate.  +
Semantic MediaWiki is an extension for managing structured data in your wiki and for querying that data to create dynamic representations: tables, timelines, maps, lists, etc.  +
Semantic Result Formats (SRF) is a MediaWiki extension, used in conjunction with the Semantic MediaWiki extension, that bundles a number of further result formats for SMW's inline queries. The individual formats can be added to the installation independently...  +
The goal of the SEMantic simILARity software toolkit (SEMILAR; pronounced the same way as the word 'similar') is to promote productive, fair, and rigorous research advancements in the area of semantic similarity. The kit is available as application software or as a Java API. As of March 2014, the GUI-based SEMILAR application is only available to a limited number of users who commit to helping improve the usability of the interface. The Java library (API), however, can be downloaded. SEMILAR comes with various similarity methods based on WordNet, Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), BLEU, Meteor, Pointwise Mutual Information (PMI), dependency-based methods, optimized methods based on Quadratic Assignment, etc. The similarity methods work at different granularities: word to word, sentence to sentence, or bigger texts. Some methods have their own variations, which, coupled with parameter settings and your selection of preprocessing steps, can result in a huge space of possible instances of the same basic method.  +
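As a point of comparison for the methods listed above, the simplest lexical baseline for sentence-to-sentence similarity is the cosine of two bag-of-words vectors. This Python sketch is not one of SEMILAR's methods: it only counts shared word forms, whereas WordNet- or LSA-based methods can also match different words with related meanings.

```python
import math
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Term-frequency vector of a text (lowercase letter runs)."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))

def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between two term-frequency vectors: 0.0 when no word is
    shared, 1.0 (up to rounding) when the word proportions are identical."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

print(round(cosine_similarity("The cat sleeps.", "The cat eats."), 3))  # prints: 0.667
```

A sentence pair such as "The cat sleeps" and "A feline naps" would score 0.0 here, which is exactly the gap that knowledge-based and distributional methods try to close.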
Quote: The Stanford NLP Group makes parts of our Natural Language Processing software available to everyone. These are statistical NLP toolkits for various major computational linguistics problems. They can be incorporated into applications with human language technology needs. ([http://nlp.stanford.edu/software/index.shtml])  +
T
Quoted from the tOKo homepage (Oct. 2014): * tOKo is an open source tool for text analysis and browsing a corpus of documents. It implements a wide variety of text analysis and browsing functions in an interactive user interface. * An important application area of tOKo is ontology development. It supports both ontology construction from a corpus, as well as relating the ontology back to a corpus (for example by highlighting concepts from the ontology in a document). * Another application area is community research. Here the objective is to analyse the exchange of information, for example in a community forum or through a collection of interconnected weblogs.  +
Tableau software helps people communicate data through an innovation called VizQL, a visual query language that converts drag-and-drop actions into data queries, allowing users to quickly find and share insights in their data. With Tableau, “data workers” first connect to data stored in files, cubes, databases, warehouses, Hadoop technologies, and even some cloud sources like Google Analytics. They then interact with the Tableau user interface to simultaneously query the data and view the results in charts, graphs, and maps that can be arranged together on dashboards. ([http://shop.oreilly.com/product/0636920030942.do Jones, 2014]: 15) In practice, one installs a desktop application (Windows/Mac) and creates a visualization. The result can then be published either on Tableau's public server or on your own server (commercial).  +
Tabula is a free, open-source tool that allows you to easily take data out of PDF files and into Excel, database programs, and web applications. Tabula allows users to upload their documents, indicate the position of the tables they want, and extract the data directly into a Comma-Separated Values (CSV) or Tab-Separated Values (TSV) file, or just copy the text as CSV to the clipboard. Tabula can repeat an operation on several pages or documents.  +
'''Quotes''' from the [http://eric.univ-lyon2.fr/~ricco/tanagra/index.html official home page] (10/2014): * TANAGRA is a free DATA MINING software for academic and research purposes. It proposes several data mining methods from exploratory data analysis, statistical learning, machine learning and databases area. * The main purpose of Tanagra project is to give researchers and students an easy-to-use data mining software, conforming to the present norms of the software development in this domain (especially in the design of its GUI and the way to use it), and allowing to analyse either real or synthetic data. * The second purpose of TANAGRA is to propose to researchers an architecture allowing them to easily add their own data mining methods, to compare their performances. TANAGRA acts more as an experimental platform in order to let them go to the essential of their work, dispensing them to deal with the unpleasant part in the programmation of this kind of tools : the data management. * The third and last purpose, in direction of novice developers, consists in diffusing a possible methodology for building this kind of software. They should take advantage of free access to source code, to look how this sort of software is built, the problems to avoid, the main steps of the project, and which tools and code libraries to use for. In this way, Tanagra can be considered as a pedagogical tool for learning programming techniques. According to its author, Tanagra can be compared to [[Weka]]: it has an easier-to-use interface, but less functionality.  +
TAPoRware is a set of text analysis tools that enables users to perform text analysis on HTML, XML and plain text files, using documents from the users' machine or on the web. There are five families of tools: for HTML, XML, Text, Other and Beta. A list is included below in the free text section.  +
TextSTAT is a simple text analysis program. Its main functionality is concordancing. Quote from the [http://neon.niederlandistik.fu-berlin.de/en/textstat/ home page] (11/2014): <span style="background-color:#eeeeee" class="citation">“TextSTAT is a simple programme for the analysis of texts. It reads plain text files (in different encodings) and HTML files (directly from the internet) and it produces word frequency lists and concordances from these files. This version includes a web-spider which reads as many pages as you want from a particular website and puts them in a TextSTAT-corpus. The new news-reader, too, puts news messages in a TextSTAT-readable corpus file. TextSTAT reads MS Word and OpenOffice files. No conversion needed, just add the files to your corpus... ”</span>  +
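A word frequency list and a keyword-in-context (KWIC) concordance, the two outputs described in the quote, can be sketched in a few lines of Python. The letters-only tokenizer and the fixed context width are simplifying assumptions; TextSTAT itself is a separate desktop program with its own options.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[^\W\d_]+", text)

def frequency_list(text: str) -> list[tuple[str, int]]:
    """Lowercased word forms with counts, most frequent first (ties in alphabetical order)."""
    counts = Counter(token.lower() for token in tokenize(text))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def concordance(text: str, keyword: str, width: int = 3) -> list[str]:
    """Keyword-in-context lines with `width` words of context on each side."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{token}] {right}")  # right-align the left context
    return lines

text = "Learning analytics measure learning. Students reflect on their learning with dashboards."
print(frequency_list(text)[0])  # prints: ('learning', 3)
for line in concordance(text, "learning", width=2):
    print(line)
```

Right-aligning the left context lines up every occurrence of the keyword in one column, which is what makes a concordance easy to scan.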
Quote from the [http://textalyser.net/ home page]: <span style="background-color:#eeeeee" class="citation">“Welcome to the online text analysis tool, the detailed statistics of your text, perfect for translators (quoting), for webmasters (ranking) or for normal users, to know the subject of a text. Now with new features as the analysis of words groups, finding out the keyword density, analyse the prominence of word or expressions.”</span>  +
Tm +
The tm package provides a framework for text mining applications within R. The tm package offers functionality for managing text documents, abstracts the process of document manipulation and eases the usage of heterogeneous text formats in R. The package provides native support for reading in several classic file formats such as plain text, PDFs, or XML files. There is also a plug-in mechanism to handle additional file formats. The data structures and algorithms can be extended to fit custom demands.  +
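A central data structure in this kind of text mining framework is the document-term matrix: documents as rows, terms as columns, counts in the cells. The Python sketch below only illustrates that structure with a naive tokenizer; it is not the tm API, which is written in R and has its own corpus and matrix classes.

```python
import re
from collections import Counter

def document_term_matrix(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (vocabulary, matrix): one row per document, one column per term (sorted),
    each cell holding the raw count of that term in that document."""
    counts = [Counter(re.findall(r"[^\W\d_]+", doc.lower())) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]  # missing terms count as 0
    return vocabulary, matrix

vocab, dtm = document_term_matrix(["Text mining in R", "Mining text, mining data"])
print(vocab)  # prints: ['data', 'in', 'mining', 'r', 'text']
print(dtm)    # prints: [[0, 1, 1, 1, 1], [1, 0, 2, 0, 1]]
```

Real corpora produce very wide, mostly empty matrices, so practical implementations store them in a sparse format rather than as nested lists.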
Tropes is free text analysis (text mining) software. Its features include the ability to carry out stylistic, syntactic and semantic analyses and to present the results in graph and table form. Tropes can yield information about a text such as stylistic/rhetorical analyses (argumentative, enunciative, descriptive or narrative style). It can also identify different word categories (verbs, connectors, personal pronouns, modalities, qualifying adjectives), conduct thematic analyses (reference fields), and detect discursive/chronological structures.  +
Quote: We provide a tokenizer, a part-of-speech tagger, hierarchical word clusters, and a dependency parser for tweets, along with annotated corpora and web-based annotation tools.  +
V
Voyeur is a web-based text analysis environment. It is designed to be user-friendly, flexible and powerful. Voyeur is part of Hermeneuti.ca, a collaborative project to develop and theorize text analysis tools and text analysis rhetoric. [http://hermeneuti.ca/voyeur/ Voyeur Tools: See Through Your Texts] (retrieved 3/2014). In Voyeur, you can: * use texts in a variety of formats including plain text, HTML, XML, PDF, RTF and MS Word * use texts from different locations, including URLs and uploaded files * perform lexical analysis including the study of frequency and distribution data; in particular export data into other tools (as XML, tab separated values, etc.) * embed live tools into remote web sites that can accompany or complement your own content  +
W
Web-Harvest is an open-source web data extraction tool written in Java. It offers a way to collect desired web pages and extract useful data from them. To do that, it leverages well-established techniques and technologies for text/XML manipulation such as XSLT, XQuery and regular expressions. Web-Harvest mainly focuses on HTML/XML-based web sites, which still make up the vast majority of web content. It can also easily be supplemented by custom Java libraries to augment its extraction capabilities.  +
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules and visualization. It is also well suited for developing new machine learning schemes. Weka 3.7 (still in beta as of October 2014) includes a package system that allows functionality to be added without recompiling the system. As of summer 2014, most people seem to use this developer version. Weka is a very popular free data mining tool that includes advanced text mining features.  +
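The extraction step Web-Harvest performs on HTML pages can be illustrated, very loosely, with Python's standard library: an HTML parser pulls out link targets and texts, and a regular expression pulls a pattern (here, years) out of plain text. This is only a sketch of the general technique; Web-Harvest itself is configured differently and relies on XSLT, XQuery and regular expressions in Java, and the function names below are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from <a> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while inside an <a> element that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str) -> List[Tuple[str, str]]:
    """Parse an HTML string and return its links in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def extract_years(text: str) -> List[str]:
    """Regex-based extraction: four-digit years from 1900 to 2099."""
    return re.findall(r"\b(?:19|20)\d{2}\b", text)
```

A real harvesting pipeline would add the "collect" half (fetching pages) and would usually prefer a structural query language such as XPath over regular expressions for anything nested.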
Quotation from the [http://www.lexically.net/downloads/version6/HTML/index.html?getting_started.htm getting started page] (11/2014): <span style="background-color:#eeeeee" class="citation">“WordSmith Tools is an integrated suite of programs for looking at how words behave in texts. You will be able to use the tools to find out how words are used in your own texts, or those of others. The WordList tool lets you see a list of all the words or word-clusters in a text, set out in alphabetical or frequency order. The concordancer, Concord, gives you a chance to see any word or phrase in context -- so that you can see what sort of company it keeps. With KeyWords you can find the key words in a text. The tools have been used by Oxford University Press for their own lexicographic work in preparing dictionaries, by language teachers and students, and by researchers investigating language patterns in lots of different languages in many countries world-wide.”</span>  +
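Two of the operations described in the quotation, a word list in frequency or alphabetical order and a keyword-in-context (KWIC) concordance, can be sketched in a few lines of Python. The function names, the context width and the tokenizer are illustrative assumptions; the sketch handles single words only (not word-clusters) and does not reproduce WordSmith's own implementation or options.

```python
import re
from collections import Counter


def word_tokens(text: str) -> list[str]:
    """Lower-cased word tokens; an apostrophe inside a word is kept."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_list(text: str, order: str = "frequency") -> list[tuple[str, int]]:
    """Return (word, count) pairs in frequency or alphabetical order."""
    counts = Counter(word_tokens(text))
    if order == "alphabetical":
        return sorted(counts.items())
    # Highest frequency first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def concordance(text: str, node: str, width: int = 3) -> list[str]:
    """KWIC lines: `width` tokens of left and right context around `node`."""
    tokens = word_tokens(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


if __name__ == "__main__":
    sample = "You shall know a word by the company it keeps. A word is known by its company."
    print(word_list(sample)[:4])
    for line in concordance(sample, "company", width=2):
        print(line)
```

Showing each occurrence with a fixed window of neighbours is what lets a reader "see what sort of company it keeps", as the quotation puts it.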
WordStat is commercial text-mining and content analysis software. It is integrated with the [[QDA Miner]] and SimStat products from the same company. Quote from the official [http://provalisresearch.com/products/content-analysis-software/product page]: <span style="background-color:#eeeeee" class="citation">“WordStat is a flexible and easy-to-use text analysis software – whether you need text mining tools for fast extraction of themes and trends, or careful and precise measurement with state-of-the-art quantitative content analysis tools. WordStat’s seamless integration with SimStat – our statistical data analysis tool – and QDA Miner – our qualitative data analysis software – gives you unprecedented flexibility for analyzing text and relating its content to structured information, including numerical and categorical data.”</span>  +
Quote from the [http://www.wordcruncher.com/index.html home page]: <span style="background-color:#eeeeee" class="citation">“WordCruncher is a free eBook reader with research tools to help students and scholars study important texts. * You can look for specific references, search for words or phrases, follow cross-reference hyperlinks, and enlarge images. * You can copy and paste text, add bookmarks, highlight text, and make searchable notes. * Additional study aids include complex searches, word frequencies, word frequency distributions, synchronized windows to compare translations, word tags, and various text analysis reports (e.g., collocation, vocabulary dispersion, vocabulary usage).”</span>  +
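A collocation report, one of the analysis reports mentioned in the quotation, counts which words occur near a given node word. The Python sketch below uses a fixed window of raw co-occurrence counts; the span size, the exclusion of the node word from its own window and the function name are assumptions, and WordCruncher may well use different spans or statistical association measures.

```python
import re
from collections import Counter


def collocates(text: str, node: str, span: int = 2) -> Counter:
    """Count words occurring within `span` tokens on either side of `node`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    node = node.lower()
    counts: Counter = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Left and right context windows, clipped at the text boundaries.
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        counts.update(w for w in window if w != node)
    return counts


if __name__ == "__main__":
    text = "strong tea and strong coffee but powerful computers"
    print(collocates(text, "strong", span=2).most_common())
```

Raw counts favour frequent function words such as "and"; tools that report collocation usually weight these counts with an association measure, which this sketch leaves out.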