Natural language processing

The educational technology and digital learning wiki
Latest revision as of 20:02, 14 May 2019

Introduction

Natural Language Processing (NLP), also referred to by the overlapping terms Human Language Technologies, Computational Linguistics, and Speech Recognition and Synthesis, is a field of computer science that studies human language as an interface between computers and humans. Its goal is to allow computers to process large amounts of natural language data, so that they can perform tasks such as translating automatically between languages, answering questions that people pose in their own words, and understanding and synthesizing speech.

NLP for educational applications has gained visibility outside of the NLP community. Applications such as automated writing evaluation (AWE), speech scoring, and plagiarism detection have already been used for high-stakes assessment and in instructional contexts (e.g., Massive Open Online Courses). Simulation and gaming applications used for instructional purposes, especially ones focused on language learning, also illustrate how NLP can be applied in educational contexts.

The NLP Pipeline

The first step towards giving a computer the ability to understand language is enabling it to recognize words. This step is known as tokenization or word segmentation. In many languages, including English, words are usually separated by white space, which makes tokenization seem very simple. However, this is not always the case. For example, in many contexts ‘New York’ should be treated as a single word, while ‘I’m’ needs to be split into the two words ‘I’ and ‘am’. In addition, some languages, such as Chinese, Japanese, and Thai, do not use white space between words.
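The two difficulties above can be illustrated with a minimal Python sketch. `tokenize` splits English text on white space and punctuation, expands contractions, and merges known multi-word expressions; `max_match` is a greedy longest-match segmenter of the kind often used as a baseline for languages written without spaces, simulated here on English letters. The lexicons (`CONTRACTIONS`, `MULTIWORD_EXPRESSIONS`, the `max_match` dictionary) and all function names are hypothetical toy examples, not a real library's API.

```python
import re

# Hypothetical toy lexicons, for illustration only.
MULTIWORD_EXPRESSIONS = {("New", "York")}
CONTRACTIONS = {"I'm": ["I", "am"], "don't": ["do", "not"], "it's": ["it", "is"]}

def tokenize(text):
    """Split on white space/punctuation, expand contractions, merge multi-word expressions."""
    text = text.replace("\u2019", "'")  # normalize curly apostrophes to straight ones
    # Runs of letters/digits/apostrophes, or any single non-space punctuation character
    raw = re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)
    expanded = []
    for tok in raw:
        expanded.extend(CONTRACTIONS.get(tok, [tok]))
    tokens, i = [], 0
    while i < len(expanded):
        if tuple(expanded[i:i + 2]) in MULTIWORD_EXPRESSIONS:
            tokens.append(" ".join(expanded[i:i + 2]))  # e.g. 'New York' as one token
            i += 2
        else:
            tokens.append(expanded[i])
            i += 1
    return tokens

def max_match(text, dictionary, max_len=6):
    """Greedy longest-match segmentation for text written without spaces."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

print(tokenize("I’m moving to New York."))
# ['I', 'am', 'moving', 'to', 'New York', '.']
print(max_match("thetabledownthere", {"the", "theta", "table", "bled", "own", "down", "there"}))
# ['theta', 'bled', 'own', 'there']
```

The `max_match` output shows the main weakness of greedy matching: ‘thetabledownthere’ comes out as ‘theta bled own there’ rather than ‘the table down there’, one reason statistical and machine-learning segmenters are preferred.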

In addition to word segmentation, sentence segmentation (separating a text into sentences) is an important step in understanding text. Again, this may seem a simple task, since sentence boundaries are usually marked by punctuation, but punctuation characters are sometimes ambiguous. The period character ‘.’, for example, is used not only to end sentences but also in abbreviations like ‘Mr.’ and ‘Inc.’, in acronyms like ‘m.p.h.’, in website addresses such as ‘www.google.com’, and in numbers like ‘12.5’. For these reasons, many current techniques for word and sentence segmentation are based on machine learning.
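A minimal rule-based sentence splitter makes the period problem concrete. This Python sketch assumes that a sentence ends at ‘.’, ‘!’ or ‘?’ followed by white space and a capital letter, unless the preceding token is in a small, hypothetical abbreviation list (`ABBREVIATIONS`); the function name `split_sentences` is likewise illustrative.

```python
import re

# Hypothetical abbreviation list; real systems use much larger lists or learn them from data.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "inc.", "m.p.h."}

def split_sentences(text):
    """Split at '.', '!' or '?' followed by white space and a capital letter,
    unless the token ending there is a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?](?=\s+[A-Z])", text):
        end = match.end()
        last_token = text[start:end].split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # e.g. 'Mr. Smith' is not a sentence boundary
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

text = "Mr. Smith drove at 12.5 m.p.h. on the road. He visited www.google.com today. Was it fun?"
for sentence in split_sentences(text):
    print(sentence)
```

Numbers like ‘12.5’ and addresses like ‘www.google.com’ are never split because no white space follows their periods, and ‘Mr.’ is skipped through the abbreviation list. The rule still fails when an abbreviation genuinely ends a sentence, and every new abbreviation must be added by hand, which is why learned models are generally preferred.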

Once the segmentation process is done, we want to ascribe “meaning” to words and sentences. This is a very challenging step, mainly because language is ambiguous. Jurafsky and Martin[1] illustrate this challenge with the sentence ‘I made her duck’, which has at least five different meanings:

  • I cooked waterfowl for her.
  • I cooked waterfowl belonging to her.
  • I created the (plaster?) duck she owns.
  • I caused her to quickly lower her head or body.
  • I waved my magic wand and turned her into undifferentiated waterfowl.

The ambiguity of this sentence has several sources. For example, the verb ‘make’ can mean ‘create’, ‘cook’, or ‘cause’; ‘duck’ can be a noun or a verb; and ‘her’ can be an object pronoun or a possessive. If we consider the sentence in its spoken form, there is also phonetic ambiguity, since the first word could have been ‘eye’ or the second word ‘maid’. Processing natural language therefore requires resolving, or disambiguating, these ambiguities. Techniques like part-of-speech tagging, word-sense disambiguation, syntactic disambiguation, and lexical disambiguation can be used for this purpose. To understand natural language more fully, a system needs knowledge of at least the following:

  • Phonetics and Phonology: the sounds of human speech for a given word.
  • Morphology: how words are formed, and their relationship to other words in the same language.
  • Syntax: rules that define the structural relationships between words and, therefore, govern the structure of a sentence.
  • Semantics: knowledge of the meaning of each word in a language.
  • Pragmatics: understanding relationships between sentence meaning and the speaker’s intentions.
  • Discourse: knowledge about linguistic units larger than a single utterance.
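Word-sense disambiguation, one of the techniques mentioned above, can be sketched with a simplified version of the Lesk algorithm, which picks the sense whose dictionary definition (gloss) shares the most words with the surrounding sentence. In this Python sketch the two glosses for ‘duck’, the stopword list, and the function names are hypothetical; the input is assumed to be lowercase text without punctuation.

```python
# Hypothetical, hand-written sense glosses; a real system would take them from a
# lexical resource such as WordNet.
SENSES = {
    "duck": {
        "duck (noun: waterfowl)": "a swimming bird with a broad flat bill that is often cooked and eaten",
        "duck (verb: lower the head)": "to quickly lower the head or body to avoid being hit",
    }
}

# Small, hypothetical stopword list so that function words do not count as overlap
STOPWORDS = {"a", "an", "the", "to", "of", "or", "and", "is", "that", "with", "her", "i", "be", "being"}

def content_words(text):
    """Lowercased words of the text, minus stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def simplified_lesk(word, sentence):
    """Pick the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:  # ties keep the first sense listed
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("duck", "she had to duck her head to avoid the low branch"))
# duck (verb: lower the head)
print(simplified_lesk("duck", "i cooked the duck for dinner"))
# duck (noun: waterfowl)
```

Ties (for example, a sentence sharing no words with either gloss) fall back to the first sense listed, so practical systems add better tie-breaking, larger context windows, or supervised models.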

NLP for Educational Applications

The 12th Workshop on Innovative Use of NLP for Building Educational Applications highlights the following topics for NLP in educational applications:

  • Automated scoring/evaluation for written student responses
    • Content analysis for scoring/assessment
    • Analysis of the structure of argumentation
    • Grammatical error detection and correction
    • Discourse and stylistic analysis
    • Plagiarism detection
    • Machine translation for assessment, instruction and curriculum development
    • Detection of non-literal language (e.g., metaphor)
    • Sentiment analysis
    • Non-traditional genres (beyond essay scoring)
  • Intelligent Tutoring (IT) and Game-based assessment that incorporates NLP
    • Game-based learning
    • Dialogue systems in education
    • Hypothesis formation and testing
    • Multi-modal communication between students and computers
    • Generation of tutorial responses
    • Knowledge representation in learning systems
    • Concept visualization in learning systems
  • Learner cognition
    • Assessment of learners' language and cognitive skill levels
    • Systems that detect and adapt to learners' cognitive or emotional states
    • Tools for learners with special needs
  • Use of corpora in educational tools
    • Data mining of learner and other corpora for tool building
    • Annotation standards and schemas / annotator agreement
  • Tools and applications for classroom teachers and/or test developers
    • NLP tools for second and foreign language learners
    • Semantic-based access to instructional materials to identify appropriate texts
    • Tools that automatically generate test questions
    • Processing of and access to lecture materials across topics and genres
    • Adaptation of instructional text to individual learners' grade levels
    • Tools for text-based curriculum development
    • E-learning tools for personalized course content
    • Language-based educational games
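One of the listed topics, adapting instructional text to learners' grade levels, usually starts from an estimate of how difficult a text is. As a minimal sketch, the Python block below computes the Flesch-Kincaid grade level, a classic readability formula: 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The syllable counter is a rough vowel-group heuristic assumed for illustration, and the function names are hypothetical; production tools use pronunciation dictionaries or better-tested heuristics.

```python
import re

def count_syllables(word):
    """Rough heuristic: count runs of vowels, discounting a final silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # 'make' has one syllable, but 'table' keeps its final 'le' syllable
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

easy = "The cat sat on the mat."
hard = "Computational linguistics investigates sophisticated representations."
print(round(flesch_kincaid_grade(easy), 2))  # -1.45
print(flesch_kincaid_grade(hard) > flesch_kincaid_grade(easy))  # True
```

A tool that adapts texts to grade levels could compare such a score against a learner's level, although readability formulas only capture sentence length and word length, not vocabulary familiarity or content.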

To summarize, NLP plays an important role in the development of educational applications. It can support a wide range of learning domains, including writing, speaking, reading, science, and mathematics, and it plays an especially important role in language learning. For this reason, it has gained visibility outside the NLP community, and some NLP applications have already been deployed commercially. Automated writing evaluation (AWE) and speech scoring applications, for example, are already used in high-stakes assessment and instructional contexts, and NLP is also used in Massive Open Online Courses (MOOCs) to help manage their thousands of assignments. Plagiarism detection is another common commercially available NLP application in education.
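Plagiarism detection usually begins by measuring how much wording two texts share. A minimal Python sketch, assuming word trigrams and Jaccard similarity (shared trigrams divided by all distinct trigrams) as the measure; this is one simple, widely taught approach, not the method of any particular product, and the function names and the 0.5 threshold are hypothetical.

```python
import re

def word_ngrams(text, n=3):
    """Set of lowercase word n-grams in the text (punctuation ignored)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard_similarity(doc_a, doc_b, n=3):
    """Shared n-grams divided by all distinct n-grams in either document."""
    a, b = word_ngrams(doc_a, n), word_ngrams(doc_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def flag_similar(submission, sources, threshold=0.5, n=3):
    """Return (index, score) for each source at or above the (hypothetical) threshold."""
    flagged = []
    for i, source in enumerate(sources):
        score = jaccard_similarity(submission, source, n)
        if score >= threshold:
            flagged.append((i, score))
    return flagged

print(jaccard_similarity("the quick brown fox jumps", "the quick brown dog jumps"))  # 0.2
```

Matching on overlapping n-grams rather than single words means that two essays using the same vocabulary in different sentences score low, while copied passages score high. Light paraphrasing can break many n-grams, however, so practical systems typically rely on additional techniques as well.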

Links

https://www.aclweb.org/portal/content/13th-workshop-innovative-use-nlp-building-educational-applications-0

See Also

http://edutechwiki.unige.ch/en/Stanford_NLP_toolkits

References

  1. Jurafsky, D., & Martin, J. H. (2014). Speech and language processing (Vol. 3). London: Pearson.