ReCaptcha: Difference between revisions

The educational technology and digital learning wiki

Revision as of 12:11, 31 July 2013

{{Citizen science project
|field_project_name=reCaptcha
|field_project_access_URL=http://www.google.com/recaptcha
|field_project_description=reCAPTCHA is a user-dialogue system originally developed by Luis von Ahn, Ben Maurer, Colin McMillen, David Abraham and Manuel Blum at Carnegie Mellon University's main Pittsburgh campus, and acquired by Google in September 2009. Like the CAPTCHA interface, reCAPTCHA asks users to enter words seen in distorted text images onscreen. By presenting two words, it both protects websites from bots attempting to access restricted areas and helps digitize the text of books. The reCAPTCHA service supplies subscribing websites with images of words that optical character recognition (OCR) software has been unable to read. The subscribing websites (whose purposes are generally unrelated to the book digitization project) present these images for humans to decipher as CAPTCHA words, as part of their normal validation procedures. They then return the results to the reCAPTCHA service, which sends the results to the digitization projects.
|field_team_leadermm=Google
|field_project_open=Yes
|field_subject_areas=humanities
|field_cs_subject_areas=other
|field_purpose_of_project=reCAPTCHA has worked on digitizing the archives of The New York Times and books from Google Books. As of 2012, thirty years of The New York Times had been digitized, and the project planned to complete the remaining years by the end of 2013.

Wikipedia, retrieved July 2013
|field_participant_task_description=Scanned text is analyzed by two different optical character recognition programs. Their outputs are aligned with each other by standard string-matching algorithms and compared both to each other and to an English dictionary. Any word that the two OCR programs decipher differently, or that is not in the English dictionary, is marked as "suspicious" and converted into a CAPTCHA. The suspicious word is displayed, out of context, along with a control word whose answer is already known. If the human types the control word correctly, the system accepts the response to the suspicious word as probably valid.

Wikipedia, retrieved July 2013
|field_volonteer_computing=no
|field_volonteer_thinking=no
|field_volonteer_sensing=no
|field_volonteer_gaming=no
|field_tutorials_documentation=N/A
|field_peer_to_peer_guidance=N/A
|field_training_sequence=N/A
|field_individual_performance_feedback=N/A
|field_collective_performance_feedback=N/A
|field_research_progress_feedback=N/A
|field_member_profiles=no
|field_community_tools=website
|field_has_community_manager=no
|field_project_news_updates=N/A
|field_socialsoftware_sites=N/A
|field_team_work=N/A
}}

About 200 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that's not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. What if we could make positive use of this human effort? reCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into "reading" books.
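
The participant task described in the template above (two OCR passes, a dictionary check, and a known control word) could be sketched roughly as follows. This is only an illustrative sketch, not the actual reCAPTCHA implementation: the function names `find_suspicious_words` and `accept_response` are hypothetical, the two OCR outputs are assumed to be already aligned word by word, and the case-insensitive comparison is an assumption made for the example.

```python
def find_suspicious_words(ocr_a, ocr_b, dictionary):
    """Flag words that the two OCR outputs read differently
    or that are missing from the dictionary.

    Assumption: ocr_a and ocr_b are lists of words that have
    already been aligned, so position i refers to the same
    scanned word in both lists.
    """
    suspicious = []
    for word_a, word_b in zip(ocr_a, ocr_b):
        disagree = word_a != word_b
        unknown = word_a.lower() not in dictionary
        if disagree or unknown:
            suspicious.append((word_a, word_b))
    return suspicious


def accept_response(control_answer, typed_control, typed_suspicious):
    """Return the typed reading of the suspicious word if the
    control word was typed correctly, otherwise None.

    Assumption: matching ignores case and surrounding spaces.
    """
    if typed_control.strip().lower() == control_answer.lower():
        return typed_suspicious.strip()
    return None


if __name__ == "__main__":
    dictionary = {"the", "quick", "brown", "fox"}
    ocr_a = ["the", "quick", "brovvn", "fox"]
    ocr_b = ["the", "quick", "brown", "fox"]
    print(find_suspicious_words(ocr_a, ocr_b, dictionary))
    print(accept_response("morning", "morning", "brown"))
```

In this sketch a word is flagged when either condition holds, mirroring the "differently ... or ... not in the English dictionary" rule in the description; a rejected control word simply discards the response.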

Bibliography and Links

  • Luis von Ahn, Benjamin Maurer, Colin McMillen, David Abraham and Manuel Blum. 2008. "reCAPTCHA: Human-Based Character Recognition via Web Security Measures" Science 12 September 2008: Vol. 321 no. 5895 pp. 1465–1468. http://dx.doi.org/10.1126/science.1160379