|
Datasets on Crowdsourcing |
|
Dataset Name (with link) |
Size:
#questions, #answers, ratio (#answers/#questions) |
#Questions
with Ground Truth |
Application |
Question
Content: Image or Text |
Question Type: Single Choice (N choose 1) or Rating |
Operator(s) |
Description |
|
Part 1: Datasets with ground truth and workers' answers (a short Python sketch after the table shows how these counts can be recomputed) |
|
Fact Evaluation Judgment Dataset |
42624, 216725, 5.08 |
576 |
Fact Evaluation |
Text |
3 choose 1 |
Selection, Join |
The task is to identify
whether a fact (e.g., "Stephen Hawking graduated from Oxford") is
correct, wrong, or ambiguous. |
|
Fashion 10000: An Enriched Social Image Dataset for
Fashion and Clothing |
32398, 97194, 3 |
32398 |
Image Retrieval |
Image with Metadata
(e.g., Tags) |
2 choose 1 |
Selection,
Collection, Join |
The task is to identify whether or not an image is fashion-related (note: the downloaded dataset is 9.8 GB). |
|
Sentiment
Popularity |
500, 10000, 20 |
500 |
Sentiment Analysis |
Text |
2 choose 1 |
Selection, Join |
The task aims at classifying movie reviews as
either positive or negative. |
|
Weather
Sentiment |
300, 6000, 20 |
300 |
Sentiment Analysis |
Text |
5 choose 1 |
Selection, Join |
The task is to judge
the sentiment of a tweet discussing the weather ("negative",
"neutral", "positive", "unrelated to weather"
or "cannot tell"). |
|
Face
Sentiment |
584, 5256, 9 |
584 |
Face Sentiment
Identification |
Image |
4 choose 1 |
Selection, Categorize,
Join |
The task is to identify the sentiment (neutral, happy, sad, or angry) of a given face image. |
|
Relevance Finding |
20232, 98453, 4.87 |
3276 |
Relevance Finding |
Text |
Rating (5 choices) |
Selection,
Categorize, Sort, Top-K, Join |
The task is to
identify the relevance of a given document to a given topic on a 5-level rating scale, i.e., 2: highly relevant, 1: relevant, 0: non-relevant, -1: unknown, -2: broken link. |
|
Duchenne Smile Identification |
2162, 30319, 14.02 |
160 |
Duchenne Smile
Identification |
Image |
2 choose 1 |
Selection, Categorize,
Join |
The task is to judge
whether or not a smile (on an image) is a Duchenne smile. |
|
HITSpam-Crowdflower |
5380, 42762, 7.95 |
101 |
Spam Detection |
Text |
2 choose 1 |
Selection, Join |
The task is to judge
whether or not a HIT is a spam task. |
|
HITSpam-Mturk |
5840, 28354, 4.86 |
101 |
Spam Detection |
Text |
2 choose 1 |
Selection, Join |
The task is to judge
whether or not a HIT is a spam task. |
|
Query Document Relevance |
2165, 17395, 8.03 |
2165 |
Relevance Finding |
Text |
2 choose 1 |
Selection, Join |
The task is to
identify whether a given document is relevant to a given query. |
|
AdultContent |
11040, 92721, 8.40 |
1517 |
Classification |
Text |
Rating (5 choices) |
Selection, Categorize,
Sort, Top-K, Join |
The task is to identify
the adult level of a website (G, P, R, X, B). |
|
Emotion |
700, 7000, 10 |
700 |
Emotion Rating |
Text |
Rating (choose a
value from -100 to 100) |
Selection, Sort,
Top-K, Join |
The task is to rate
the emotion of a given text. There are 7 emotions (anger, disgust, fear, joy,
sadness, surprise, valence), and a user gives a value from -100 to 100 for
each emotion about the text. |
|
Word
pair similarity |
30, 300, 10 |
30 |
Word Similarity Finding |
Text |
Rating (choose a
numerical score from 0 to 10) |
Selection, Sort, Top-K,
Join |
The task is to assign a
numerical similarity score between 0 and 10 to a given word pair. |
|
Recognizing
Textual Entailment |
800, 8000, 10 |
800 |
Textual
Understanding |
Text |
2 choose 1 |
Selection, Join |
The task is to
identify whether a given Hypothesis sentence is implied by the information in
the given text. |
|
Temporal
Ordering |
462, 4620, 10 |
462 |
Event Ordering |
Text |
2 choose 1 |
Selection, Sort, Top-K,
Join |
The task is to identify
whether or not one event happens before another event in a given context. |
|
Word
Sense Disambiguation |
177, 1770, 10 |
177 |
Word Sense
Disambiguation |
Text |
K choose 1 |
Selection, Join |
The task is to
choose the most appropriate sense of a word (out of several given senses) in
the given context. |
|
Web search data |
2665, 15567, 5.84 |
2652 |
Relevance Finding |
Text |
Rating (5 choices) |
Selection, Sort, Top-K,
Join |
The task is to judge the
relevance of query-URL pairs with a 5-level rating scale (from 1 to 5). |
|
Duck |
108, 4212, 39 |
108 |
Duck Identification |
Image |
2 choose 1 |
Selection, Join |
The task is to
identify whether the image contains a duck or not. |
|
Dog |
807, 8070, 10 |
807 |
Dog Breed Identification |
Image |
4 choose 1 |
Selection, Categorize,
Join |
The task is to recognize
a breed (out of Norfolk Terrier, Norwich Terrier, Irish Wolfhound, and
Scottish Deerhound) for a given dog. |
|
Airline
Twitter sentiment |
16000, 55000, 3.44 |
16,000 |
Sentiment Analysis |
Text |
K choose 1 |
Selection, Join |
A sentiment analysis
job about the problems of each major U.S. airline. Twitter data was scraped
from February of 2015 and contributors were asked to first classify positive,
negative, and neutral tweets, followed by categorizing negative reasons (such
as "late flight" or "rude service"). |
|
Company
categorizations (with URLs) |
7152, 28675, 4.01 |
7,152 |
Categorization |
Text |
6 choose 1 |
Categorization |
A data set where business names
were matched with URLs/homepages for the named businesses. |
|
Part 2: Datasets with only ground truth (no workers' answers) |
|
Clothing
pattern identification |
14,750 |
14,750 |
Classification |
Image |
K
choose 1 |
Selection, Categorize |
A large dataset where contributors viewed pictures
of dresses and classified their patterns. Sixteen popular pattern types were
provided and each dress was judged by three contributors. Dataset is provided
with their aggregated judgments and URLs for each dress. |
|
Sound
detection and classification |
8,000 |
8,000 |
Classification |
Audio |
K choose 1 |
Selection, Categorize |
Contributors listened to
short audio clips and identified white noise events like coughing, dropped
keys, and barking dogs. They also tried to identify the scene, such as
office, cafe, or supermarket and ranked the difficulty of each individual
row. Audio clips range from about five to ten seconds. |
|
Relevancy
of terms to a disaster relief topic |
7,566 |
7,566 |
Relevance Finding |
Text |
5 choose 1 |
Selection |
Contributors viewed
a topic and a term and rated the relevancy of the latter to the former on a
five point scale (1 being very irrelevant, 5 being very relevant). The topics
all center around humanitarian aid or disaster relief and each topic was
defined for contributors. They were also asked if the term was a specific
person or place and whether it was misspelled. |
|
Hate
speech identification |
14,442 |
14,442 |
Hate Sentiment
Identification |
Text |
3 choose 1 |
Selection, Categorize,
Join |
Contributors viewed
short text and identified whether it (a) contained hate speech, (b) was offensive but without hate speech, or (c) was not offensive at all. Contains nearly 15K
rows with three contributor judgments per text string. |
|
Economic
News Article Tone and Relevance |
8,000 |
8,000 |
Relevance Finding |
Text |
K choose 1 |
Selection,
Categorize, Join |
Contributors read
snippets of news articles. They then noted if the article was relevant to the
US economy and, if so, what the tone of the article was. Tone was judged on a
9 point scale (from 1 to 9, with 1 representing the most negativity). Dataset
contains these judgments as well as the dates, source titles, and text. Dates
range from 1951 to 2014. |
|
Football
Strategy |
3,731 |
3,731 |
Football Strategy |
Text |
6 choose 1 |
Selection |
Contributors were
presented a football scenario and asked to note what the best coaching
decision would be. A sample scenario: "It is third down and 3. The ball is on
your opponent's 20 yard line. There are five seconds left. You are down by
4." The decisions presented were punt, pass, run, kick a field goal,
kneel down, or don't know. There are thousands of such scenarios in this job. |
|
Numerical
Transcription from Images |
7,665 |
7,665 |
Numerical
Transcription from Images |
Image |
Numerical
Transcription |
Selection |
Contributors looked
at a series of pictures from a footrace and transcribed bib numbers of the
competitors. Some images contain multiple bib numbers or incomplete bib
numbers. |
|
Identifying
key phrases in text |
8,262 |
8,262 |
Textual Understanding |
Text |
K choose 1 |
Selection, Join |
Contributors looked at
question/answer pairs (like "When did Bob Marley die? 1981") and a
series of sentences surrounding that event (such as phrases from a Bob Marley
obituary). They marked which of those sentences spoke directly to the question
(such as "Robert Nesta "Bob" Marley, OM (6 February 1945
– 11 May 1981) was a Jamaican reggae singer, song writer,
musician") for a wide variety of topics. |
|
Gender
classifier data |
20,000 |
20,000 |
Gender
Identification |
Text/Image |
3 choose 1 |
Selection,
Categorize, Join |
This data set was
used to train a CrowdFlower AI gender predictor. Contributors were asked to
simply view a Twitter profile and judge whether the user was a male, a
female, or a brand (non-individual). The dataset contains 20,000 rows, each
with a user name, a random tweet, account profile and image, location, and
even link and sidebar color. |
|
Sentiment
analyses of single words or short phrases |
3,523 |
3,523 |
Sentiment Analysis |
Text |
K choose 2 |
Selection, Join |
Contributors looked at
four words or bigrams (bigrams are just word pairs) and ranked the most
positive and most negative ones in each set. For example, they saw quartets
like "nasty, failure, honored, females" and chose which word was
the most positive and most negative. Interestingly, each set was graded by
eight contributors instead of the usual three. Dataset contains all 3,523
rows, but has 28K judgments. |
|
Disasters
on social media |
10,877 |
10,877 |
Fact Evaluation |
Text |
2 choose 1 |
Selection, Join |
Contributors looked
at over 10,000 tweets culled with a variety of searches like
"ablaze", "quarantine", and "pandemonium", then
noted whether the tweet referred to a disaster event (as opposed to a joke
with the word or a movie review or something non-disastrous). |
|
Do
these chemicals contribute to a disease? |
5,160 |
5,160 |
Fact Evaluation |
Text |
2 choose 1 |
Selection, Join |
Contributors read
sentences in which both a chemical (like Aspirin) and a disease (or
side-effect) were present. They then determined if the chemical directly
contributed to the disease or caused it. Dataset includes chemical names,
disease name, and aggregated judgments of five (as opposed to the usual
three) contributors. |
|
First
GOP debate sentiment analysis |
14,000 |
14,000 |
Sentiment Analysis |
Text |
K choose 1 |
Selection, Join |
We looked through
tens of thousands of tweets about the early August GOP debate in Ohio and
asked contributors to do both sentiment analysis and data categorization.
Contributors were asked if the tweet was relevant, which candidate was
mentioned, what subject was mentioned, and then what the sentiment was for a
given tweet. We've removed the non-relevant messages from the uploaded
dataset. |
|
URL
categorization |
31,085 |
31,085 |
URL categorization |
Text |
K choose 1 |
Selection, Categorize,
Join |
To create this large,
enriched dataset of categorized websites, contributors clicked provided links
and selected a main and sub-category for URLs. |
|
Classification
of political social media |
5,000 |
5,000 |
Classification |
Text |
K choose 1 |
Selection,
Categorize, Join |
Contributors looked
at thousands of social media messages from US Senators and other American
politicians to classify their content. Messages were broken down into
audience (national or the tweeter's constituency), bias (neutral/bipartisan,
or biased/partisan), and finally tagged as the actual substance of the
message itself (options included informational content, announcement of a media appearance, an attack on another candidate, etc.). |
|
eCommerce
search relevance |
32,000 |
32,000 |
Relevance Finding |
Text |
Rating |
Selection, Categorize |
We used this dataset to
launch our Kaggle competition, but the set posted here contains far more
information than what served as the foundation for that contest. This set
contains image URLs, rank on page, description for each product, search query
that led to each result, and more, each from five major English-language
ecommerce sites. |
|
Housing
and wheelchair accessibility |
10,000 |
10,000 |
Fact Evaluation |
Image |
Label |
Selection, Join |
Here, contributors
viewed 10,000 Google maps images and marked whether they were residential
areas. If they were, they noted which homes were most prevalent in the area
(apartments or houses) and whether the area had proper sidewalks that are
wheelchair friendly. |
|
Primary
emotions of statements |
2,400 |
2,400 |
Sentiment Analysis |
Text |
18 choose 1 |
Selection, Join |
Contributors looked at a single sentence and rated its emotional content based on Plutchik's wheel of emotions. Eighteen emotional choices were presented to contributors for grading. |
|
U.S.
economic performance based on news articles |
5,000 |
5,000 |
Relevance Finding |
Text |
Rating (scale of 1-9, with 1 being negative and 9 being positive) |
Selection,
Categorize |
Contributors viewed
a news article headline and a short, bolded excerpt of a sentence or two from
the attendant article. Next, they decided if the sentence in question
provided an indication of the U.S. economy's health, then rated the
indication on a scale of 1-9, with 1 being negative and 9 being
positive. |
|
Police-involved
fatalities since May 2013 |
2,355 |
2,355 |
Classification |
Text/Image |
K choose 1 |
Selection, Categorize |
A data categorization
job where contributors compiled a database of police-involved shootings over
a two-year span. Information contained includes: race, gender, city, state,
whether the victim was armed, photos of the deceased, attending news stories,
and more. |
|
Comparing
pictures of people |
59,476 |
59,476 |
Image Retrieval |
Image |
Rating (5 choices) |
Selection,
Collection, Join |
In this job,
contributors viewed two pictures of people walking through the same room and
were then asked to compare the person on the left to the person on the right.
Questions center on observable traits (like skin color, hair length,
muscularity, etc.). |
|
Twitter
sentiment analysis: Self-driving cars |
7,015 |
7,015 |
Sentiment Analysis |
Text |
Rating (very positive, slightly positive, neutral, slightly negative, or very negative) |
Selection, Join |
A simple Twitter sentiment analysis job where contributors read tweets and classified them as very positive, slightly positive, neutral, slightly negative, or very negative. They were also asked to mark if the tweet was not relevant
to self-driving cars. |
|
Blockbuster
database |
410 |
410 |
Categorization |
Text |
Survey |
Selection,
Categorize |
A data
categorization job where we asked the crowd to find out information about the
ten most popular movies, each year, for the past 40 years (1975-2015). |
|
Government
official database |
5,000 |
5,000 |
Categorization |
Text |
K choose 1 |
Categorization |
A simple data
categorization job wherein contributors viewed a cabinet member, minister,
ambassador, etc., and separated their names from their titles. Data set
contains names, positions, and years served. |
|
Wikipedia
image categorization |
976 |
976 |
Image Categorization |
Image |
K choose 1 |
Selection,
Collection, Join |
This data set
contains hundreds of Wikipedia images which contributors categorized in the
following ways: No person present / One person present / Several people present, but one dominant / Several people present, but none are dominant / Unsure. If the
images were of one or several people, contributors further classified images
by gender. |
|
Image
attribute tagging |
3,235 |
3,235 |
Image Categorization |
Image |
K choose 1 |
Selection, Collection,
Join |
Contributors viewed
thousands of images and categorized each based on a given list of attributes.
These attributes ranged from objective and specific (like "child"
or "motorbike") to more subjective ones (like "afraid" or
"beautiful"). Data set includes URLs for all images, multiple tags
for each, and contributor agreement scores. |
|
Mobile
search relevance |
647 |
647 |
Relevance Finding |
Text |
K choose 1 |
Selection,
Categorize, Join |
Contributors viewed
a variety of searches for mobile apps and determined if the intent of those
searches was matched. One was a short query like "music player";
the other, a much longer one like "I would like to download an app that
plays the music on the phone from multiple sources like Spotify and Pandora
and my library." |
|
Progressive
issues sentiment analysis |
1,159 |
1,159 |
Sentiment Analysis |
Text |
K choose 1 |
Selection, Join |
Contributors viewed
tweets regarding a variety of left-leaning issues like legalization of
abortion, feminism, Hillary Clinton, etc. They then classified whether the tweets in question were for, against, or neutral on the issue (with an option for none of the above). After this, they further classified whether each statement expressed a subjective opinion or stated facts. |
|
Indian
terrorism deaths database |
27,233 |
27,233 |
Fact Evaluation |
Text |
Survey |
Selection, Join |
Contributors read
sentences from the South Asia Terrorism Portal and quantified them.
Contributors counted the deaths mentioned in a sentence and noted whether the dead were terrorists, civilians, or security forces. Database contains original
sentences, state and district in which the deaths occurred, dates of the
deaths, and more. (Test questions have been removed from the database for
ease of visualization.) |
|
Drug
relation database |
2,020 |
2,020 |
Fact Evaluation |
Text |
K choose 1 |
Selection, Join |
Contributors read color
coded sentences and determined what the relationship of a drug was to certain
symptoms or diseases. |
|
Blurry
image comparison |
511 |
511 |
Relevance Finding |
Image |
K choose 1 |
Selection,
Categorize, Join |
Contributors viewed
a pair of purposely blurry or saturated images. They were then asked which
image more closely matched a particular word. Data set contains URLs for all
images and image pairs, aggregated agreement scores, and variance amounts.
Notably, a high number of contributors were polled for each image pairing (20
in total for each, giving this data set upwards of 10,000 judgements). |
|
Objective
truths of sentences/concept pairs |
8,227 |
8,227 |
Fact Evaluation |
Text |
Rating (5 choices) |
Selection, Join |
Contributors read a
sentence with two concepts. For example "a dog is a kind of animal"
or "captain can have the same meaning as master." They were then
asked if the sentence could be true and ranked it on a 1-5 scale. On the low
end was "strongly disagree" and on the upper, "strongly
agree." |
|
Image
sentiment polarity classification |
15,613 |
15,613 |
Image Sentiment
Analysis |
Image |
5 choose 1 |
Selection, Join |
This data set
contains over fifteen thousand sentiment-scored images. Contributors were
shown a variety of pictures (everything from portraits of celebrities to
landscapes to stock photography) and asked to score the images on typical
positive/negative sentiment. Data set contains URL of images, sentiment
scores of highly positive, positive, neutral, negative, and highly negative,
and contributor agreement. |
|
Smart
phone & tablet names database |
1,600 |
1,600 |
Fact Evaluation |
Text |
2 choose 1 |
Selection, Join |
Contributors viewed a
particular model code (like C6730 or LGMS323), then searched for the name of
the device itself (Kyocera C6730 Hydro or LG Optimus L70), then noted whether
the device was a phone or tablet. |
|
Free
text object descriptions |
1,225 |
1,225 |
Image Retrieval |
Text |
Description |
Selection,
Collection, Join |
Contributors viewed
a pair of items and were asked to write sentences that describe and differentiate the two objects. In other words, if viewing an apple and an orange, they could not write "this is a piece of fruit" twice, but needed to note how they were different. Image pairings varied so that the same image would appear in different pairs and the second image was always
smaller. Data set contains URLs of images and three sentences written per
item, per image. |
|
News
article / Wikipedia page pairings |
3,000 |
3,000 |
Relevance Finding |
Text |
2 choose 1 |
Selection, Categorize,
Join |
Contributors read a
short article and were asked which of two Wikipedia articles it matched most
closely. For example, a brief biography of Mel Gibson could be paired with
Gibson's general Wikipedia page or Lethal Weapon; likewise, Iran election
results could be paired with a Wikipedia page on Iran in general or the 2009
protests. Data set contains URLs for both Wiki pages, the full text
contributors read, and their judgements on each row. |
|
Is-A
linguistic relationships |
3,297 |
3,297 |
Fact Evaluation |
Text |
2 choose 1 |
Selection, Join |
Contributors were
provided a pair of concepts in a constant sentence structure. Namely: [Noun
1] is a [noun 2]. They were then asked to simply note whether this sentence was true or false. Data set contains all nouns and aggregated T/F
judgements. |
|
Weather
sentiment |
1,000 |
1,000 |
Sentiment Analysis |
Text |
K choose 1 |
Selection, Join |
Here, contributors were
asked to grade the sentiment of a particular tweet relating to the weather.
The catch is that 20 contributors graded each tweet. We then ran an
additional job (the one below) where we asked 10 contributors to grade the original
sentiment evaluation. |
|
Weather
sentiment evaluated |
1,000 |
1,000 |
Sentiment Analysis |
Text |
K choose 1 |
Selection, Join |
Here, contributors
were asked if the crowd graded the sentiment of a particular tweet
relating to the weather correctly. The original job (above this one, called
simply "Weather sentiment") involved 20 contributors noting the
sentiment of weather-related tweets. In this job, we asked 10 contributors to
check that original sentiment evaluation for accuracy. |
|
Image
classification: People and food |
587 |
587 |
Image classification |
Image |
K choose 1 |
Selection, Categorize,
Join |
A collection of images
of people eating fruits and cakes and other foodstuffs. Contributors
classified the images by male/female, then by age (adult or
child/teenager). |
|
"All
oranges are lemons," a.k.a. Semantic relationships between two concepts |
3,536 |
3,536 |
Fact Evaluation |
Text/Image |
2 choose 1 |
Selection, Join |
An interesting
language data set about the relationships between broad concepts. All questions were phrased in the following way: "All [x] are [y]." For example, a contributor would see something like "All Toyotas are vehicles" and was then asked to say whether this claim was true or false. Contributors were also provided images, in case they were unclear about what either concept was. This data set includes links to both images provided, the names
given for [x] and [y], and whether the statement that "All [x] are
[y]" was true or false. |
|
The
colors of #TheDress |
1,000 |
1,000 |
Image Retrieval |
Image |
2 choose 1 |
Selection, Collection,
Join |
On February 27th, 2015,
the internet was briefly obsessed with the color of a dress known simply as
#TheDress. We ran a survey job with 1,000 contributors and asked them what
colors the dress was, as well as looked into a hypothesis that Night Owls and
Morning People saw the dress differently. |
|
McDonald's
review sentiment |
1,500 |
1,500 |
Sentiment Analysis |
Text |
8 choose 1 |
Selection, Join |
A sentiment analysis
of negative McDonald's reviews. Contributors were given reviews culled from
low-rated McDonald's locations in random metro areas and asked to classify why the
locations received low reviews. |
|
Gender
breakdown of Time Magazine covers |
<100 |
<100 |
Gender Identification |
Image |
2 choose 1 |
Selection, Categorize,
Join |
Contributors were shown
images of Time Magazine covers since the late 1920s and asked to classify if
the person was male or female. Data is broken down overall and on an annual
basis. |
|
Agreement
between long and short sentences |
2,000 |
2,000 |
Relevance Finding |
Text |
3 choose 1 |
Selection,
Categorize, Join |
Contributors were
asked to read two sentences (the first was an image caption and the second
was a shorter version) and judge whether the short sentence adequately
describes the event in the first sentence (image caption). |
|
Biomedical
image modality |
10,652 |
10,652 |
Image classification |
Image |
K choose 1 |
Selection, Categorize,
Join |
A large data set of
labeled biomedical images, ranging from x-ray and ultrasound to charts,
graphs, and even hand-drawn sketches. |
|
Academy
Awards demographics |
416 |
416 |
Fact Retrieval |
Text/Image |
K choose 1 |
Selection,
Categorize, Sort, Top-K, Join |
A data set
concerning the race, religion, age, and other demographic details of all
Oscars winners since 1928 in the following categories: Best Actor / Best Actress / Best Supporting Actor / Best Supporting Actress / Best Director. |
|
Corporate
messaging |
3,118 |
3,118 |
Categorization |
Text |
3 choose 1 |
Categorization |
A data categorization
job concerning what corporations actually talk about on social media.
Contributors were asked to classify statements as information (objective
statements about the company or its activities), dialog (replies to users,
etc.), or action (messages that ask for votes or ask users to click on links,
etc.). |
|
Body
part relationships |
1,892 |
1,892 |
Fact Evaluation |
Text |
2 choose 1 |
Selection, Join |
A data set where
contributors classified if certain body parts were part of other parts.
Questions were phrased like so: "[Part 1] is a part of [part 2],"
or, by way of example, "Nose is a part of spine" or "Ear is a
part of head." |
|
Wearable
technology database |
582 |
582 |
Fact Evaluation |
Text |
K choose 1 |
Selection, Categorize,
Sort, Top-K, Join |
A data set containing
information on hundreds of wearables. Contains data on prices, company name
and location, URLs for all wearables, as well as the location of the body on
which the wearable is worn. |
|
Image
descriptions |
225,000 |
225,000 |
Image description |
Image |
2 choose 1 |
Selection,
Categorize, Join |
Contributors were
shown a large variety of images and asked whether a given word described the
image shown. For example, they might see a picture of Mickey Mouse and the
word Disneyland, where they'd mark "yes." Conversely, if Mickey
Mouse's pair word was "oatmeal," they would mark no. |
|
Sentence
plausibility |
400 |
400 |
Fact Evaluation |
Text |
Rating (5 choices) |
Selection, Categorize,
Join |
Contributors read
strange sentences and ranked them on a scale of "implausible" (1)
to "plausible" (5). Sentences were phrased in the following manner:
"This is not an [x], it is a [y]." |
|
Coachella
2015 Twitter sentiment |
3,847 |
3,847 |
Sentiment Analysis |
Text |
K choose 1 |
Selection, Join |
A sentiment analysis
job about the lineup of Coachella 2015. |
|
Apple
Computers Twitter sentiment |
3,969 |
3,969 |
Sentiment Analysis |
Text |
K choose 1 |
Selection, Join |
Contributors were given
a tweet and asked whether the user was positive, negative, or neutral about
Apple. (They were also allowed to mark "the tweet is not about the
company Apple, Inc.") |
|
How
beautiful is this image? (Part 1: People) |
3,500 |
3,500 |
Image classification |
Image |
Rating (5 choices) |
Selection,
Categorize, Join |
Here, contributors
were asked to rate image quality (as opposed to how pretty the
people in the images actually are). They were given a five-point scale, from
"unacceptable" (blurry, red-eyed images) to "exceptional"
(hi-res, professional-quality portraiture) and ranked a series of images
based on those criteria. |
|
How
beautiful is this image? (Part 2: Buildings and Architecture) |
3,500 |
3,500 |
Image classification |
Image |
Rating (5 choices) |
Selection, Categorize,
Join |
Here, contributors were
asked to rate image quality (as opposed to how gorgeous the
buildings in the images actually are). They were given a five-point scale,
from "unacceptable" (out-of-focus cityscapes) to
"exceptional" (hi-res photos that might appear in a city guide
book) and ranked a series of images based on those criteria. |
|
How
beautiful is this image? (Part 3: Animals) |
3,500 |
3,500 |
Image classification |
Image |
Rating (5 choices) |
Selection,
Categorize, Join |
Here, contributors
were asked to rate image quality (as opposed to how adorable the
animals in the images actually are). They were given a five-point scale, from
"unacceptable" (blurry photos of pets) to "exceptional"
(hi-res photos that might appear in text books or magazines) and ranked a
series of images based on those criteria. |
|
Language:
Certainty of Events |
13,386 |
13,386 |
Sentiment Analysis |
Text |
K choose 1 |
Selection, Join |
A linguistic data set concerning the certainty an author has about a certain word. For example, in
the following sentence: "The dog ran out the door," if the word
"ran" was asked about, the certainty that the event did or will
happen would be high. |
|
New
England Patriots Deflategate sentiment |
11,814 |
11,814 |
Sentiment Analysis |
Text |
K choose 1 |
Selection, Join |
Before the 2015
Super Bowl, there was a great deal of chatter around deflated footballs and
whether the Patriots cheated. This data set looks at Twitter sentiment on
important days during the scandal to gauge public sentiment about the whole
ordeal. |
|
Sports
Illustrated covers |
32,000 |
32,000 |
Image Retrieval |
Image |
K choose 1 |
Selection, Collection,
Join |
A data set listing the
sports that have been on the cover of Sports Illustrated since 1955. |
|
The
data behind data scientists |
974 |
974 |
Fact Retrieval |
Text |
Survey |
Collection |
A look into what
skills data scientists need and what programs they use. A part of our 2015
data scientist report which you can download. |
|
Judge
the relatedness of familiar words and made-up ones |
300 |
300 |
Relevance Finding |
Text |
Rating (1-5, from completely unrelated to very strongly related) |
Selection, Categorize,
Join |
Contributors were given
a nonce word and a real word, for example, "leebaf" and
"iguana." They were given a sentence with the nonce word in it and
asked to note how related the nonce word and real word were. Here's a sample
question: "Large numbers of leebaf skins are exported to Latin America
to be made into handbags, shoes and watch straps." Contributors then ranked the relation
of "leebaf" to "iguana" on a scale of 1-5, from
completely unrelated to very strongly related, respectively. |
|
2015
New Year's resolutions |
5,011 |
5,011 |
Sentiment Analysis |
Text |
K choose 1 |
Selection, Join |
A Twitter sentiment
analysis of users' 2015 New Year's resolutions. Contains demographic and
geographical data of users and resolution categorizations. |
|
Smart
phone app functionality |
1,898 |
1,898 |
Fact Retrieval |
Text |
K choose 1 |
Collection |
Contributors read an app
description, then selected the app's functionality from a pre-chosen list.
Functionalities ranged from SMS to flashlight to weather to whether or not
they used a phone's contacts. Contributors were allowed to select as many
functionalities as applied for each app. |
|
Naturalness
of computer generated images |
600 |
600 |
Image classification |
Image |
K choose 1 |
Selection,
Categorize, Join |
Contributors viewed
two rather bizarre looking images and were asked which was more
"natural." Images were all computer generated faces of people in
various states of oddness. |
|
National
Park locations |
323 |
323 |
Fact Retrieval |
Text |
Label |
Collection |
A large data set
containing the official URLs of United States national and state parks. |
|
Colors
in 9 Languages |
4,000 |
4,000 |
Fact Retrieval |
Text |
K choose 1 |
Collection |
Dataset of 4000
crowd-named colors in 9 languages. Includes the RGB color, the native
language color, and the translated color. |
|
Judge
emotions about nuclear energy from Twitter |
190 |
190 |
Sentiment Analysis |
Text |
5 choose 1 |
Selection, Join |
This dataset is a
collection of tweets related to nuclear energy along with the crowd's
evaluation of the tweet's sentiment. The possible sentiment categories are:
"Positive", "Negative", "Neutral / author is just
sharing information", "Tweet NOT related to nuclear energy",
and "I can't tell". We also provide an estimation of the crowds'
confidence that each category is correct which can be used to identify tweets
whose sentiment may be unclear. |
|
Decide
whether two English sentences are related |
555 |
555 |
Relevance Finding |
Text |
Rating (5 choices) |
Selection,
Categorize, Join |
This dataset is a
collection of English sentence pairs. The crowd was asked about the truth
value of the second sentence if the first sentence were true and to what
extent the sentences are related on a scale of 1 to 5. The variance of this
score over the crowd's judgments is included as well. |
|
Similarity
judgement of word combinations |
6,274 |
6,274 |
Relevance Finding |
Text |
7 choose 1 |
Selection, Categorize,
Join |
Contributors were asked
to evaluate how similar two sets of words are on a seven-point scale, with 1
being "completely different" and 7 being "exactly the
same". |
|
Sentiment
Analysis - Global Warming/Climate Change |
6,090 |
6,090 |
Sentiment Analysis |
Text |
3 choose 1 |
Selection, Join |
Contributors
evaluated tweets for belief in the existence of global warming or climate
change. The possible answers were "Yes" if the tweet suggests
global warming is occurring, "No" if the tweet suggests global warming is not occurring, and "I can't tell" if the tweet is
ambiguous or unrelated to global warming. We also provide a confidence score
for the classification of each tweet. |
|
Judge
Emotion About Brands & Products |
9,093 |
9,093 |
Sentiment Analysis |
Text |
3 choose 1 |
Selection, Join |
Contributors evaluated
tweets about multiple brands and products. The crowd was asked if the tweet
expressed positive, negative or no emotion towards a brand and/or product. If
some emotion was expressed they were also asked to say which brand or product
was the target of that emotion. |
|
Claritin
Twitter |
4,900 |
4,900 |
Relevance Finding |
Text |
K choose 1 |
Selection,
Categorize |
This dataset contains all tweets that mention Claritin from October 2012. The tweets are
tagged with sentiment, the author's gender, and whether or not they mention
any of the top 10 adverse events reported to the FDA. |
|
|
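The sketch below is a minimal, illustrative Python example (not code shipped with any of the datasets above) of how the table's "Size" figures (#questions, #answers, and the #answers/#questions ratio) and the "#Questions with Ground Truth" count can be recomputed for a Part 1 dataset after downloading it, along with one common follow-up: scoring a simple majority vote over the workers' answers against the ground truth. The file names ("answers.csv", "truth.csv") and column names ("question", "answer", "truth") are assumptions made for the example; every dataset ships with its own schema, so adjust them to the actual files.

import csv
from collections import Counter, defaultdict

def load_answers(path):
    """Read worker answers, one (question, answer) record per CSV row."""
    per_question = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            per_question[row["question"]].append(row["answer"])
    return per_question

def load_truth(path):
    """Read the ground-truth labels as a {question: truth} mapping."""
    with open(path, newline="") as f:
        return {row["question"]: row["truth"] for row in csv.DictReader(f)}

def summarize(answers, truth):
    n_questions = len(answers)
    n_answers = sum(len(labels) for labels in answers.values())
    # "Size" ratio as defined in the header row: #answers / #questions
    ratio = n_answers / n_questions if n_questions else 0.0
    # "#Questions with Ground Truth" column
    n_with_truth = sum(1 for q in answers if q in truth)
    # One common use of Part 1 datasets (not prescribed by the table itself):
    # compare a simple majority vote against the ground truth.
    correct = 0
    for q, labels in answers.items():
        if q in truth:
            majority = Counter(labels).most_common(1)[0][0]
            correct += majority == truth[q]
    accuracy = correct / n_with_truth if n_with_truth else float("nan")
    return n_questions, n_answers, ratio, n_with_truth, accuracy

if __name__ == "__main__":
    # Hypothetical file names; substitute the files that come with each dataset.
    answers = load_answers("answers.csv")
    truth = load_truth("truth.csv")
    print(summarize(answers, truth))

For Part 2 datasets, which publish only the aggregated or ground-truth labels, only the question counts apply; there are no per-worker answers to aggregate.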