Space-based tokenisation can be used for most languages, such as: (Select all that
apply)
Correct!
- False
- False
- True
- False
- True
- True
Space-based tokenisation can be done with unix tools like:
Choice (d) is incorrect: it removes specific parts of a line based on position.
We can remove punctuation since it holds no real meaning for machines.
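As a sketch (in plain Python rather than the unix pipeline the question refers to), space-based tokenisation with punctuation removal might look like:

```python
import string

def tokenize(text):
    # Strip punctuation, lowercase, then split on whitespace
    # (space-based tokenisation).
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table).lower().split()

print(tokenize("Ana, Winston captain?"))  # ['ana', 'winston', 'captain']
```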
Byte-Pair Encoding is composed of 2 parts, which are:
Byte-Pair Encoding often includes subwords like morphemes
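BPE is commonly described as a token *learner* (which learns merge rules from a corpus) plus a token *segmenter* (which applies them to new text). A minimal sketch of the learner, on a made-up toy corpus:

```python
from collections import Counter

def bpe_learn(words, num_merges):
    # Token learner: start from single characters, then repeatedly merge
    # the most frequent adjacent pair into a new subword token.
    vocab = [list(w) for w in words]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in vocab:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = []
        for w in vocab:
            merged, i = [], 0
            while i < len(w):
                if i + 1 < len(w) and (w[i], w[i + 1]) == best:
                    merged.append(w[i] + w[i + 1])
                    i += 2
                else:
                    merged.append(w[i])
                    i += 1
            new_vocab.append(merged)
        vocab = new_vocab
    return merges

print(bpe_learn(["lower", "lowest", "low"], 2))
```

The learned merges often correspond to morpheme-like subwords (here `lo` then `low`), which is why BPE vocabularies frequently contain morphemes.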
Stemming is a more sophisticated form of lemmatization
Reducing “relational” to “relate” via a rule that turns “ATIONAL” to “ATE” is an
example of
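The ATIONAL → ATE rewrite is one of the Porter stemmer's suffix rules; a minimal sketch of rule-based stemming (the rule list here is illustrative, not the full Porter algorithm):

```python
# A few illustrative suffix-rewrite rules in the style of the Porter stemmer.
RULES = [("ational", "ate"), ("sses", "ss"), ("ing", "")]

def stem(word):
    # Apply the first matching suffix rule, if any.
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word

print(stem("relational"))  # relate
```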
The most problematic symbol when segmenting sentences is:
Choice (c) is incorrect: that symbol doesn’t even separate sentences…
What can we use to deal with symbols that may mark an end of sentence or part of a
word?
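A common answer is a rule set (or a trained classifier) that decides whether a period ends a sentence or belongs to an abbreviation. A hedged sketch with a tiny, hypothetical abbreviation list:

```python
# Hypothetical, tiny abbreviation list; real systems use much larger
# dictionaries or a trained classifier.
ABBREVIATIONS = {"dr.", "mr.", "e.g.", "etc."}

def segment(text):
    # Split after '.', '!' or '?' unless the token is a known abbreviation.
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

print(segment("Dr. Smith arrived. He was late!"))
```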
What can we use to mitigate email scams?
Hybrid machine translation relies on
The first layer of text classification is
It is not possible to generate classification features by hand
Which of the following is not true about the BOW (Bag of Words) representation
for text?
What solves the space dimensionality problem of BOW?
Correct!
- True
- False
- False
- True (NBOW is correct too)
NBOW retains the Bag-of-Words problem that all words are treated equally.
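NBOW averages word embeddings into one fixed-size vector, which solves BOW's dimensionality problem but, as noted above, still weights every word equally. A sketch with made-up 2-d embeddings (real NBOW uses learned, higher-dimensional ones):

```python
# Hypothetical 2-d embeddings purely for illustration.
EMB = {"winston": [0.9, 0.1], "ana": [0.8, 0.2], "dark": [0.1, 0.9]}

def nbow(tokens):
    # Average the embeddings: every word contributes equally,
    # regardless of how informative it is.
    vecs = [EMB[t] for t in tokens if t in EMB]
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

print(nbow(["winston", "ana", "dark"]))
```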
Given the data below
| Doc | Sentence               | Class |
|-----|------------------------|-------|
| 1   | Captain Price cool guy | COD   |
| 2   | Ana, Winston captain?  | OW    |
| 3   | Bravo six dark         | COD   |
| 4   | Winston years ago      | OW    |
| 5   | Ana Winston Winston    | OW    |
Find the probability that the sentence “Winston Ana dark” belongs to the class
OW (give 2 non-zero decimal places).
Given the data below
| Doc | Sentence               | Class |
|-----|------------------------|-------|
| 1   | Captain Price cool guy | COD   |
| 2   | Ana, Winston captain?  | OW    |
| 3   | Bravo six dark         | COD   |
| 4   | Winston years ago      | OW    |
| 5   | Ana Winston Winston    | OW    |
Find the probability that the sentence “Winston Ana dark” belongs to the class
COD (give 2 non-zero decimal places).
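Both questions can be checked with multinomial Naive Bayes. A sketch, assuming add-one (Laplace) smoothing and lowercased, punctuation-stripped tokens (the training sentences below are the table rows after that preprocessing):

```python
from collections import Counter

# Table rows, lowercased with punctuation stripped.
DOCS = [
    ("captain price cool guy", "COD"),
    ("ana winston captain", "OW"),
    ("bravo six dark", "COD"),
    ("winston years ago", "OW"),
    ("ana winston winston", "OW"),
]

def nb_score(sentence, cls):
    # P(cls) * prod over words of P(word | cls),
    # with add-one smoothing over the shared vocabulary.
    counts = Counter()
    cls_docs, n_cls_tokens = 0, 0
    vocab = set()
    for text, c in DOCS:
        tokens = text.split()
        vocab.update(tokens)
        if c == cls:
            cls_docs += 1
            counts.update(tokens)
            n_cls_tokens += len(tokens)
    score = cls_docs / len(DOCS)  # prior
    for w in sentence.split():
        score *= (counts[w] + 1) / (n_cls_tokens + len(vocab))
    return score

print(round(nb_score("winston ana dark", "OW"), 4))   # 0.0011
print(round(nb_score("winston ana dark", "COD"), 5))  # 0.00014
```

For OW: prior 3/5, 9 class tokens, vocabulary size 11, so 0.6 × 5/20 × 3/20 × 1/20 = 0.001125 ≈ 0.0011; for COD: 0.4 × 1/18 × 1/18 × 2/18 ≈ 0.00014.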