When marked, your coursework will be available for collection in the tutorial. Please make sure that you pick up the coursework in person, since the tutor may have questions about your answers. Failing to address them satisfactorily may cause you to lose marks.
Model solutions for each week's coursework will be released shortly after the hand-in deadline. I suggest you look at these.
There is a lot of confusion between transition tables for NFAs and DFAs, and the subset construction.
A transition table represents the transition function δ by having the state input to the function in the left-most column, the symbol input in the top row, and the output state(s) in the body of the table.
Now, δ for a DFA takes a single state (and a symbol) as input and produces a single state as output. Therefore the entries in the left-most column should be single states, as should the entries in the body of the table. For an NFA, on the other hand, the input is again a single state (and a symbol), but the output is a set of states. Consequently, the entries in the left-most column should be single NFA states, but the entries in the body of the table should be sets of NFA states.
When performing the subset construction, you are constructing a transition table for a new DFA that is equivalent to the NFA you started with. Thus each entry (left column and body) should be a single state for it to be a valid DFA transition table. However, each state in this DFA represents a set of states of the NFA, and is named after that set. Thus each entry in the table will be named as a set of NFA states, and one of these DFA states will probably be named after the empty set.
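As a small made-up illustration (not an example from the coursework): consider an NFA over the alphabet {a, b} with states q0 (initial) and q1 (accepting), where δN(q0, a) = {q1}, δN(q0, b) = ∅, δN(q1, a) = ∅ and δN(q1, b) = {q1}. Its transition table has single states on the left and sets of states in the body:

          a      b
   q0   {q1}    ∅
   q1    ∅     {q1}

The subset construction then gives a DFA whose states are named after sets of NFA states, with a single such state in every entry:

             a      b
   {q0}    {q1}    ∅
   {q1}     ∅     {q1}
    ∅       ∅      ∅

The initial DFA state is {q0}, and {q1} is the only accepting DFA state, since it contains the accepting NFA state q1.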
{a}{b,c} = {ab,ac}
{a}{b,c} ≠ {ba,ca}
{b,c}{a} = {ba,ca}
Only if stated in the definition of the language. In terms of finite automata, an automaton accepts the empty word if and only if at least one of its initial states is accepting.
When using the subset construction to transform an NFA into a DFA, remember that each state in the DFA represents a set of states from the NFA. So the initial state of a DFA is the set of all initial states from the NFA.
Each state in the DFA is a set of states from the NFA. If, for a given symbol, no state can be reached from the current set of states in the NFA, then the set of possible states is the empty set. This is still a valid set of states, however, and needs to appear in the corresponding DFA. You also need to include the transitions out of this state, as a DFA must have a transition from every state for every possible input symbol. However, you will notice that these transitions always point back to the empty-set state itself.
The transition diagram (and really the transition table as well, if properly annotated with information on start and accepting states) is one representation of the DFA! Yes, one can go the last little bit and write it out as a five tuple of set of states, alphabet, and so on, but there is little point (unless explicitly asked) since all (essential) information is contained in the transition diagram.
Minimising NFAs is not as easy as minimising DFAs. You can (and will often be asked to) remove states from an NFA from which an accepting state cannot be reached, but this does not guarantee a minimal NFA. (It can save you a lot of work though.) However, for the purposes of this module you will not be required to minimise NFAs directly. You will only have to minimise DFAs using the table filling algorithm; so in the case of an NFA you will be asked to convert it to a DFA first.
Sometimes you can intuitively deduce a DFA from a language specification, but sometimes it is not immediately obvious how to go about it. In such cases, you will need to construct the DFA in stages, until you can identify which states can be merged.
If you compare the definitions of δ̂ for DFAs and NFAs, you will notice that in the case of a DFA it maps a state (and a word) to a state, but in the case of an NFA it maps a set of states (and a word) to a set of states.
As δ̂N takes a set of states as input, not a single state, it would be invalid to begin with just a single state. If the NFA only has one start state q0, you would begin with the set containing q0. Thus when an NFA has several start states, you would apply δ̂N to the set containing those start states.
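For reference, one common way of writing the two definitions is the following (check the exact formulation used in the lecture notes, as details vary), where x is a word and a a single symbol:

   δ̂(q, ε) = q
   δ̂(q, xa) = δ(δ̂(q, x), a)                        (DFA)

   δ̂N(S, ε) = S
   δ̂N(S, xa) = ⋃ { δN(q, a) | q ∈ δ̂N(S, x) }       (NFA)

In both cases the base case handles the empty word and the step case processes one further symbol; the difference is precisely that the NFA version works with sets of states throughout.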
The pumping lemma says:
Given a regular language L, there exists a constant natural number n such that every word w in the language with length at least n (|w| ≥ n) can be split into three words, w = xyz, satisfying:

   1. |xy| ≤ n
   2. |y| ≥ 1 (i.e. y is not the empty word)
   3. xyᵏz ∈ L for all natural numbers k
Notice that the third assertion quantifies over all natural numbers k. Thus if the language is regular, xyᵏz will be in L for every k. So, to show that L is not regular, we assume that it is regular, and then only need to find a single value of k for which xyᵏz is not in L. This contradicts the third assertion, and thereby proves, by contradiction, that L is not regular.
I'd suggest trying very small values of k, such as 0 or 2, for ease of working. In most cases, any value of k should suffice. (But not k = 1, can you see why?)
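For example, here is how the standard argument goes for the classic language L = { aⁿbⁿ | n ≥ 0 } (this particular language is just an illustration, not necessarily one from the coursework). Assume L is regular and let n be the constant from the pumping lemma. Choose w = aⁿbⁿ, so |w| ≥ n. In any split w = xyz with |xy| ≤ n and |y| ≥ 1, the word y consists entirely of a's, say y = aᵐ with m ≥ 1. Taking k = 0 gives xy⁰z = xz = aⁿ⁻ᵐbⁿ, which has fewer a's than b's and so is not in L. This contradicts the pumping lemma, so L is not regular. (Taking k = 2 would have worked just as well, giving aⁿ⁺ᵐbⁿ.)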
When choosing x, y and z, we choose y such that it is the first cycle. By this we mean that there can be no other cycles within y, nor any cycles within x.
So, had there been a cycle within y, that is the cycle we would have chosen as the first cycle instead.
The symbol epsilon is not a symbol in the alphabet. It instead denotes the empty word, which is the word made up of zero symbols. Whenever the empty word gets concatenated to any other word (whether that be another empty word, a single symbol, or a sequence of two or more symbols), the epsilon symbol can simply be dropped. It is only needed when writing the empty word on its own.
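For example, if w = ab then εw = wε = ab, and εε = ε; the only word that has to be written using the symbol ε is the empty word itself.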
The "directly derives" relation ⇒ means derived in exactly one production step. It is a relation from a string made up of terminals and nonterminals to another string made up of terminals and nonterminals. This is expressed mathematically as:
⇒ ⊆ (N ∪ T)* × (N ∪ T)*
So A ⇒ B is valid if we can derive B from A in one step by applying a production rule. A string is derivable if we can derive it from the start symbol by applying ⇒ zero or more times (written ⇒*). However, just because a string is derivable doesn't mean it is a word in the language. Only derivable strings consisting only of terminal symbols are words in the language.
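As a small made-up example (not a grammar from the coursework): take the productions S → aSb and S → ε with start symbol S. Then

   S ⇒ aSb ⇒ aaSbb ⇒ aabb

so aSb, aaSbb and aabb are all derivable. However, aSb and aaSbb contain the nonterminal S, so they are not words of the language; aabb consists only of terminals, so it is a word of the language.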
In ordinary mathematics, associativity of a binary operator ⊗ means

   ∀ a b c. a ⊗ (b ⊗ c) = (a ⊗ b) ⊗ c

However, when dealing with CFGs, the term is used with a slightly different meaning. We describe a binary operator as associating in a certain direction, based on the structure of the derivation for bracketless expressions involving multiple uses of that operator. This is best seen by example. Consider the expression

   a ⊗ b ⊗ c

We say that ⊗ associates to the left if we want to consider the expression to have the structure

   (a ⊗ b) ⊗ c

and that ⊗ associates to the right if we want to consider the expression to have the structure

   a ⊗ (b ⊗ c)

We can also describe binary operators as non-associative if we desire bracketless expressions involving multiple uses of the operator to be invalid. For example,

   a ≤ b ≤ c
When it comes to writing a grammar to enforce these associativity directions, the productions generally have the following forms (here ≤ is non-associative, ⊕ associates to the left, and ⊗ associates to the right):
R → E ≤ E | E
E → E ⊕ T | T
T → F ⊗ T | F
To understand why this is the case, try drawing some derivation trees for simple expressions. First try to derive them with the desired structure, then try to derive them such that they associate in the other direction. You will soon see that only one derivation is possible (making the grammar unambiguous), and that it is the desired associativity.
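For example, take the bracketless word a ⊕ b ⊕ c (assuming a, b and c can each be derived from the nonterminal F). The production E → E ⊕ T only allows the derivation

   E ⇒ E ⊕ T ⇒ E ⊕ T ⊕ T ⇒ T ⊕ T ⊕ T ⇒* a ⊕ b ⊕ c

in which the inner E ⊕ T derives a ⊕ b and forms the left operand of the outermost ⊕, giving the structure (a ⊕ b) ⊕ c. A derivation with the structure a ⊕ (b ⊕ c) is impossible, because the T to the right of an ⊕ can never derive another bracketless ⊕-expression.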
If the language has parentheses, then it will usually be the case that it should be possible to use them to override the default associativity rules. However, because the parentheses have to be explicitly included in the word, the grammar can remain unambiguous. Parentheses usually have the highest precedence, and make a recursive call to the production with the lowest precedence.
R → E ≤ E | E
E → E ⊕ T | T
T → F ⊗ T | F
F → ( R ) | P
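To see how F → ( R ) "makes a recursive call to the production with the lowest precedence", consider the word a ⊗ ( b ⊕ c ) (assuming the nonterminal P derives the identifiers a, b and c):

   R ⇒ E ⇒ T ⇒ F ⊗ T ⇒* a ⊗ T ⇒ a ⊗ F ⇒ a ⊗ ( R ) ⇒ a ⊗ ( E ) ⇒ a ⊗ ( E ⊕ T ) ⇒* a ⊗ ( b ⊕ c )

Without the parentheses, a ⊗ b ⊕ c could only have the structure (a ⊗ b) ⊕ c, since ⊗ is introduced below ⊕ in the grammar; the parentheses take us back to R and thereby override the default precedence.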
The standard convention is that parentheses can contain any expression. This allows for somewhat unintuitive words in the case of non-associative binary operators, such as (x≤3)≥y, but usually such words are accepted. When implementing a compiler, it is in the (later) type-checking phase that invalid expressions of that form are rejected.
While it's entirely possible that your PDA only accepts the empty word, there is also a common mistake that you could be making.
Remember that ε-transitions do not read a symbol from the input word; they can be applied whenever the stack symbol matches. A common misconception is that they can only be applied when the stack is empty; this is not the case. And a common (but not certain) consequence of this mistake is that the PDA may appear to only accept the empty word.
Here's an example that illustrates the problem with limited backtracking.
Consider

   A → aA | ε

with parsing function

   type Token = Char

   parseA :: [Token] -> Maybe [Token]
   parseA ('a' : ts) = parseA ts
   parseA ts         = Just ts

and

   C → ab

with a parsing function

   parseC :: [Token] -> Maybe [Token]
   parseC ('a' : 'b' : ts) = Just ts
   parseC _                = Nothing

Then we write a parsing function for

   S → AC

in the usual way:

   parseS :: [Token] -> Maybe [Token]
   parseS ts = case parseA ts of
                 Nothing  -> Nothing
                 Just ts' -> parseC ts'

Is it going to work? Consider the word "ab". It can be derived from "S" by using the productions "A → ε" and "C → ab". But

   parseS "ab" = Nothing

Why? Because

   parseA "ab" = Just "b"

That is, it consumed the first "a", and then

   parseC "b" = Nothing

So maybe we should write parseA the other way around?

   parseA :: [Token] -> Maybe [Token]
   parseA ts         = Just ts
   parseA ('a' : ts) = parseA ts

Now,

   parseA "ab" = Just "ab"

and

   parseC "ab" = Just ""

So

   parseS "ab" = Just ""

Success. But consider

   parseS "aab"

This is now going to fail. Why? Because this time, we need parseA to consume exactly one "a". That is, the first time round, it should succeed according to the production

   A → aA

and the second time round, it should succeed according to the production

   A → ε

This example shows that there is no right order in which to try the productions for A. So limited backtracking is doomed to fail one way or another in general. It will work for certain cases, though, so it is not useless, but it has to be used with care.
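For contrast, here is a sketch of full backtracking for the same grammar (my own illustration, not code from the module): each parsing function returns every possible remaining input instead of at most one, so no choice is ever committed to prematurely.

   type Token = Char

   parseA :: [Token] -> [[Token]]            -- A → ε | aA: keep both alternatives
   parseA ts = ts : case ts of
                      ('a' : ts') -> parseA ts'
                      _           -> []

   parseC :: [Token] -> [[Token]]            -- C → ab
   parseC ('a' : 'b' : ts) = [ts]
   parseC _                = []

   parseS :: [Token] -> [[Token]]            -- S → AC: feed every result of parseA into parseC
   parseS ts = [ ts'' | ts' <- parseA ts, ts'' <- parseC ts' ]

With these definitions both parseS "ab" and parseS "aab" return [""], i.e. both words can be parsed completely, at the cost of potentially exploring many alternatives.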
Question 1 (the compulsory multiple choice one) spans the entire module. The remaining questions tend to be more focused. But I'm not saying they are completely focused, and I cannot comment on the exact distribution of topics over questions.
I would strongly discourage speculative, selective revision of the kind "if I know DFAs and NFAs only, I'll be all set, as I then would have passed last year's exam". (That is just an example: I'm not saying it would have been the case; I have not checked.)
Last updated 3rd October 2011.