If you need encoding, pass in a stream/reader with the correct encoding.
This event may be reported during SLL prediction in cases where the
conflicting SLL configuration set provides sufficient information to
determine that the SLL conflict is truly an ambiguity. For example, if none
of the ATN configurations in the conflicting SLL configuration set have
traversed a global follow transition (i.e.
In some cases, the minimum represented alternative in the conflicting LL
configuration set is not equal to the minimum represented alternative in the
conflicting SLL configuration set. Grammars and inputs which result in this
scenario are unable to use
If
closure() tracks the depth of how far we dip into the outer context: depth > 0. Note that it may not be a totally accurate depth since I don't ever decrement it. TODO: make it a boolean instead.
For memory efficiency, the {@link #isPrecedenceFilterSuppressed} method is also backed by this field. Since the field is publicly accessible, the highest bit which would not cause the value to become negative is used to store this field. This choice minimizes the risk that code which only compares this value to 0 would be affected by the new purpose of the flag. It also ensures that the performance of the existing {@link ATNConfig} constructors as well as certain operations like {@link ATNConfigSet#add(ATNConfig, DoubleKeyMap)} are completely unaffected by the change.
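The packing scheme described above can be sketched in plain Java. The class and field names here are illustrative assumptions, as is the exact mask value; the point is storing a boolean in the highest bit that keeps the field nonnegative:

```java
// Sketch of packing a boolean flag into a nonnegative int field.
// Names and the mask constant are illustrative, not quoted from ANTLR.
class PackedDepth {
    // Bit 30: the highest bit that leaves an int nonnegative.
    private static final int SUPPRESS_PRECEDENCE_FILTER = 0x40000000;

    // Low bits hold the outer-context depth; bit 30 holds the flag.
    public int reachesIntoOuterContext;

    public boolean isPrecedenceFilterSuppressed() {
        return (reachesIntoOuterContext & SUPPRESS_PRECEDENCE_FILTER) != 0;
    }

    public void setPrecedenceFilterSuppressed(boolean value) {
        if (value) {
            reachesIntoOuterContext |= SUPPRESS_PRECEDENCE_FILTER;
        } else {
            reachesIntoOuterContext &= ~SUPPRESS_PRECEDENCE_FILTER;
        }
    }

    // Code comparing the depth to 0 is unaffected for depths < 2^30.
    public int getOuterContextDepth() {
        return reachesIntoOuterContext & ~SUPPRESS_PRECEDENCE_FILTER;
    }
}
```

Because the flag bit never makes the field negative, existing comparisons against 0 keep working unchanged.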
This method updates {@link #dipsIntoOuterContext} and {@link #hasSemanticContext} when necessary.
This cache makes a huge difference in memory and a little bit in speed. For the Java grammar on java.*, it dropped the memory requirements at the end from 25M to 16M. We don't store any of the full context graphs in the DFA because they are limited to local context only, but apparently there's a lot of repetition there as well. We optimize the config contexts before storing the config set in the DFA states by literally rebuilding them with cached subgraphs only.
I tried a cache for use during closure operations, that was whacked after each adaptivePredict(). It cost a little bit more time I think and doesn't save on the overall footprint so it's not worth the complexity.
For the
In some cases, the unique alternative identified by LL prediction is not
equal to the minimum represented alternative in the conflicting SLL
configuration set. Grammars and inputs which result in this scenario are
unable to use
Parsing performance in ANTLR 4 is heavily influenced by both static factors (e.g. the form of the rules in the grammar) and dynamic factors (e.g. the choice of input and the state of the DFA cache at the time profiling operations are started). For best results, gather and use aggregate statistics from a large sample of inputs representing the inputs expected in production before using the results to make changes in the grammar.
The value of this field contains the sum of differential results obtained by {@link System#nanoTime()}, and is not adjusted to compensate for JIT and/or garbage collection overhead. For best accuracy, use a modern JVM implementation that provides precise results from {@link System#nanoTime()}, and perform profiling in a separate process which is warmed up by parsing the input prior to profiling. If desired, call {@link ATNSimulator#clearDFA} to reset the DFA cache to its initial state before starting the profiling measurement pass.
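Putting those recommendations together, a measurement pass might look like the sketch below. `MyLexer`, `MyParser`, and the start rule `prog` are hypothetical generated names; `setProfile`, `getParseInfo`, and `clearDFA` are the profiling entry points in the ANTLR 4 Java runtime:

```java
// Profiling sketch; requires the ANTLR 4 Java runtime and a generated
// lexer/parser (MyLexer, MyParser, and prog() are assumed names).
MyParser parser = new MyParser(new CommonTokenStream(
        new MyLexer(CharStreams.fromString(source))));
parser.setProfile(true);            // installs ProfilingATNSimulator
parser.prog();                      // warm-up pass (JIT, DFA cache)
parser.getInterpreter().clearDFA(); // optionally start measurement cold
parser.reset();
parser.prog();                      // measurement pass
for (DecisionInfo di : parser.getParseInfo().getDecisionInfo()) {
    if (di.timeInPrediction > 0) {
        System.out.println("decision " + di.decision + ": "
                + di.timeInPrediction + " ns in prediction");
    }
}
```

As the text notes, aggregate such numbers over a representative corpus before acting on them; a single input can be dominated by DFA cache warm-up effects.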
If DFA caching of SLL transitions is employed by the implementation, ATN computation may cache the computed edge for efficient lookup during future parsing of this decision. Otherwise, the SLL parsing algorithm will use ATN transitions exclusively.
@see #SLL_ATNTransitions
@see ParserATNSimulator#computeTargetState
@see LexerATNSimulator#computeTargetState

If the ATN simulator implementation does not use DFA caching for SLL transitions, this value will be 0.
@see ParserATNSimulator#getExistingTargetState
@see LexerATNSimulator#getExistingTargetState

Note that this value is not related to whether or not {@link PredictionMode#SLL} may be used successfully with a particular grammar. If the ambiguity resolution algorithm applied to the SLL conflicts for this decision produces the same result as LL prediction for this decision, {@link PredictionMode#SLL} would produce the same overall parsing result as {@link PredictionMode#LL}.
If DFA caching of LL transitions is employed by the implementation, ATN computation may cache the computed edge for efficient lookup during future parsing of this decision. Otherwise, the LL parsing algorithm will use ATN transitions exclusively.
@see #LL_DFATransitions
@see ParserATNSimulator#computeTargetState
@see LexerATNSimulator#computeTargetState

If the ATN simulator implementation does not use DFA caching for LL transitions, this value will be 0.
@see ParserATNSimulator#getExistingTargetState
@see LexerATNSimulator#getExistingTargetState

Many lexer commands, including
For position-dependent actions, the input stream must already be positioned correctly prior to calling this method.
The executor tracks position information for position-dependent lexer actions
efficiently, ensuring that actions appearing only at the end of the rule do
not cause bloating of the
Normally, when the executor encounters lexer actions where
Prior to traversing a match transition in the ATN, the current offset from the token start index is assigned to all position-dependent lexer actions which have not already been assigned a fixed offset. By storing the offsets relative to the token start index, the DFA representation of lexer actions which appear in the middle of tokens remains efficient due to sharing among tokens of the same length, regardless of their absolute position in the input stream.
If the current executor already has offsets assigned to all
position-dependent lexer actions, the method returns
This method calls
If {@code speculative} is {@code true}, this method was called before {@link #consume} for the matched character. This method should call {@link #consume} before evaluating the predicate to ensure position sensitive values, including {@link Lexer#getText}, {@link Lexer#getLine}, and {@link Lexer#getCharPositionInLine}, properly reflect the current lexer state. This method should restore {@code input} and the simulator to the original state before returning (i.e. undo the actions made by the call to {@link #consume}).
@param input The input stream.
@param ruleIndex The rule containing the predicate.
@param predIndex The index of the predicate within the rule.
@param speculative {@code true} if the current index in {@code input} is one character before the predicate's location.
@return {@code true} if the specified predicate evaluates to {@code true}.

We track these variables separately for the DFA and ATN simulation because the DFA simulation often has to fail over to the ATN simulation. If the ATN simulation fails, we need the DFA to fall back to its previously accepted state, if any. If the ATN succeeds, then the ATN does the accept and the DFA simulator that invoked it can simply return the predicted token type.
This action is implemented by calling
This class may represent embedded actions created with the {...}
syntax in ANTLR 4, as well as actions created for lexer commands where the
command argument could not be evaluated when the grammar was compiled.
Custom actions are position-dependent since they may represent a
user-defined embedded action which makes calls to methods like
Custom actions are implemented by calling
This action is not serialized as part of the ATN, and is only required for
position-dependent lexer actions which appear at a location other than the
end of a rule. For more information about DFA optimizations employed for
lexer actions, see
Note: This class is only required for lexer actions for which
This method calls
This action is implemented by calling
The
This action is implemented by calling
The
This action is implemented by calling
This action is implemented by calling
The
This action is implemented by calling
This action is implemented by calling
If {@code ctx} is {@code null} and the end of the rule containing {@code s} is reached, {@link Token#EPSILON} is added to the result set. If {@code ctx} is not {@code null} and the end of the outermost rule is reached, {@link Token#EOF} is added to the result set.
@param s the ATN state
@param ctx the complete parser context, or {@code null} if the context should be ignored
@return The set of tokens that can follow {@code s} in the ATN in the specified {@code ctx}.
@param s the ATN state
@param stopState the ATN state to stop at. This can be a {@link BlockEndState} to detect epsilon paths through a closure.
@param ctx the complete parser context, or {@code null} if the context should be ignored
@return The set of tokens that can follow {@code s} in the ATN in the specified {@code ctx}.

This value is the sum of {@link #getTotalSLLATNLookaheadOps} and {@link #getTotalLLATNLookaheadOps}.
The basic complexity of the adaptive strategy makes it harder to understand. We begin with ATN simulation to build paths in a DFA. Subsequent prediction requests go through the DFA first. If they reach a state without an edge for the current symbol, the algorithm fails over to the ATN simulation to complete the DFA path for the current input (until it finds a conflict state or uniquely predicting state).
All of that is done without using the outer context because we want to create a DFA that is not dependent upon the rule invocation stack when we do a prediction. One DFA works in all contexts. We avoid using context not necessarily because it's slower, although it can be, but because of the DFA caching problem. The closure routine only considers the rule invocation stack created during prediction beginning in the decision rule. For example, if prediction occurs without invoking another rule's ATN, there are no context stacks in the configurations. When lack of context leads to a conflict, we don't know if it's an ambiguity or a weakness in the strong LL(*) parsing strategy (versus full LL(*)).
When SLL yields a configuration set with a conflict, we rewind the input and retry the ATN simulation, this time using full outer context without adding to the DFA. Configuration context stacks will be the full invocation stacks from the start rule. If we get a conflict using full context, then we can definitively say we have a true ambiguity for that input sequence. If we don't get a conflict, it implies that the decision is sensitive to the outer context. (It is not context-sensitive in the sense of context-sensitive grammars.)
The next time we reach this DFA state with an SLL conflict, through DFA simulation, we will again retry the ATN simulation using full context mode. This is slow because we can't save the results and have to "interpret" the ATN each time we get that input.
CACHING FULL CONTEXT PREDICTIONS
We could cache results from full context to predicted alternative easily and that saves a lot of time but doesn't work in presence of predicates. The set of visible predicates from the ATN start state changes depending on the context, because closure can fall off the end of a rule. I tried to cache tuples (stack context, semantic context, predicted alt) but it was slower than interpreting and much more complicated. Also required a huge amount of memory. The goal is not to create the world's fastest parser anyway. I'd like to keep this algorithm simple. By launching multiple threads, we can improve the speed of parsing across a large number of files.
There is no strict ordering between the amount of input used by SLL vs LL, which makes it really hard to build a cache for full context. Let's say that we have input A B C that leads to an SLL conflict with full context X. That implies that using X we might only use A B but we could also use A B C D to resolve conflict. Input A B C D could predict alternative 1 in one position in the input and A B C E could predict alternative 2 in another position in input. The conflicting SLL configurations could still be non-unique in the full context prediction, which would lead us to requiring more input than the original A B C. To make a prediction cache work, we have to track the exact input used during the previous prediction. That amounts to a cache that maps X to a specific DFA for that context.
Something should be done for left-recursive expression predictions. They are likely LL(1) + pred eval. It's easier to do the whole "SLL unless error, then retry with full LL" thing that Sam does.
AVOIDING FULL CONTEXT PREDICTION
We avoid doing the full context retry when the outer context is empty, when we did not dip into the outer context by falling off the end of the decision state rule, or when we force SLL mode.
As an example of the not-dip-into-outer-context case, consider super constructor calls versus function calls. One grammar might look like this:
ctorBody
: '{' superCall? stat* '}'
;
Or, you might see something like
stat
: superCall ';'
| expression ';'
| ...
;
In both cases I believe that no closure operations will dip into the outer context. In the first case, ctorBody will in the worst case stop at the '}'. In the second case it should stop at the ';'. Both cases should stay within the entry rule and not dip into the outer context.
PREDICATES
Predicates, if present, are always evaluated in both SLL and LL, but SLL and LL simulation deal with predicates differently. SLL collects predicates as it performs closure operations like ANTLR v3 did. It delays predicate evaluation until it reaches an accept state. This allows us to cache the SLL ATN simulation whereas, if we had evaluated predicates on-the-fly during closure, the DFA state configuration sets would be different and we couldn't build up a suitable DFA.
When building a DFA accept state during ATN simulation, we evaluate any predicates and return the sole semantically valid alternative. If there is more than 1 alternative, we report an ambiguity. If there are 0 alternatives, we throw an exception. Alternatives without predicates act like they have true predicates. The simple way to think about it is to strip away all alternatives with false predicates and choose the minimum alternative that remains.
When we start in the DFA and reach an accept state that's predicated, we test those and return the minimum semantically viable alternative. If no alternatives are viable, we throw an exception.
During full LL ATN simulation, closure always evaluates predicates on-the-fly. This is crucial to reducing the configuration set size during closure. Without this on-the-fly evaluation, parsing the Java grammar, for example, hits a landmine.
SHARING DFA
All instances of the same parser share the same decision DFAs through a static field. Each instance gets its own ATN simulator but they share the same {@link #decisionToDFA} field. They also share a {@link PredictionContextCache} object that makes sure that all {@link PredictionContext} objects are shared among the DFA states. This makes a big size difference.
THREAD SAFETY
The {@link ParserATNSimulator} locks on the {@link #decisionToDFA} field when it adds a new DFA object to that array. {@link #addDFAEdge} locks on the DFA for the current decision when setting the {@link DFAState#edges} field. {@link #addDFAState} locks on the DFA for the current decision when looking up a DFA state to see if it already exists. We must make sure that all requests to add DFA states that are equivalent result in the same shared DFA object. This is because lots of threads will be trying to update the DFA at once. The {@link #addDFAState} method also locks inside the DFA lock, but this time on the shared context cache when it rebuilds the configurations' {@link PredictionContext} objects using cached subgraphs/nodes. No other locking occurs, even during DFA simulation. This is safe as long as we can guarantee that all threads referencing {@code s.edge[t]} get the same physical target {@link DFAState}, or {@code null}. Once into the DFA, the DFA simulation does not reference the {@link DFA#states} map. It follows the {@link DFAState#edges} field to new targets. The DFA simulator will either find {@link DFAState#edges} to be {@code null}, to be non-{@code null} and {@code dfa.edges[t]} null, or {@code dfa.edges[t]} to be non-null. The {@link #addDFAEdge} method could be racing to set the field, but in either case the DFA simulator works; if the edge is {@code null}, it requests ATN simulation. It could also race trying to get {@code dfa.edges[t]}, but either way it will work because it's not doing a test and set operation.
Starting with SLL then failing over to combined SLL/LL (Two-Stage Parsing)
Sam pointed out that if SLL does not give a syntax error, then there is no point in doing full LL, which is slower. We only have to try LL if we get a syntax error. For maximum speed, Sam starts the parser set to pure SLL mode with the {@link BailErrorStrategy}:
parser.{@link Parser#getInterpreter() getInterpreter()}.{@link #setPredictionMode setPredictionMode}{@code (}{@link PredictionMode#SLL}{@code )};
parser.{@link Parser#setErrorHandler setErrorHandler}(new {@link BailErrorStrategy}());
If it does not get a syntax error, then we're done. If it does get a syntax error, we need to retry with the combined SLL/LL strategy.
The reason this works is as follows. If there are no SLL conflicts, then the grammar is SLL (at least for that input set). If there is an SLL conflict, the full LL analysis must yield a set of viable alternatives which is a subset of the alternatives reported by SLL. If the LL set is a singleton, then the grammar is LL but not SLL. If the LL set is the same size as the SLL set, the decision is SLL. If the LL set has size > 1, then that decision is truly ambiguous on the current input. If the LL set is smaller, then the SLL conflict resolution might choose an alternative that the full LL would rule out as a possibility based upon better context information. If that's the case, then the SLL parse will definitely get an error because the full LL analysis says it's not viable. If SLL conflict resolution chooses an alternative within the LL set, then both SLL and LL would choose the same alternative because they both choose the minimum of multiple conflicting alternatives.
Let's say we have a set of SLL conflicting alternatives {@code {1, 2, 3}} and a smaller LL set called s. If s is {@code {2, 3}}, then SLL parsing will get an error because SLL will pursue alternative 1. If s is {@code {1, 2}} or {@code {1, 3}} then both SLL and LL will choose the same alternative because alternative one is the minimum of either set. If s is {@code {2}} or {@code {3}} then SLL will get a syntax error. If s is {@code {1}} then SLL will succeed.
Of course, if the input is invalid, then we will get an error for sure in both SLL and LL parsing. Erroneous input will therefore require 2 passes over the input.
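The two-stage strategy above can be sketched as follows. `MyLexer`, `MyParser`, and the start rule `prog` are hypothetical generated names; the exception thrown by {@link BailErrorStrategy} in the Java runtime is ParseCancellationException:

```java
// Two-stage parsing sketch; requires the ANTLR 4 Java runtime and a
// generated lexer/parser (MyLexer, MyParser, prog() are assumed names).
CommonTokenStream tokens = new CommonTokenStream(
        new MyLexer(CharStreams.fromString(source)));
MyParser parser = new MyParser(tokens);
parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
parser.removeErrorListeners();               // stay quiet in stage 1
parser.setErrorHandler(new BailErrorStrategy());
ParserRuleContext tree;
try {
    tree = parser.prog();                    // stage 1: pure SLL
}
catch (ParseCancellationException ex) {      // thrown by BailErrorStrategy
    tokens.seek(0);                          // rewind the token stream
    parser.reset();
    parser.addErrorListener(ConsoleErrorListener.INSTANCE);
    parser.setErrorHandler(new DefaultErrorStrategy());
    parser.getInterpreter().setPredictionMode(PredictionMode.LL);
    tree = parser.prog();                    // stage 2: combined SLL/LL
}
```

Per the analysis above, stage 2 runs only on inputs where SLL reported a syntax error, so valid input pays only the fast SLL pass.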
When {@code lookToEndOfRule} is true, this method uses {@link ATN#nextTokens} for each configuration in {@code configs} which is not already in a rule stop state to see if a rule stop state is reachable from the configuration via epsilon-only transitions.
@param configs the configuration set to update
@param lookToEndOfRule when true, this method checks for rule stop states reachable by epsilon-only transitions from each configuration in {@code configs}.
@return {@code configs} if all configurations in {@code configs} are in a rule stop state, otherwise return a new configuration set containing only the configurations from {@code configs} which are in a rule stop state.

The prediction context must be considered by this filter to address situations like the following.
grammar TA;
prog: statement* EOF;
statement: letterA | statement letterA 'b' ;
letterA: 'a';
In the above grammar, the ATN state immediately before the token reference {@code 'a'} in {@code letterA} is reachable from the left edge of both the primary and closure blocks of the left-recursive rule {@code statement}. The prediction context associated with each of these configurations distinguishes between them, and prevents the alternative which stepped out to {@code prog} (and then back in to {@code statement}) from being eliminated by the filter.
@param configs The configuration set computed by {@link #computeStartState} as the start state for the DFA.
@return The transformed configuration set representing the start state for a precedence DFA at a particular precedence level (determined by calling {@link Parser#getPrecedence}).

The default implementation of this method uses the following algorithm to identify an ATN configuration which successfully parsed the decision entry rule. Choosing such an alternative ensures that the {@link ParserRuleContext} returned by the calling rule will be complete and valid, and the syntax error will be reported later at a more localized location.
In some scenarios, the algorithm described above could predict an alternative which will result in a {@link FailedPredicateException} in the parser. Specifically, this could occur if the only configuration capable of successfully parsing to the end of the decision rule is blocked by a semantic predicate. By choosing this alternative within {@link #adaptivePredict} instead of throwing a {@link NoViableAltException}, the resulting {@link FailedPredicateException} in the parser will identify the specific predicate which is preventing the parser from successfully parsing the decision rule, which helps developers identify and correct logic errors in semantic predicates.
@param configs The ATN configurations which were valid immediately before the {@link #ERROR} state was reached
@param outerContext The \gamma_0 initial parser context from the paper, or the parser stack at the instant before prediction commences.
@return The value to return from {@link #adaptivePredict}, or {@link ATN#INVALID_ALT_NUMBER} if a suitable alternative was not identified and {@link #adaptivePredict} should report an error instead.

This method might not be called for every semantic context evaluated during the prediction process. In particular, we currently do not evaluate the following but it may change in the future:
If {@code to} is {@code null}, this method returns {@code null}. Otherwise, this method returns the {@link DFAState} returned by calling {@link #addDFAState} for the {@code to} state.
@param dfa The DFA
@param from The source state for the edge
@param t The input symbol
@param to The target state for the edge
@return If {@code to} is {@code null}, this method returns {@code null}; otherwise this method returns the result of calling {@link #addDFAState} on {@code to}.

If {@code D} is {@link #ERROR}, this method returns {@link #ERROR} and does not change the DFA.
@param dfa The dfa
@param D The DFA state to add
@return The state stored in the DFA. This will be either the existing state if {@code D} is already in the DFA, or {@code D} itself if the state was not already present.
When using this prediction mode, the parser will either return a correct
parse tree (i.e. the same parse tree that would be returned with the
This prediction mode does not provide any guarantees for prediction behavior for syntactically-incorrect inputs.
When using this prediction mode, the parser will make correct decisions for all syntactically-correct grammar and input combinations. However, in cases where the grammar is truly ambiguous this prediction mode might not report a precise answer for exactly which alternatives are ambiguous.
This prediction mode does not provide any guarantees for prediction behavior for syntactically-incorrect inputs.
This prediction mode may be used for diagnosing ambiguities during grammar development. Due to the performance overhead of calculating sets of ambiguous alternatives, this prediction mode should be avoided when the exact results are not necessary.
This prediction mode does not provide any guarantees for prediction behavior for syntactically-incorrect inputs.
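For grammar development, this mode is typically combined with a DiagnosticErrorListener so that each detected ambiguity is reported. A minimal sketch, assuming a hypothetical generated parser instance `parser` with start rule `prog`:

```java
// Report exact ambiguity sets while developing the grammar.
// Requires the ANTLR 4 Java runtime; "parser" and prog() are assumed.
parser.getInterpreter().setPredictionMode(
        PredictionMode.LL_EXACT_AMBIG_DETECTION);
parser.removeErrorListeners();
parser.addErrorListener(new DiagnosticErrorListener());
parser.prog();   // ambiguity reports go to the listener
```

Because of the overhead noted above, this configuration belongs in development builds, not production parsing.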
This method computes the SLL prediction termination condition for both of the following cases.
COMBINED SLL+LL PARSING
When LL-fallback is enabled upon SLL conflict, correct predictions are ensured regardless of how the termination condition is computed by this method. Due to the substantially higher cost of LL prediction, the prediction should only fall back to LL when the additional lookahead cannot lead to a unique SLL prediction.
Assuming combined SLL+LL parsing, an SLL configuration set with only
conflicting subsets should fall back to full LL, even if the
configuration sets don't resolve to the same alternative (e.g.
Here's the prediction termination rule, then: SLL (for SLL+LL parsing) stops when it sees only conflicting configuration subsets. In contrast, full LL keeps going when there is uncertainty.
HEURISTIC
As a heuristic, we stop prediction when we see any conflicting subset unless we see a state that only has one alternative associated with it. The single-alt-state rule lets prediction continue with rules like the following (otherwise, it would admit defeat too soon):
When the ATN simulation reaches the state before
It also lets us continue for this rule:
After matching input A, we reach the stop state for rule A, state 1. State 8 is the state right before B. Clearly alternatives 1 and 2 conflict and no amount of further lookahead will separate the two. However, alternative 3 will be able to continue and so we do not stop working on this state. In the previous example, we're concerned with states associated with the conflicting alternatives. Here alt 3 is not associated with the conflicting configs, but since we can continue looking for input reasonably, don't declare the state done.
PURE SLL PARSING
To handle pure SLL parsing, all we have to do is make sure that we combine stack contexts for configurations that differ only by semantic predicate. From there, we can do the usual SLL termination heuristic.
PREDICATES IN SLL+LL PARSING
SLL decisions don't evaluate predicates until after they reach DFA stop states because they need to create the DFA cache that works in all semantic situations. In contrast, full LL evaluates predicates collected during start state computation so it can ignore predicates thereafter. This means that SLL termination detection can totally ignore semantic predicates.
Implementation-wise,
Before testing these configurations against others, we have to merge
If the configuration set has predicates (as indicated by
Can we stop looking ahead during ATN simulation or is there some uncertainty as to which alternative we will ultimately pick, after consuming more input? Even if there are partial conflicts, we might know that everything is going to resolve to the same minimum alternative. That means we can stop since no more lookahead will change that fact. On the other hand, there might be multiple conflicts that resolve to different minimums. That means we need more lookahead to decide which of those alternatives we should predict.
The basic idea is to split the set of configurations
map[c] U= c.getAlt()  # map hash/equals uses s and x, not alt and not pred
The values in
If
Reduce the subsets to singletons by choosing a minimum of each subset. If the union of these alternative subsets is a singleton, then no amount of more lookahead will help us. We will always pick that alternative. If, however, there is more than one alternative, then we are uncertain which alternative to predict and must continue looking for resolution. We may or may not discover an ambiguity in the future, even if there are no conflicting subsets this round.
The biggest sin is to terminate early because it means we've made a decision but were uncertain as to the eventual outcome. We haven't used enough lookahead. On the other hand, announcing a conflict too late is no big deal; you will still have the conflict. It's just inefficient. It might even look ahead until the end of the file.
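The reduce-to-minimum step can be sketched in plain Java over BitSets of alternative numbers. Names are illustrative; this mirrors the rule described above rather than quoting the runtime:

```java
import java.util.BitSet;
import java.util.Collection;

// Sketch of "reduce the subsets to singletons by choosing a minimum":
// each BitSet is one conflicting subset of alternative numbers.
class AltResolution {
    /**
     * Returns the single alternative every subset resolves to, or 0 if
     * the subsets resolve to different minimums (need more lookahead).
     */
    static int resolvesToJustOneViableAlt(Collection<BitSet> altSubsets) {
        int result = 0;
        for (BitSet alts : altSubsets) {
            int minAlt = alts.nextSetBit(0);   // minimum alt in this subset
            if (result == 0) {
                result = minAlt;               // first subset seen
            } else if (result != minAlt) {
                return 0;                      // conflicting minimums
            }
        }
        return result;
    }
}
```

For example, subsets {1, 2} and {1, 3} both resolve to alternative 1, so prediction may stop; subsets {1, 2} and {2, 3} resolve to different minimums, so more lookahead is required.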
No special consideration for semantic predicates is required because predicates are evaluated on-the-fly for full LL prediction, ensuring that no configuration contains a semantic context during the termination check.
CONFLICTING CONFIGS
Two configurations
For simplicity, I'm doing an equality check between
CONTINUE/STOP RULE
Continue if union of resolved alternative sets from non-conflicting and conflicting alternative subsets has more than one alternative. We are uncertain about which alternative to predict.
The complete set of alternatives,
CASES
EXACT AMBIGUITY DETECTION
If all states report the same conflicting set of alternatives, then we know we have the exact ambiguity set.
|A_i| > 1 and A_i = A_j for all i, j.
In other words, we continue examining lookahead until all
map[c] U= c.getAlt()  # map hash/equals uses s and x, not alt and not pred
map[c.
] U= c.
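The exact-ambiguity condition, |A_i| > 1 and A_i = A_j for all i, j, can be sketched in plain Java over BitSets of alternatives. Names are illustrative, not quoted from the runtime:

```java
import java.util.BitSet;
import java.util.Collection;

// Sketch of exact ambiguity detection: every conflicting alt subset
// A_i must be the same set, and that set must contain > 1 alternative.
class ExactAmbiguity {
    static boolean allSubsetsEqual(Collection<BitSet> altSubsets) {
        BitSet first = null;
        for (BitSet alts : altSubsets) {
            if (first == null) {
                first = alts;            // remember the first subset
            } else if (!alts.equals(first)) {
                return false;            // some A_i differs from A_0
            }
        }
        return true;
    }

    static boolean isExactAmbiguity(Collection<BitSet> altSubsets) {
        if (altSubsets.isEmpty()) {
            return false;
        }
        // If all subsets are equal, any one of them witnesses |A_i| > 1.
        BitSet any = altSubsets.iterator().next();
        return any.cardinality() > 1 && allSubsetsEqual(altSubsets);
    }
}
```

With subsets {1, 2} and {1, 2} the ambiguity set is exact; with {1, 2} and {1, 3}, or with singleton subsets, it is not.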
This is a computed property that is calculated during ATN deserialization
and stored for use in
This is a one way link. It emanates from a state (usually via a list of transitions) and has a target state.
Since we never have to change the ATN transitions once we construct it, we can fix these transitions as specific classes. The DFA transitions, on the other hand, need to update the labels as transitions are added to the states. We'll use the term Edge for the DFA to distinguish them from ATN transitions.
The default implementation returns
This error strategy is useful in the following scenarios.
This token stream ignores the value of
This field is set to -1 when the stream is first constructed or when
For example,
If
These properties share a field to reduce the memory footprint of
If
This token factory does not explicitly copy token text when constructing tokens.
The default value is
When
The
This token stream provides access to all tokens by index or when calling
methods like
By default, tokens are placed on the default channel
(
Note: lexer rules which use the
The default value is
This implementation prints messages to
line line:charPositionInLine msg
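A listener producing that same format can be written by subclassing BaseErrorListener; a sketch, assuming the ANTLR 4 Java runtime:

```java
// Prints errors as "line <line>:<charPositionInLine> <msg>", matching
// the format shown above. Class name is illustrative.
class StderrErrorListener extends BaseErrorListener {
    @Override
    public void syntaxError(Recognizer<?, ?> recognizer,
                            Object offendingSymbol,
                            int line, int charPositionInLine,
                            String msg, RecognitionException e) {
        System.err.println("line " + line + ":" + charPositionInLine
                + " " + msg);
    }
}
```

Attach it with `parser.removeErrorListeners()` followed by `parser.addErrorListener(new StderrErrorListener())` to replace the default console output.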
The default implementation simply calls
The default implementation simply calls
The default implementation returns immediately if the handler is already
in error recovery mode. Otherwise, it calls
The default implementation resynchronizes the parser by consuming tokens until we find one in the resynchronization set--loosely the set of tokens that can follow the current rule.
Implements Jim Idle's magic sync mechanism in closures and optional subrules. E.g.,
a : sync ( stuff sync )* ;
sync : {consume to what can follow sync} ;
At the start of a sub rule upon error,
If the sub rule is optional (
During loop iteration, it consumes until it sees a token that can start a sub rule or what follows loop. Yes, that is pretty aggressive. We opt to stay in the loop as long as possible.
ORIGINS
Previous versions of ANTLR did a poor job of their recovery within loops. A single mismatched or missing token would force the parser to bail out of the entire rule surrounding the loop. So, for rule
classDef : 'class' ID '{' member* '}'
input with an extra token between members would force the parser to
consume until it found the next class definition rather than the next
member definition of the current class.
This functionality cost a little bit of effort because the parser has to compare token sets at the start of the loop and at each iteration. If for some reason speed is suffering for you, you can turn off this functionality by simply overriding this method as a blank { }.
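Overriding the method as a blank body is a small subclass; a sketch, assuming the ANTLR 4 Java runtime and a hypothetical parser instance:

```java
// Disables the loop-sync recovery described above by overriding sync()
// with an empty body. Class name is illustrative.
class NoSyncErrorStrategy extends DefaultErrorStrategy {
    @Override
    public void sync(Parser recognizer) {
        // intentionally blank: skip the token-set comparison
        // performed at loop and subrule boundaries
    }
}

// usage: parser.setErrorHandler(new NoSyncErrorStrategy());
```

The trade-off is the one described in ORIGINS: without sync, an extra token inside a loop can force recovery to consume far past the next valid loop element.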
This method is called when
The default implementation simply returns if the handler is already in
error recovery mode. Otherwise, it calls
This method is called when
The default implementation simply returns if the handler is already in
error recovery mode. Otherwise, it calls
The default implementation attempts to recover from the mismatched input
by using single token insertion and deletion as described below. If the
recovery attempt fails, this method throws an
EXTRA TOKEN (single token deletion)
This recovery strategy is implemented by
MISSING TOKEN (single token insertion)
If current token (at
This recovery strategy is implemented by
EXAMPLE
For example, Input
stat → expr → atom
and it will be trying to match the
=> ID '=' '(' INT ')' ('+' atom)* ';'
^
The attempt to match
This method determines whether or not single-token insertion is viable by
checking if the
If the single-token deletion is successful, this method calls
I use a set of ATNConfig objects, not simple states. An ATNConfig is both a state (a la normal conversion) and a RuleContext describing the chain of rules (if any) followed to arrive at that state.
A DFA state may have multiple references to a particular state, but with different ATN contexts (with same or different alts), meaning that state was reached via a different set of rule invocations.
We only use these for non-{@link #requiresFullContext} but conflicting states. That means we know from the context (it's $ or we don't dip into outer context) that it's an ambiguity not a conflict.
This list is computed by {@link ParserATNSimulator#predicateDFAState}.
Because the number of alternatives and number of ATN configurations are finite, there is a finite number of DFA states that can be processed. This is necessary to show that the algorithm terminates.
Cannot test the DFA state numbers here because in {@link ParserATNSimulator#addDFAState} we need to know if any other state exists that has this exact set of ATN configurations. The {@link #stateNumber} is irrelevant.
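The equality contract described above can be sketched as follows. This is a toy model, not the runtime's DFAState: configurations are modeled as plain strings, and the only point being illustrated is that equality and hashing use the configuration set while the state number is deliberately ignored.

```java
import java.util.Objects;
import java.util.Set;

// Illustrative: two DFA states are equal iff they hold the same set of
// configurations; stateNumber is identity bookkeeping only.
class DFAStateSketch {
    final int stateNumber;       // irrelevant to equality
    final Set<String> configs;   // stand-in for the ATN configuration set

    DFAStateSketch(int stateNumber, Set<String> configs) {
        this.stateNumber = stateNumber;
        this.configs = configs;
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DFAStateSketch)) return false;
        return configs.equals(((DFAStateSketch) o).configs); // stateNumber ignored
    }

    @Override public int hashCode() {
        return Objects.hashCode(configs); // must be consistent with equals
    }
}
```

This is what lets addDFAState find an existing state with the same configuration set before a number has been assigned to the candidate.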
The
TODO: what to do about lexers
Note that the calling code will not report an error if this method
returns successfully. The error strategy implementation is responsible
for calling
The generated code currently contains calls to
For an implementation based on Jim Idle's "magic sync" mechanism, see
Initializing Methods: Some methods in this interface have unspecified behavior if no call to an initializing method has occurred after the stream was constructed. The following is a list of initializing methods:
This method is guaranteed to succeed if any of the following are true:
If
The return value is unspecified if
The returned mark is an opaque handle (type
The behavior of this method is unspecified if no call to an
This method does not change the current position in the input stream.
The following example shows the use of
IntStream stream = ...;
int index = -1;
int mark = stream.mark();
try {
index = stream.index();
// perform work here...
} finally {
if (index != -1) {
stream.seek(index);
}
stream.release(mark);
}
For more information and an example, see
The behavior of this method is unspecified if no call to an
Each full-context prediction which does not result in a syntax error
will call either
When
When
When the
If one or more configurations in
Each full-context prediction which does not result in a syntax error
will call either
For prediction implementations that only evaluate full-context
predictions when an SLL conflict is found (including the default
Note that the definition of "context sensitivity" in this method
differs from the concept in
The non-negative numbers less than
Errors from the lexer are never passed to the parser. Either you want to keep
going or you do not upon token recognition error. If you do not want to
continue lexing then you do not want to continue parsing. Just throw an
exception not under
The preconditions for this method are the same as the preconditions of
The symbol referred to by
TokenStream stream = ...;
String text = "";
for (int i = interval.a; i <= interval.b; i++) {
text += stream.get(i).getText();
}
TokenStream stream = ...;
String text = stream.getText(new Interval(0, stream.size()));
If
TokenStream stream = ...;
String text = stream.getText(ctx.getSourceInterval());
If the specified
For streams which ensure that the
TokenStream stream = ...;
String text = "";
for (int i = start.getTokenIndex(); i <= stop.getTokenIndex(); i++) {
text += stream.get(i).getText();
}
The following table shows examples of lexer rules and the literal names assigned to the corresponding token types.
| Rule | Literal Name | Java String Literal |
|---|---|---|
| `THIS : 'this';` | `'this'` | `"'this'"` |
| `SQUOTE : '\'';` | `'\''` | `"'\\''"` |
| `ID : [A-Z]+;` | n/a | `null` |
This method supports token types defined by any of the following methods:
The following table shows examples of lexer rules and the symbolic names assigned to the corresponding token types.
| Rule | Symbolic Name |
|---|---|
| `THIS : 'this';` | `THIS` |
| `SQUOTE : '\'';` | `SQUOTE` |
| `ID : [A-Z]+;` | `ID` |
ANTLR provides a default implementation of this method, but
applications are free to override the behavior in any manner which makes
sense for the application. The default implementation returns the first
result from the following list which produces a non-
If the final token in the list is an
This method is similar to
This class is able to represent sets containing any combination of values in
the range
If the symbol type does not match,
If the symbol type does not match,
Note that if we are not building parse trees, rule contexts only point upwards. When a rule exits, it returns the context but that gets garbage collected if nobody holds a reference. It points upwards but nobody points at it.
When we build parse trees, we are adding all of these contexts to
To support output-preserving grammar transformations (including but not
limited to left-recursion removal, automated left-factoring, and
optimized code generation), calls to listener methods during the parse
may differ substantially from calls made by
With the following specific exceptions, calls to listener events are deterministic, i.e. for identical input the calls to listener methods will be the same.
If
ParseTree t = parser.expr();
ParseTreePattern p = parser.compileParseTreePattern("<ID>+0", MyParser.RULE_expr);
ParseTreeMatch m = p.match(t);
String id = m.get("ID");
E.g., given the following input with
A B
^
If the parser is not in error recovery mode, the consumed symbol is added
to the parse tree using
return getExpectedTokens().contains(symbol);
If the state number is not known, this method returns -1.
If the set of expected tokens is not known and could not be computed,
this method returns
If the context is not available, this method returns
If the input stream is not available, this method returns
If the recognizer is not available, this method returns
Used for XPath and tree pattern compilation.
Used for XPath and tree pattern compilation.
For interpreters, we don't know their serialized ATN despite having created the interpreter from it.
You can insert stuff, replace, and delete chunks. Note that the operations
are done lazily--only if you convert the buffer to a
This rewriter makes no modifications to the token stream. It does not ask the
stream to fill itself up nor does it advance the input cursor. The token
stream
The rewriter only works on tokens that you have in the buffer and ignores the
current input cursor. If you are buffering tokens on-demand, calling
Since the operations are done lazily at
Because operations never actually alter the buffer, you may always get the original token stream back without undoing anything. Since the instructions are queued up, you can easily simulate transactions and roll back any changes if there is an error just by removing instructions. For example,
CharStream input = new ANTLRFileStream("input");
TLexer lex = new TLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lex);
T parser = new T(tokens);
TokenStreamRewriter rewriter = new TokenStreamRewriter(tokens);
parser.startRule();
Then in the rules, you can execute (assuming rewriter is visible):
Token t,u;
...
rewriter.insertAfter(t, "text to put after t");
rewriter.insertAfter(u, "text after u");
System.out.println(tokens.toString());
You can also have multiple "instruction streams" and get multiple rewrites from a single pass over the input. Just name the instruction streams and use that name again when printing the buffer. This could be useful for generating a C file and also its header file--all from the same buffer:
tokens.insertAfter("pass1", t, "text to put after t");
tokens.insertAfter("pass2", u, "text after u");
System.out.println(tokens.toString("pass1"));
System.out.println(tokens.toString("pass2"));
If you don't use named rewrite streams, a "default" stream is used as the first example shows.
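The lazy, named instruction streams described above can be sketched in a few lines. This is a deliberately tiny model, not the TokenStreamRewriter API: tokens are plain strings, only insert-after is modeled, and the class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative: instructions are queued per program name and applied only at
// render time; the token buffer itself is never modified, so the original
// text is always recoverable and a program can be discarded to "roll back".
class RewriterSketch {
    private final List<String> tokens;
    private final Map<String, Map<Integer, String>> programs = new HashMap<>();

    RewriterSketch(List<String> tokens) { this.tokens = tokens; }

    // Queue an instruction in the named stream; nothing is rewritten yet.
    void insertAfter(String programName, int index, String text) {
        programs.computeIfAbsent(programName, k -> new HashMap<>()).put(index, text);
    }

    // Apply the named program lazily while emitting the buffer.
    String render(String programName) {
        Map<Integer, String> ops = programs.getOrDefault(programName, Map.of());
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            buf.append(tokens.get(i));
            if (ops.containsKey(i)) buf.append(ops.get(i));
        }
        return buf.toString();
    }
}
```

Rendering the same buffer under two program names yields two independent rewrites of one pass over the input, which is the C-file/header-file trick described above.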
The default implementation calls
The default implementation initializes the aggregate result to
The default implementation is not safe for use in visitors that modify the tree structure. Visitors that modify the tree should override this method to behave properly with respect to the specific algorithm in use.
The default implementation returns the result of
The default implementation returns the result of
The base implementation returns
The default implementation returns
The default implementation always returns
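The visit-children/aggregate contract described above can be sketched over a minimal tree of ints. This is an illustration of the folding pattern only, not AbstractParseTreeVisitor itself; summing is used here (instead of the base class's last-child-wins default) purely to make the aggregation visible.

```java
import java.util.List;

// Illustrative: start from defaultResult() and fold each child's result into
// the aggregate, mirroring the default visitChildren/aggregateResult shape.
class VisitorSketch {
    static class Node {
        final int value;
        final List<Node> children;
        Node(int value, List<Node> children) { this.value = value; this.children = children; }
    }

    static int defaultResult() { return 0; }

    // The base visitor's default returns the child result unchanged (so the
    // last child wins); summing here is a deliberate, visible override.
    static int aggregateResult(int aggregate, int childResult) {
        return aggregate + childResult;
    }

    static int visit(Node n) {
        int result = defaultResult();
        for (Node child : n.children) {
            result = aggregateResult(result, visit(child));
        }
        return result + n.value; // include this node's own value
    }
}
```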
The payload is either a
If source interval is unknown, this returns
ParseTreeProperty<Integer> values = new ParseTreeProperty<Integer>();
values.put(tree, 36);
int x = values.get(tree);
values.removeFrom(tree);
You would make one declaration (values here) in the listener and use it many times in your event methods.
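The annotation pattern shown above amounts to a map keyed by node identity, so two structurally equal subtrees still get separate entries. The sketch below captures that idea; the node type is reduced to `Object` and the class name is illustrative, not the runtime's generic ParseTreeProperty.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Illustrative: annotate arbitrary nodes with values, keyed by identity.
class NodeProperty<V> {
    private final Map<Object, V> annotations = new IdentityHashMap<>();

    V get(Object node) { return annotations.get(node); }
    void put(Object node, V value) { annotations.put(node, value); }
    V removeFrom(Object node) { return annotations.remove(node); }
}
```

An IdentityHashMap is the natural backing store because node equality is reference equality here; removeFrom lets a listener free entries once a subtree has been processed.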
The method
For example, for pattern
Pattern tags like
If the
The map includes special entries corresponding to the names of rules and
tokens referenced in tags in the original pattern. For additional
information, see the description of
Patterns are strings of source input text with special tags representing token or rule references such as:
Given a pattern start rule such as
Pattern
The
For efficiency, you can compile a tree pattern in string form to a
See
The lexer and parser that you pass into the
Normally a parser does not accept token
Delimiters are
Rule tag tokens are always placed on the
This method returns the rule tag formatted with
Rule tag tokens have types assigned according to the rule bypass transitions created during ATN deserialization.
The implementation for
The implementation for
The implementation for
The implementation for
The implementation for
The implementation for
The implementation for
The implementation for
The implementation for
The implementation for
The implementation for
Split path into words and separators
The basic interface is
p = new XPath(parser, pathString);
return p.evaluate(tree);
See
and path elements:
Whitespace is not allowed.
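The words-and-separators split mentioned above can be approximated with an ordered regex alternation: a path like `//expr/primary` becomes `["//", "expr", "/", "primary"]`. This mimics the idea only; the runtime tokenizes paths with its own lexer, and operators like `!` are ignored in this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative path tokenizer: "//" must come before "/" in the alternation
// so the longer separator wins; everything else between separators is a word.
class PathSplitSketch {
    private static final Pattern PART = Pattern.compile("//|/|[^/]+");

    static List<String> split(String path) {
        List<String> parts = new ArrayList<>();
        Matcher m = PART.matcher(path);
        while (m.find()) parts.add(m.group()); // words and separators, in order
        return parts;
    }
}
```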
This is not the buffer capacity, that's
The
The specific marker value used for this class allows for some level of
protection against misuse where
This is not the buffer capacity, that's
The
This value is used to set the token indexes if the stream provides tokens
that implement
The specific marker value used for this class allows for some level of
protection against misuse where
No literal or symbol names are assigned to token types, so