If you need encoding, pass in a stream/reader with the correct encoding.
This event may be reported during SLL prediction in cases where the
conflicting SLL configuration set provides sufficient information to
determine that the SLL conflict is truly an ambiguity. For example, if none
of the ATN configurations in the conflicting SLL configuration set have
traversed a global follow transition (i.e.
In some cases, the minimum represented alternative in the conflicting LL
configuration set is not equal to the minimum represented alternative in the
conflicting SLL configuration set. Grammars and inputs which result in this
scenario are unable to use
If
closure() tracks the depth of how far we dip into the outer context: depth > 0. Note that it may not be a totally accurate depth since I don't ever decrement it. TODO: make it a boolean instead.
For memory efficiency, the {@link #isPrecedenceFilterSuppressed} method is also backed by this field. Since the field is publicly accessible, the highest bit which would not cause the value to become negative is used to store this field. This choice minimizes the risk that code which only compares this value to 0 would be affected by the new purpose of the flag. It also ensures that the performance of the existing {@link ATNConfig} constructors as well as certain operations like {@link ATNConfigSet#add(ATNConfig, DoubleKeyMap)} are completely unaffected by the change.
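The packing scheme described above can be sketched in plain Java. The class and field names here are illustrative assumptions, as is the exact mask value; the point is storing a boolean in the highest bit that keeps the field nonnegative:

```java
// Sketch of packing a boolean flag into a nonnegative int field.
// Names and the mask constant are illustrative, not quoted from ANTLR.
class PackedDepth {
    // Bit 30: the highest bit that leaves an int nonnegative.
    private static final int SUPPRESS_PRECEDENCE_FILTER = 0x40000000;

    // Low bits hold the outer-context depth; bit 30 holds the flag.
    public int reachesIntoOuterContext;

    public boolean isPrecedenceFilterSuppressed() {
        return (reachesIntoOuterContext & SUPPRESS_PRECEDENCE_FILTER) != 0;
    }

    public void setPrecedenceFilterSuppressed(boolean value) {
        if (value) {
            reachesIntoOuterContext |= SUPPRESS_PRECEDENCE_FILTER;
        } else {
            reachesIntoOuterContext &= ~SUPPRESS_PRECEDENCE_FILTER;
        }
    }

    // Code comparing the depth to 0 is unaffected for depths < 2^30.
    public int getOuterContextDepth() {
        return reachesIntoOuterContext & ~SUPPRESS_PRECEDENCE_FILTER;
    }
}
```

Because the flag bit never makes the field negative, existing comparisons against 0 keep working unchanged.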
This method updates {@link #dipsIntoOuterContext} and {@link #hasSemanticContext} when necessary.
This cache makes a huge difference in memory and a little bit in speed. For the Java grammar on java.*, it dropped the memory requirements at the end from 25M to 16M. We don't store any of the full context graphs in the DFA because they are limited to local context only, but apparently there's a lot of repetition there as well. We optimize the config contexts before storing the config set in the DFA states by literally rebuilding them with cached subgraphs only.
I tried a cache for use during closure operations, that was whacked after each adaptivePredict(). It cost a little bit more time I think and doesn't save on the overall footprint so it's not worth the complexity.
For the
In some cases, the unique alternative identified by LL prediction is not
equal to the minimum represented alternative in the conflicting SLL
configuration set. Grammars and inputs which result in this scenario are
unable to use
Parsing performance in ANTLR 4 is heavily influenced by both static factors (e.g. the form of the rules in the grammar) and dynamic factors (e.g. the choice of input and the state of the DFA cache at the time profiling operations are started). For best results, gather and use aggregate statistics from a large sample of inputs representing the inputs expected in production before using the results to make changes in the grammar.
The value of this field contains the sum of differential results obtained by {@link System#nanoTime()}, and is not adjusted to compensate for JIT and/or garbage collection overhead. For best accuracy, use a modern JVM implementation that provides precise results from {@link System#nanoTime()}, and perform profiling in a separate process which is warmed up by parsing the input prior to profiling. If desired, call {@link ATNSimulator#clearDFA} to reset the DFA cache to its initial state before starting the profiling measurement pass.
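Putting those recommendations together, a measurement pass might look like the sketch below. `MyLexer`, `MyParser`, and the start rule `prog` are hypothetical generated names; `setProfile`, `getParseInfo`, and `clearDFA` are the profiling entry points in the ANTLR 4 Java runtime:

```java
// Profiling sketch; requires the ANTLR 4 Java runtime and a generated
// lexer/parser (MyLexer, MyParser, and prog() are assumed names).
MyParser parser = new MyParser(new CommonTokenStream(
        new MyLexer(CharStreams.fromString(source))));
parser.setProfile(true);            // installs ProfilingATNSimulator
parser.prog();                      // warm-up pass (JIT, DFA cache)
parser.getInterpreter().clearDFA(); // optionally start measurement cold
parser.reset();
parser.prog();                      // measurement pass
for (DecisionInfo di : parser.getParseInfo().getDecisionInfo()) {
    if (di.timeInPrediction > 0) {
        System.out.println("decision " + di.decision + ": "
                + di.timeInPrediction + " ns in prediction");
    }
}
```

As the text notes, aggregate such numbers over a representative corpus before acting on them; a single input can be dominated by DFA cache warm-up effects.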
If DFA caching of SLL transitions is employed by the implementation, ATN computation may cache the computed edge for efficient lookup during future parsing of this decision. Otherwise, the SLL parsing algorithm will use ATN transitions exclusively.
@see #SLL_ATNTransitions
@see ParserATNSimulator#computeTargetState
@see LexerATNSimulator#computeTargetState

If the ATN simulator implementation does not use DFA caching for SLL transitions, this value will be 0.
@see ParserATNSimulator#getExistingTargetState
@see LexerATNSimulator#getExistingTargetState

Note that this value is not related to whether or not {@link PredictionMode#SLL} may be used successfully with a particular grammar. If the ambiguity resolution algorithm applied to the SLL conflicts for this decision produces the same result as LL prediction for this decision, {@link PredictionMode#SLL} would produce the same overall parsing result as {@link PredictionMode#LL}.
If DFA caching of LL transitions is employed by the implementation, ATN computation may cache the computed edge for efficient lookup during future parsing of this decision. Otherwise, the LL parsing algorithm will use ATN transitions exclusively.
@see #LL_DFATransitions
@see ParserATNSimulator#computeTargetState
@see LexerATNSimulator#computeTargetState

If the ATN simulator implementation does not use DFA caching for LL transitions, this value will be 0.
@see ParserATNSimulator#getExistingTargetState
@see LexerATNSimulator#getExistingTargetState

Many lexer commands, including
For position-dependent actions, the input stream must already be positioned correctly prior to calling this method.
The executor tracks position information for position-dependent lexer actions
efficiently, ensuring that actions appearing only at the end of the rule do
not cause bloating of the
Normally, when the executor encounters lexer actions where
Prior to traversing a match transition in the ATN, the current offset from the token start index is assigned to all position-dependent lexer actions which have not already been assigned a fixed offset. By storing the offsets relative to the token start index, the DFA representation of lexer actions which appear in the middle of tokens remains efficient due to sharing among tokens of the same length, regardless of their absolute position in the input stream.
If the current executor already has offsets assigned to all
position-dependent lexer actions, the method returns
This method calls
If {@code speculative} is {@code true}, this method was called before {@link #consume} for the matched character. This method should call {@link #consume} before evaluating the predicate to ensure position sensitive values, including {@link Lexer#getText}, {@link Lexer#getLine}, and {@link Lexer#getCharPositionInLine}, properly reflect the current lexer state. This method should restore {@code input} and the simulator to the original state before returning (i.e. undo the actions made by the call to {@link #consume}).
@param input The input stream.
@param ruleIndex The rule containing the predicate.
@param predIndex The index of the predicate within the rule.
@param speculative {@code true} if the current index in {@code input} is one character before the predicate's location.
@return {@code true} if the specified predicate evaluates to {@code true}.

We track these variables separately for the DFA and ATN simulation because the DFA simulation often has to fail over to the ATN simulation. If the ATN simulation fails, we need the DFA to fall back to its previously accepted state, if any. If the ATN succeeds, then the ATN does the accept and the DFA simulator that invoked it can simply return the predicted token type.
This action is implemented by calling
This class may represent embedded actions created with the {...}
syntax in ANTLR 4, as well as actions created for lexer commands where the
command argument could not be evaluated when the grammar was compiled.
Custom actions are position-dependent since they may represent a
user-defined embedded action which makes calls to methods like
Custom actions are implemented by calling
This action is not serialized as part of the ATN, and is only required for
position-dependent lexer actions which appear at a location other than the
end of a rule. For more information about DFA optimizations employed for
lexer actions, see
Note: This class is only required for lexer actions for which
This method calls
This action is implemented by calling
The
This action is implemented by calling
The
This action is implemented by calling
This action is implemented by calling
The
This action is implemented by calling
This action is implemented by calling
If {@code ctx} is {@code null} and the end of the rule containing {@code s} is reached, {@link Token#EPSILON} is added to the result set. If {@code ctx} is not {@code null} and the end of the outermost rule is reached, {@link Token#EOF} is added to the result set.
@param s the ATN state
@param ctx the complete parser context, or {@code null} if the context should be ignored
@return The set of tokens that can follow {@code s} in the ATN in the specified {@code ctx}.
@param s the ATN state
@param stopState the ATN state to stop at. This can be a {@link BlockEndState} to detect epsilon paths through a closure.
@param ctx the complete parser context, or {@code null} if the context should be ignored
@return The set of tokens that can follow {@code s} in the ATN in the specified {@code ctx}.

This value is the sum of {@link #getTotalSLLATNLookaheadOps} and {@link #getTotalLLATNLookaheadOps}.
The basic complexity of the adaptive strategy makes it harder to understand. We begin with ATN simulation to build paths in a DFA. Subsequent prediction requests go through the DFA first. If they reach a state without an edge for the current symbol, the algorithm fails over to the ATN simulation to complete the DFA path for the current input (until it finds a conflict state or uniquely predicting state).
All of that is done without using the outer context because we want to create a DFA that is not dependent upon the rule invocation stack when we do a prediction. One DFA works in all contexts. We avoid using context not necessarily because it's slower, although it can be, but because of the DFA caching problem. The closure routine only considers the rule invocation stack created during prediction beginning in the decision rule. For example, if prediction occurs without invoking another rule's ATN, there are no context stacks in the configurations. When lack of context leads to a conflict, we don't know if it's an ambiguity or a weakness in the strong LL(*) parsing strategy (versus full LL(*)).
When SLL yields a configuration set with a conflict, we rewind the input and retry the ATN simulation, this time using full outer context without adding to the DFA. Configuration context stacks will be the full invocation stacks from the start rule. If we get a conflict using full context, then we can definitively say we have a true ambiguity for that input sequence. If we don't get a conflict, it implies that the decision is sensitive to the outer context. (It is not context-sensitive in the sense of context-sensitive grammars.)
The next time we reach this DFA state with an SLL conflict, through DFA simulation, we will again retry the ATN simulation using full context mode. This is slow because we can't save the results and have to "interpret" the ATN each time we get that input.
CACHING FULL CONTEXT PREDICTIONS
We could cache results from full context to predicted alternative easily and that saves a lot of time but doesn't work in presence of predicates. The set of visible predicates from the ATN start state changes depending on the context, because closure can fall off the end of a rule. I tried to cache tuples (stack context, semantic context, predicted alt) but it was slower than interpreting and much more complicated. Also required a huge amount of memory. The goal is not to create the world's fastest parser anyway. I'd like to keep this algorithm simple. By launching multiple threads, we can improve the speed of parsing across a large number of files.
There is no strict ordering between the amount of input used by SLL vs LL, which makes it really hard to build a cache for full context. Let's say that we have input A B C that leads to an SLL conflict with full context X. That implies that using X we might only use A B but we could also use A B C D to resolve conflict. Input A B C D could predict alternative 1 in one position in the input and A B C E could predict alternative 2 in another position in input. The conflicting SLL configurations could still be non-unique in the full context prediction, which would lead us to requiring more input than the original A B C. To make a prediction cache work, we have to track the exact input used during the previous prediction. That amounts to a cache that maps X to a specific DFA for that context.
Something should be done for left-recursive expression predictions. They are likely LL(1) + pred eval. It's easier to do the whole "SLL unless error, then retry with full LL" thing that Sam does.
AVOIDING FULL CONTEXT PREDICTION
We avoid doing the full context retry when the outer context is empty, when we did not dip into the outer context by falling off the end of the decision state rule, or when we force SLL mode.
As an example of the not-dip-into-outer-context case, consider super constructor calls versus function calls. One grammar might look like this:
ctorBody
: '{' superCall? stat* '}'
;
Or, you might see something like
stat
: superCall ';'
| expression ';'
| ...
;
In both cases I believe that no closure operations will dip into the outer context. In the first case, ctorBody will in the worst case stop at the '}'. In the second case it should stop at the ';'. Both cases should stay within the entry rule and not dip into the outer context.
PREDICATES
Predicates, if present, are always evaluated in both SLL and LL, but SLL and LL simulation deal with predicates differently. SLL collects predicates as it performs closure operations like ANTLR v3 did. It delays predicate evaluation until it reaches an accept state. This allows us to cache the SLL ATN simulation whereas, if we had evaluated predicates on-the-fly during closure, the DFA state configuration sets would be different and we couldn't build up a suitable DFA.
When building a DFA accept state during ATN simulation, we evaluate any predicates and return the sole semantically valid alternative. If there is more than 1 alternative, we report an ambiguity. If there are 0 alternatives, we throw an exception. Alternatives without predicates act like they have true predicates. The simple way to think about it is to strip away all alternatives with false predicates and choose the minimum alternative that remains.
When we start in the DFA and reach an accept state that's predicated, we test those and return the minimum semantically viable alternative. If no alternatives are viable, we throw an exception.
During full LL ATN simulation, closure always evaluates predicates on-the-fly. This is crucial to reducing the configuration set size during closure. Without this on-the-fly evaluation, parsing the Java grammar, for example, hits a landmine.
SHARING DFA
All instances of the same parser share the same decision DFAs through a static field. Each instance gets its own ATN simulator but they share the same {@link #decisionToDFA} field. They also share a {@link PredictionContextCache} object that makes sure that all {@link PredictionContext} objects are shared among the DFA states. This makes a big size difference.
THREAD SAFETY
The {@link ParserATNSimulator} locks on the {@link #decisionToDFA} field when it adds a new DFA object to that array. {@link #addDFAEdge} locks on the DFA for the current decision when setting the {@link DFAState#edges} field. {@link #addDFAState} locks on the DFA for the current decision when looking up a DFA state to see if it already exists. We must make sure that all requests to add DFA states that are equivalent result in the same shared DFA object. This is because lots of threads will be trying to update the DFA at once. The {@link #addDFAState} method also locks inside the DFA lock, but this time on the shared context cache when it rebuilds the configurations' {@link PredictionContext} objects using cached subgraphs/nodes. No other locking occurs, even during DFA simulation. This is safe as long as we can guarantee that all threads referencing {@code s.edge[t]} get the same physical target {@link DFAState}, or {@code null}. Once into the DFA, the DFA simulation does not reference the {@link DFA#states} map. It follows the {@link DFAState#edges} field to new targets. The DFA simulator will either find {@link DFAState#edges} to be {@code null}, to be non-{@code null} and {@code dfa.edges[t]} null, or {@code dfa.edges[t]} to be non-null. The {@link #addDFAEdge} method could be racing to set the field, but in either case the DFA simulator works; if the edge is {@code null}, it requests ATN simulation. It could also race trying to get {@code dfa.edges[t]}, but either way it will work because it's not doing a test and set operation.
Starting with SLL then failing over to combined SLL/LL (Two-Stage Parsing)
Sam pointed out that if SLL does not give a syntax error, then there is no point in doing full LL, which is slower. We only have to try LL if we get a syntax error. For maximum speed, Sam starts the parser set to pure SLL mode with the {@link BailErrorStrategy}:
parser.{@link Parser#getInterpreter() getInterpreter()}.{@link #setPredictionMode setPredictionMode}{@code (}{@link PredictionMode#SLL}{@code )};
parser.{@link Parser#setErrorHandler setErrorHandler}(new {@link BailErrorStrategy}());
If it does not get a syntax error, then we're done. If it does get a syntax error, we need to retry with the combined SLL/LL strategy.
The reason this works is as follows. If there are no SLL conflicts, then the grammar is SLL (at least for that input set). If there is an SLL conflict, the full LL analysis must yield a set of viable alternatives which is a subset of the alternatives reported by SLL. If the LL set is a singleton, then the grammar is LL but not SLL. If the LL set is the same size as the SLL set, the decision is SLL. If the LL set has size > 1, then that decision is truly ambiguous on the current input. If the LL set is smaller, then the SLL conflict resolution might choose an alternative that the full LL would rule out as a possibility based upon better context information. If that's the case, then the SLL parse will definitely get an error because the full LL analysis says it's not viable. If SLL conflict resolution chooses an alternative within the LL set, then both SLL and LL would choose the same alternative because they both choose the minimum of multiple conflicting alternatives.
Let's say we have a set of SLL conflicting alternatives {@code {1, 2, 3}} and a smaller LL set called s. If s is {@code {2, 3}}, then SLL parsing will get an error because SLL will pursue alternative 1. If s is {@code {1, 2}} or {@code {1, 3}} then both SLL and LL will choose the same alternative because alternative one is the minimum of either set. If s is {@code {2}} or {@code {3}} then SLL will get a syntax error. If s is {@code {1}} then SLL will succeed.
Of course, if the input is invalid, then we will get an error for sure in both SLL and LL parsing. Erroneous input will therefore require 2 passes over the input.
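The two-stage strategy above can be sketched as follows. `MyLexer`, `MyParser`, and the start rule `prog` are hypothetical generated names; the exception thrown by {@link BailErrorStrategy} in the Java runtime is ParseCancellationException:

```java
// Two-stage parsing sketch; requires the ANTLR 4 Java runtime and a
// generated lexer/parser (MyLexer, MyParser, prog() are assumed names).
CommonTokenStream tokens = new CommonTokenStream(
        new MyLexer(CharStreams.fromString(source)));
MyParser parser = new MyParser(tokens);
parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
parser.removeErrorListeners();               // stay quiet in stage 1
parser.setErrorHandler(new BailErrorStrategy());
ParserRuleContext tree;
try {
    tree = parser.prog();                    // stage 1: pure SLL
}
catch (ParseCancellationException ex) {      // thrown by BailErrorStrategy
    tokens.seek(0);                          // rewind the token stream
    parser.reset();
    parser.addErrorListener(ConsoleErrorListener.INSTANCE);
    parser.setErrorHandler(new DefaultErrorStrategy());
    parser.getInterpreter().setPredictionMode(PredictionMode.LL);
    tree = parser.prog();                    // stage 2: combined SLL/LL
}
```

Per the analysis above, stage 2 runs only on inputs where SLL reported a syntax error, so valid input pays only the fast SLL pass.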
When {@code lookToEndOfRule} is true, this method uses {@link ATN#nextTokens} for each configuration in {@code configs} which is not already in a rule stop state to see if a rule stop state is reachable from the configuration via epsilon-only transitions.
@param configs the configuration set to update
@param lookToEndOfRule when true, this method checks for rule stop states reachable by epsilon-only transitions from each configuration in {@code configs}.
@return {@code configs} if all configurations in {@code configs} are in a rule stop state, otherwise return a new configuration set containing only the configurations from {@code configs} which are in a rule stop state.

The prediction context must be considered by this filter to address situations like the following.
grammar TA;
prog: statement* EOF;
statement: letterA | statement letterA 'b' ;
letterA: 'a';
In the above grammar, the ATN state immediately before the token reference {@code 'a'} in {@code letterA} is reachable from the left edge of both the primary and closure blocks of the left-recursive rule {@code statement}. The prediction context associated with each of these configurations distinguishes between them, and prevents the alternative which stepped out to {@code prog} (and then back in to {@code statement}) from being eliminated by the filter.
@param configs The configuration set computed by {@link #computeStartState} as the start state for the DFA.
@return The transformed configuration set representing the start state for a precedence DFA at a particular precedence level (determined by calling {@link Parser#getPrecedence}).

The default implementation of this method uses the following algorithm to identify an ATN configuration which successfully parsed the decision entry rule. Choosing such an alternative ensures that the {@link ParserRuleContext} returned by the calling rule will be complete and valid, and the syntax error will be reported later at a more localized location.
In some scenarios, the algorithm described above could predict an alternative which will result in a {@link FailedPredicateException} in the parser. Specifically, this could occur if the only configuration capable of successfully parsing to the end of the decision rule is blocked by a semantic predicate. By choosing this alternative within {@link #adaptivePredict} instead of throwing a {@link NoViableAltException}, the resulting {@link FailedPredicateException} in the parser will identify the specific predicate which is preventing the parser from successfully parsing the decision rule, which helps developers identify and correct logic errors in semantic predicates.
@param configs The ATN configurations which were valid immediately before the {@link #ERROR} state was reached
@param outerContext The \gamma_0 initial parser context from the paper, or the parser stack at the instant before prediction commences.
@return The value to return from {@link #adaptivePredict}, or {@link ATN#INVALID_ALT_NUMBER} if a suitable alternative was not identified and {@link #adaptivePredict} should report an error instead.

This method might not be called for every semantic context evaluated during the prediction process. In particular, we currently do not evaluate the following but it may change in the future:
If {@code to} is {@code null}, this method returns {@code null}. Otherwise, this method returns the {@link DFAState} returned by calling {@link #addDFAState} for the {@code to} state.
@param dfa The DFA
@param from The source state for the edge
@param t The input symbol
@param to The target state for the edge
@return If {@code to} is {@code null}, this method returns {@code null}; otherwise this method returns the result of calling {@link #addDFAState} on {@code to}.

If {@code D} is {@link #ERROR}, this method returns {@link #ERROR} and does not change the DFA.
@param dfa The dfa
@param D The DFA state to add
@return The state stored in the DFA. This will be either the existing state if {@code D} is already in the DFA, or {@code D} itself if the state was not already present.
When using this prediction mode, the parser will either return a correct
parse tree (i.e. the same parse tree that would be returned with the
This prediction mode does not provide any guarantees for prediction behavior for syntactically-incorrect inputs.
When using this prediction mode, the parser will make correct decisions for all syntactically-correct grammar and input combinations. However, in cases where the grammar is truly ambiguous this prediction mode might not report a precise answer for exactly which alternatives are ambiguous.
This prediction mode does not provide any guarantees for prediction behavior for syntactically-incorrect inputs.
This prediction mode may be used for diagnosing ambiguities during grammar development. Due to the performance overhead of calculating sets of ambiguous alternatives, this prediction mode should be avoided when the exact results are not necessary.
This prediction mode does not provide any guarantees for prediction behavior for syntactically-incorrect inputs.
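For grammar development, this mode is typically combined with a DiagnosticErrorListener so that each detected ambiguity is reported. A minimal sketch, assuming a hypothetical generated parser instance `parser` with start rule `prog`:

```java
// Report exact ambiguity sets while developing the grammar.
// Requires the ANTLR 4 Java runtime; "parser" and prog() are assumed.
parser.getInterpreter().setPredictionMode(
        PredictionMode.LL_EXACT_AMBIG_DETECTION);
parser.removeErrorListeners();
parser.addErrorListener(new DiagnosticErrorListener());
parser.prog();   // ambiguity reports go to the listener
```

Because of the overhead noted above, this configuration belongs in development builds, not production parsing.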
This method computes the SLL prediction termination condition for both of the following cases.
COMBINED SLL+LL PARSING
When LL-fallback is enabled upon SLL conflict, correct predictions are ensured regardless of how the termination condition is computed by this method. Due to the substantially higher cost of LL prediction, the prediction should only fall back to LL when the additional lookahead cannot lead to a unique SLL prediction.
Assuming combined SLL+LL parsing, an SLL configuration set with only
conflicting subsets should fall back to full LL, even if the
configuration sets don't resolve to the same alternative (e.g.
Here's the prediction termination rule, then: SLL (for SLL+LL parsing) stops when it sees only conflicting configuration subsets. In contrast, full LL keeps going when there is uncertainty.
HEURISTIC
As a heuristic, we stop prediction when we see any conflicting subset unless we see a state that only has one alternative associated with it. The single-alt-state rule lets prediction continue with rules like the following (otherwise, it would admit defeat too soon):
When the ATN simulation reaches the state before
It also lets us continue for this rule:
After matching input A, we reach the stop state for rule A, state 1. State 8 is the state right before B. Clearly alternatives 1 and 2 conflict and no amount of further lookahead will separate the two. However, alternative 3 will be able to continue and so we do not stop working on this state. In the previous example, we're concerned with states associated with the conflicting alternatives. Here alt 3 is not associated with the conflicting configs, but since we can continue looking for input reasonably, don't declare the state done.
PURE SLL PARSING
To handle pure SLL parsing, all we have to do is make sure that we combine stack contexts for configurations that differ only by semantic predicate. From there, we can do the usual SLL termination heuristic.
PREDICATES IN SLL+LL PARSING
SLL decisions don't evaluate predicates until after they reach DFA stop states because they need to create the DFA cache that works in all semantic situations. In contrast, full LL evaluates predicates collected during start state computation so it can ignore predicates thereafter. This means that SLL termination detection can totally ignore semantic predicates.
Implementation-wise,
Before testing these configurations against others, we have to merge
If the configuration set has predicates (as indicated by
Can we stop looking ahead during ATN simulation or is there some uncertainty as to which alternative we will ultimately pick, after consuming more input? Even if there are partial conflicts, we might know that everything is going to resolve to the same minimum alternative. That means we can stop since no more lookahead will change that fact. On the other hand, there might be multiple conflicts that resolve to different minimums. That means we need more lookahead to decide which of those alternatives we should predict.
The basic idea is to split the set of configurations
map[c] U= c.getAlt()  # map hash/equals uses s and x, not alt and not pred
The values in
If
Reduce the subsets to singletons by choosing a minimum of each subset. If the union of these alternative subsets is a singleton, then no amount of more lookahead will help us. We will always pick that alternative. If, however, there is more than one alternative, then we are uncertain which alternative to predict and must continue looking for resolution. We may or may not discover an ambiguity in the future, even if there are no conflicting subsets this round.
The biggest sin is to terminate early because it means we've made a decision but were uncertain as to the eventual outcome. We haven't used enough lookahead. On the other hand, announcing a conflict too late is no big deal; you will still have the conflict. It's just inefficient. It might even look ahead until the end of the file.
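The reduce-to-minimum step can be sketched in plain Java over BitSets of alternative numbers. Names are illustrative; this mirrors the rule described above rather than quoting the runtime:

```java
import java.util.BitSet;
import java.util.Collection;

// Sketch of "reduce the subsets to singletons by choosing a minimum":
// each BitSet is one conflicting subset of alternative numbers.
class AltResolution {
    /**
     * Returns the single alternative every subset resolves to, or 0 if
     * the subsets resolve to different minimums (need more lookahead).
     */
    static int resolvesToJustOneViableAlt(Collection<BitSet> altSubsets) {
        int result = 0;
        for (BitSet alts : altSubsets) {
            int minAlt = alts.nextSetBit(0);   // minimum alt in this subset
            if (result == 0) {
                result = minAlt;               // first subset seen
            } else if (result != minAlt) {
                return 0;                      // conflicting minimums
            }
        }
        return result;
    }
}
```

For example, subsets {1, 2} and {1, 3} both resolve to alternative 1, so prediction may stop; subsets {1, 2} and {2, 3} resolve to different minimums, so more lookahead is required.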
No special consideration for semantic predicates is required because predicates are evaluated on-the-fly for full LL prediction, ensuring that no configuration contains a semantic context during the termination check.
CONFLICTING CONFIGS
Two configurations
For simplicity, I'm doing an equality check between
CONTINUE/STOP RULE
Continue if union of resolved alternative sets from non-conflicting and conflicting alternative subsets has more than one alternative. We are uncertain about which alternative to predict.
The complete set of alternatives,
CASES
EXACT AMBIGUITY DETECTION
If all states report the same conflicting set of alternatives, then we know we have the exact ambiguity set.
|A_i| > 1 and A_i = A_j for all i, j.
In other words, we continue examining lookahead until all
map[c] U= c.getAlt()  # map hash/equals uses s and x, not alt and not pred
map[c.
] U= c.
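The exact-ambiguity condition, |A_i| > 1 and A_i = A_j for all i, j, can be sketched in plain Java over BitSets of alternatives. Names are illustrative, not quoted from the runtime:

```java
import java.util.BitSet;
import java.util.Collection;

// Sketch of exact ambiguity detection: every conflicting alt subset
// A_i must be the same set, and that set must contain > 1 alternative.
class ExactAmbiguity {
    static boolean allSubsetsEqual(Collection<BitSet> altSubsets) {
        BitSet first = null;
        for (BitSet alts : altSubsets) {
            if (first == null) {
                first = alts;            // remember the first subset
            } else if (!alts.equals(first)) {
                return false;            // some A_i differs from A_0
            }
        }
        return true;
    }

    static boolean isExactAmbiguity(Collection<BitSet> altSubsets) {
        if (altSubsets.isEmpty()) {
            return false;
        }
        // If all subsets are equal, any one of them witnesses |A_i| > 1.
        BitSet any = altSubsets.iterator().next();
        return any.cardinality() > 1 && allSubsetsEqual(altSubsets);
    }
}
```

With subsets {1, 2} and {1, 2} the ambiguity set is exact; with {1, 2} and {1, 3}, or with singleton subsets, it is not.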
This is a computed property that is calculated during ATN deserialization
and stored for use in
This is a one way link. It emanates from a state (usually via a list of transitions) and has a target state.
Since we never have to change the ATN transitions once we construct it, we can fix these transitions as specific classes. The DFA transitions, on the other hand, need to update the labels as transitions are added to the states. We'll use the term Edge for the DFA to distinguish them from ATN transitions.
The default implementation returns
This error strategy is useful in the following scenarios.
This token stream ignores the value of
This field is set to -1 when the stream is first constructed or when
For example,
If
These properties share a field to reduce the memory footprint of
If
This token factory does not explicitly copy token text when constructing tokens.
The default value is
When
The
This token stream provides access to all tokens by index or when calling
methods like
By default, tokens are placed on the default channel
(
Note: lexer rules which use the
The default value is
This implementation prints messages to
line line:charPositionInLine msg
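A listener producing that same format can be written by subclassing BaseErrorListener; a sketch, assuming the ANTLR 4 Java runtime:

```java
// Prints errors as "line <line>:<charPositionInLine> <msg>", matching
// the format shown above. Class name is illustrative.
class StderrErrorListener extends BaseErrorListener {
    @Override
    public void syntaxError(Recognizer<?, ?> recognizer,
                            Object offendingSymbol,
                            int line, int charPositionInLine,
                            String msg, RecognitionException e) {
        System.err.println("line " + line + ":" + charPositionInLine
                + " " + msg);
    }
}
```

Attach it with `parser.removeErrorListeners()` followed by `parser.addErrorListener(new StderrErrorListener())` to replace the default console output.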
The default implementation simply calls
The default implementation simply calls
The default implementation returns immediately if the handler is already
in error recovery mode. Otherwise, it calls
The default implementation resynchronizes the parser by consuming tokens until we find one in the resynchronization set--loosely the set of tokens that can follow the current rule.
Implements Jim Idle's magic sync mechanism in closures and optional subrules. E.g.,
a : sync ( stuff sync )* ;
sync : {consume to what can follow sync} ;
At the start of a sub rule upon error,
If the sub rule is optional (
During loop iteration, it consumes until it sees a token that can start a sub rule or what follows loop. Yes, that is pretty aggressive. We opt to stay in the loop as long as possible.
ORIGINS
Previous versions of ANTLR did a poor job of their recovery within loops. A single mismatched or missing token would force the parser to bail out of the entire rule surrounding the loop. So, for rule
classDef : 'class' ID '{' member* '}'
input with an extra token between members would force the parser to
consume until it found the next class definition rather than the next
member definition of the current class.
This functionality cost a little bit of effort because the parser has to compare token sets at the start of the loop and at each iteration. If for some reason speed is suffering for you, you can turn off this functionality by simply overriding this method as a blank { }.
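Overriding the method as a blank body is a small subclass; a sketch, assuming the ANTLR 4 Java runtime and a hypothetical parser instance:

```java
// Disables the loop-sync recovery described above by overriding sync()
// with an empty body. Class name is illustrative.
class NoSyncErrorStrategy extends DefaultErrorStrategy {
    @Override
    public void sync(Parser recognizer) {
        // intentionally blank: skip the token-set comparison
        // performed at loop and subrule boundaries
    }
}

// usage: parser.setErrorHandler(new NoSyncErrorStrategy());
```

The trade-off is the one described in ORIGINS: without sync, an extra token inside a loop can force recovery to consume far past the next valid loop element.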
This method is called when
The default implementation simply returns if the handler is already in
error recovery mode. Otherwise, it calls
This method is called when
The default implementation simply returns if the handler is already in
error recovery mode. Otherwise, it calls
The default implementation attempts to recover from the mismatched input
by using single token insertion and deletion as described below. If the
recovery attempt fails, this method throws an
EXTRA TOKEN (single token deletion)
This recovery strategy is implemented by
MISSING TOKEN (single token insertion)
If current token (at
This recovery strategy is implemented by
EXAMPLE
For example, Input
stat → expr → atom
and it will be trying to match the
=> ID '=' '(' INT ')' ('+' atom)* ';'
^
The attempt to match
This method determines whether or not single-token insertion is viable by
checking if the
If the single-token deletion is successful, this method calls
I use a set of ATNConfig objects, not simple states. An ATNConfig is both a state (a la normal conversion) and a RuleContext describing the chain of rules (if any) followed to arrive at that state.
A DFA state may have multiple references to a particular state, but with different ATN contexts (with same or different alts), meaning that state was reached via a different set of rule invocations.
We only use these for non-{@link #requiresFullContext} but conflicting states. That means we know from the context (it's $ or we don't dip into outer context) that it's an ambiguity not a conflict.
This list is computed by {@link ParserATNSimulator#predicateDFAState}.
Because the number of alternatives and number of ATN configurations are finite, there is a finite number of DFA states that can be processed. This is necessary to show that the algorithm terminates.
Cannot test the DFA state numbers here because in {@link ParserATNSimulator#addDFAState} we need to know if any other state exists that has this exact set of ATN configurations. The {@link #stateNumber} is irrelevant.
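The equality contract described above can be sketched as follows. This is a toy model, not the runtime's DFAState: configurations are modeled as plain strings, and the only point being illustrated is that equality and hashing use the configuration set while the state number is deliberately ignored.

```java
import java.util.Objects;
import java.util.Set;

// Illustrative: two DFA states are equal iff they hold the same set of
// configurations; stateNumber is identity bookkeeping only.
class DFAStateSketch {
    final int stateNumber;       // irrelevant to equality
    final Set<String> configs;   // stand-in for the ATN configuration set

    DFAStateSketch(int stateNumber, Set<String> configs) {
        this.stateNumber = stateNumber;
        this.configs = configs;
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DFAStateSketch)) return false;
        return configs.equals(((DFAStateSketch) o).configs); // stateNumber ignored
    }

    @Override public int hashCode() {
        return Objects.hashCode(configs); // must be consistent with equals
    }
}
```

This is what lets addDFAState find an existing state with the same configuration set before a number has been assigned to the candidate.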
The
TODO: what to do about lexers
Note that the calling code will not report an error if this method
returns successfully. The error strategy implementation is responsible
for calling
The generated code currently contains calls to
For an implementation based on Jim Idle's "magic sync" mechanism, see
Initializing Methods: Some methods in this interface have unspecified behavior if no call to an initializing method has occurred after the stream was constructed. The following is a list of initializing methods:
This method is guaranteed to succeed if any of the following are true:
If
The return value is unspecified if
The returned mark is an opaque handle (type
The behavior of this method is unspecified if no call to an
This method does not change the current position in the input stream.
The following example shows the use of
IntStream stream = ...;
int index = -1;
int mark = stream.mark();
try {
index = stream.index();
// perform work here...
} finally {
if (index != -1) {
stream.seek(index);
}
stream.release(mark);
}
For more information and an example, see
The behavior of this method is unspecified if no call to an
Each full-context prediction which does not result in a syntax error
will call either
When
When
When the
If one or more configurations in
Each full-context prediction which does not result in a syntax error
will call either
For prediction implementations that only evaluate full-context
predictions when an SLL conflict is found (including the default
Note that the definition of "context sensitivity" in this method
differs from the concept in
The non-negative numbers less than
Errors from the lexer are never passed to the parser. Either you want to keep
going or you do not upon token recognition error. If you do not want to
continue lexing then you do not want to continue parsing. Just throw an
exception not under
The preconditions for this method are the same as the preconditions of
The symbol referred to by
TokenStream stream = ...;
String text = "";
for (int i = interval.a; i <= interval.b; i++) {
text += stream.get(i).getText();
}
TokenStream stream = ...;
String text = stream.getText(new Interval(0, stream.size()));
If
TokenStream stream = ...;
String text = stream.getText(ctx.getSourceInterval());
If the specified
For streams which ensure that the
TokenStream stream = ...;
String text = "";
for (int i = start.getTokenIndex(); i <= stop.getTokenIndex(); i++) {
text += stream.get(i).getText();
}
The following table shows examples of lexer rules and the literal names assigned to the corresponding token types.
| Rule | Literal Name | Java String Literal |
|---|---|---|
| `THIS : 'this';` | `'this'` | `"'this'"` |
| `SQUOTE : '\'';` | `'\''` | `"'\\''"` |
| `ID : [A-Z]+;` | n/a | `null` |
This method supports token types defined by any of the following methods:
The following table shows examples of lexer rules and the symbolic names assigned to the corresponding token types.
| Rule | Symbolic Name |
|---|---|
| `THIS : 'this';` | `THIS` |
| `SQUOTE : '\'';` | `SQUOTE` |
| `ID : [A-Z]+;` | `ID` |
ANTLR provides a default implementation of this method, but
applications are free to override the behavior in any manner which makes
sense for the application. The default implementation returns the first
result from the following list which produces a non-
If the final token in the list is an
This method is similar to
This class is able to represent sets containing any combination of values in
the range
If the symbol type does not match,
If the symbol type does not match,
Note that if we are not building parse trees, rule contexts only point upwards. When a rule exits, it returns the context but that gets garbage collected if nobody holds a reference. It points upwards but nobody points at it.
When we build parse trees, we are adding all of these contexts to
To support output-preserving grammar transformations (including but not
limited to left-recursion removal, automated left-factoring, and
optimized code generation), calls to listener methods during the parse
may differ substantially from calls made by
With the following specific exceptions, calls to listener events are deterministic, i.e. for identical input the calls to listener methods will be the same.
If
ParseTree t = parser.expr();
ParseTreePattern p = parser.compileParseTreePattern("<ID>+0", MyParser.RULE_expr);
ParseTreeMatch m = p.match(t);
String id = m.get("ID");
E.g., given the following input with
A B
^
If the parser is not in error recovery mode, the consumed symbol is added
to the parse tree using
return getExpectedTokens().contains(symbol);
If the state number is not known, this method returns -1.
If the set of expected tokens is not known and could not be computed,
this method returns
If the context is not available, this method returns
If the input stream is not available, this method returns
If the recognizer is not available, this method returns
Used for XPath and tree pattern compilation.
Used for XPath and tree pattern compilation.
For interpreters, we don't know their serialized ATN despite having created the interpreter from it.
You can insert stuff, replace, and delete chunks. Note that the operations
are done lazily--only if you convert the buffer to a
This rewriter makes no modifications to the token stream. It does not ask the
stream to fill itself up nor does it advance the input cursor. The token
stream
The rewriter only works on tokens that you have in the buffer and ignores the
current input cursor. If you are buffering tokens on-demand, calling
Since the operations are done lazily at
Because operations never actually alter the buffer, you may always get the original token stream back without undoing anything. Since the instructions are queued up, you can easily simulate transactions and roll back any changes if there is an error just by removing instructions. For example,
CharStream input = new ANTLRFileStream("input");
TLexer lex = new TLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lex);
T parser = new T(tokens);
TokenStreamRewriter rewriter = new TokenStreamRewriter(tokens);
parser.startRule();
Then in the rules, you can execute (assuming rewriter is visible):
Token t,u;
...
rewriter.insertAfter(t, "text to put after t");
rewriter.insertAfter(u, "text after u");
System.out.println(tokens.toString());
You can also have multiple "instruction streams" and get multiple rewrites from a single pass over the input. Just name the instruction streams and use that name again when printing the buffer. This could be useful for generating a C file and also its header file--all from the same buffer:
tokens.insertAfter("pass1", t, "text to put after t");
tokens.insertAfter("pass2", u, "text after u");
System.out.println(tokens.toString("pass1"));
System.out.println(tokens.toString("pass2"));
If you don't use named rewrite streams, a "default" stream is used as the first example shows.
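The lazy, named instruction streams described above can be sketched in a few lines. This is a deliberately tiny model, not the TokenStreamRewriter API: tokens are plain strings, only insert-after is modeled, and the class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative: instructions are queued per program name and applied only at
// render time; the token buffer itself is never modified, so the original
// text is always recoverable and a program can be discarded to "roll back".
class RewriterSketch {
    private final List<String> tokens;
    private final Map<String, Map<Integer, String>> programs = new HashMap<>();

    RewriterSketch(List<String> tokens) { this.tokens = tokens; }

    // Queue an instruction in the named stream; nothing is rewritten yet.
    void insertAfter(String programName, int index, String text) {
        programs.computeIfAbsent(programName, k -> new HashMap<>()).put(index, text);
    }

    // Apply the named program lazily while emitting the buffer.
    String render(String programName) {
        Map<Integer, String> ops = programs.getOrDefault(programName, Map.of());
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            buf.append(tokens.get(i));
            if (ops.containsKey(i)) buf.append(ops.get(i));
        }
        return buf.toString();
    }
}
```

Rendering the same buffer under two program names yields two independent rewrites of one pass over the input, which is the C-file/header-file trick described above.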
The default implementation calls
The default implementation initializes the aggregate result to
The default implementation is not safe for use in visitors that modify the tree structure. Visitors that modify the tree should override this method to behave properly with respect to the specific algorithm in use.
The default implementation returns the result of
The default implementation returns the result of
The base implementation returns
The default implementation returns
The default implementation always returns
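The visit-children/aggregate contract described above can be sketched over a minimal tree of ints. This is an illustration of the folding pattern only, not AbstractParseTreeVisitor itself; summing is used here (instead of the base class's last-child-wins default) purely to make the aggregation visible.

```java
import java.util.List;

// Illustrative: start from defaultResult() and fold each child's result into
// the aggregate, mirroring the default visitChildren/aggregateResult shape.
class VisitorSketch {
    static class Node {
        final int value;
        final List<Node> children;
        Node(int value, List<Node> children) { this.value = value; this.children = children; }
    }

    static int defaultResult() { return 0; }

    // The base visitor's default returns the child result unchanged (so the
    // last child wins); summing here is a deliberate, visible override.
    static int aggregateResult(int aggregate, int childResult) {
        return aggregate + childResult;
    }

    static int visit(Node n) {
        int result = defaultResult();
        for (Node child : n.children) {
            result = aggregateResult(result, visit(child));
        }
        return result + n.value; // include this node's own value
    }
}
```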
The payload is either a
If source interval is unknown, this returns
ParseTreeProperty<Integer> values = new ParseTreeProperty<Integer>();
values.put(tree, 36);
int x = values.get(tree);
values.removeFrom(tree);
You would make one declaration (values here) in the listener and use it many times in your event methods.
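The annotation pattern shown above amounts to a map keyed by node identity, so two structurally equal subtrees still get separate entries. The sketch below captures that idea; the node type is reduced to `Object` and the class name is illustrative, not the runtime's generic ParseTreeProperty.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Illustrative: annotate arbitrary nodes with values, keyed by identity.
class NodeProperty<V> {
    private final Map<Object, V> annotations = new IdentityHashMap<>();

    V get(Object node) { return annotations.get(node); }
    void put(Object node, V value) { annotations.put(node, value); }
    V removeFrom(Object node) { return annotations.remove(node); }
}
```

An IdentityHashMap is the natural backing store because node equality is reference equality here; removeFrom lets a listener free entries once a subtree has been processed.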
The method
For example, for pattern
Pattern tags like
If the
The map includes special entries corresponding to the names of rules and
tokens referenced in tags in the original pattern. For additional
information, see the description of
Patterns are strings of source input text with special tags representing token or rule references such as:
Given a pattern start rule such as
Pattern
The
For efficiency, you can compile a tree pattern in string form to a
See
The lexer and parser that you pass into the
Normally a parser does not accept token
Delimiters are
Rule tag tokens are always placed on the
This method returns the rule tag formatted with
Rule tag tokens have types assigned according to the rule bypass transitions created during ATN deserialization.
The implementation for
The implementation for
The implementation for
The implementation for
The implementation for
The implementation for
The implementation for
The implementation for
The implementation for
The implementation for
The implementation for
Split path into words and separators
The basic interface is
p = new XPath(parser, pathString);
return p.evaluate(tree);
See
and path elements:
Whitespace is not allowed.
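The words-and-separators split mentioned above can be approximated with an ordered regex alternation: a path like `//expr/primary` becomes `["//", "expr", "/", "primary"]`. This mimics the idea only; the runtime tokenizes paths with its own lexer, and operators like `!` are ignored in this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative path tokenizer: "//" must come before "/" in the alternation
// so the longer separator wins; everything else between separators is a word.
class PathSplitSketch {
    private static final Pattern PART = Pattern.compile("//|/|[^/]+");

    static List<String> split(String path) {
        List<String> parts = new ArrayList<>();
        Matcher m = PART.matcher(path);
        while (m.find()) parts.add(m.group()); // words and separators, in order
        return parts;
    }
}
```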
This is not the buffer capacity, that's
The
The specific marker value used for this class allows for some level of
protection against misuse where
This is not the buffer capacity, that's
The
This value is used to set the token indexes if the stream provides tokens
that implement
The specific marker value used for this class allows for some level of
protection against misuse where
No literal or symbol names are assigned to token types, so