lapp0 commented on Jul 24, 2024
lexer_conf.terminals = [
    terminals_by_name[n] for n in accepts if n in terminals_by_name
]
if not lexer_conf.terminals:
Note: Enables returning EOS
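A minimal sketch of the idea behind the filter above, assuming lark's convention that `$END` shows up among the parser's accepted terminal names but has no entry in `terminals_by_name`. The helper `next_lexer_terminals` and the toy name-to-pattern dictionary are hypothetical, not outlines' API; the assumption is that an empty filtered list (the `if not lexer_conf.terminals:` branch) is reached when the parser accepts nothing the lexer can produce, for example only `$END`, which is when EOS should be allowed.

```python
from typing import Dict, List, Set, Tuple

END = "$END"  # lark's name for the end-of-input pseudo-terminal


def next_lexer_terminals(
    accepts: Set[str], terminals_by_name: Dict[str, str]
) -> Tuple[List[str], bool]:
    """Hypothetical helper: filter accepted names down to real lexer terminals.

    Returns the matching terminal patterns and whether EOS may be emitted.
    """
    # Same filter as the diff: "$END" has no entry in terminals_by_name, so it
    # never becomes a lexer terminal. sorted() only makes the order deterministic.
    terminals = [terminals_by_name[n] for n in sorted(accepts) if n in terminals_by_name]
    # EOS is a legal next "token" exactly when the parser accepts "$END".
    eos_allowed = END in accepts
    return terminals, eos_allowed


# Toy terminals: names mapped to regex patterns (lark stores TerminalDef objects).
toy_terminals = {"NUMBER": "[0-9]+", "PLUS": "[+]"}
print(next_lexer_terminals({"PLUS", "$END"}, toy_terminals))  # (['[+]'], True)
print(next_lexer_terminals({"$END"}, toy_terminals))  # ([], True)
```

Tracking `$END` as a separate flag rather than as a lexer terminal is one way to expose EOS to the guide; the actual change may wire it differently.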
    token_history=lexer_state.last_token and [lexer_state.last_token],
    state=parser_state,
    terminals_by_name=self.root_lexer.terminals,
)
Note: Fixes the following tests:
- Multiple Valid Continuations
- Token is Substring of Another Token
- Recursive Patterns
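The "Token is Substring of Another Token" case can be illustrated with a toy sketch, assuming literal (non-regex) terminals with the hypothetical names `EQ` (`=`) and `EQEQ` (`==`). After the partial text `=`, both options must stay open: `EQ` is already complete, while `EQEQ` is still reachable. The helper `classify_partial` is illustrative only and does not reflect how `outlines.fsm.parsing` implements this.

```python
from typing import Dict, List, Tuple


def classify_partial(
    partial: str, terminals: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper for literal terminals.

    Returns (terminals that `partial` already completes,
             terminals that `partial` is a strict prefix of).
    """
    complete = [name for name, lit in terminals.items() if lit == partial]
    extendable = [
        name
        for name, lit in terminals.items()
        if lit.startswith(partial) and len(lit) > len(partial)
    ]
    return complete, extendable


# Toy literal terminals: "=" is a substring (prefix) of "==".
toy_terminals = {"EQ": "=", "EQEQ": "==", "LT": "<"}
print(classify_partial("=", toy_terminals))  # (['EQ'], ['EQEQ'])
print(classify_partial("==", toy_terminals))  # (['EQEQ'], [])
```

A guide that committed to `EQ` as soon as `=` fully matched would forbid the second `=`; keeping both candidates alive is presumably the behavior that test exercises.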
Branch updated from db36739 to 5922b3b.
Any updates on when this will be merged? Grammars via vLLM are completely broken at the moment.
@w013nad Please track dottxt-ai#1067. I'll follow up this week and see if we can get this merged.
Is this PR available yet? I'd like to see how CFG behaves via PartialLark.
Branch updated from e763f29 to b8d5b42.
Fixes:
Rendered Docs: https://github.com/lapp0/outlines/blob/cfg-beta/docs/reference/creating_grammars.md
Changes
CFGGuide
- `CFGGuide` based on Brandon Willard's implementation in `examples/parsing.py`
- `outlines.fsm.parsing` to handle some edge cases
- `$END` is a legal next terminal
- `CFGFSM`

Grammars
- `ESCAPED_STRING` in `json.lark` and `common.lark`

Integrations
- `outlines.generate.cfg(...)` via `SequenceGeneratorAdapter`
- `outlines.processors.CFGLogitsProcessor`

Testing
- `tests/fsm/test_cfg_guide.py`
  - `test_cfg_next_token`: asserts that, given a sequence of previously generated tokens, the expected next tokens in a vocabulary are allowed.
  - `test_cfg_grammar_sample`: resurrected tests from an old PR that encode a sample valid under the grammar and assert that the resulting sequence of encoded tokens can be produced by `CFGGuide`. A new test can be created simply by adding an example to `tests/cfg_samples/`.
- Test `outlines.generate.cfg` via `tests/generate/test_generate.py`

Benchmarks
- `benchmarks/bench_cfg_guide.py`: measures `CFGGuide` construction time, token run time, and token run peak memory

Analysis
Regardless of sequence length (10, 40, or 100 tokens), generating a token takes ~1.2 seconds.
Unsurprisingly, `get_next_instruction` takes most of the time: over 99.99% of the runtime. This is intuitive, since `get_next_state` applies the same operation to a single token, whereas `get_next_instruction` applies it once for each of gpt2's 50,257 tokens.

cProfile:
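To make the cost asymmetry concrete, here is a minimal sketch, assuming a toy grammar of balanced parentheses and a hypothetical `is_valid_prefix` check standing in for the parser. `next_state` runs the check for one sampled token, while `next_instruction` runs it for every vocabulary entry, so with a real vocabulary (50,257 entries for gpt2) the latter dominates. The names only echo `get_next_state` / `get_next_instruction`; this is not outlines' implementation, and unlike an incremental parser the toy re-scans the whole prefix on every call.

```python
from typing import List, Optional


def is_valid_prefix(text: str) -> bool:
    """Toy stand-in for the parser: can `text` still be extended to balanced parens?"""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # closed more than opened: no continuation can fix this
        else:
            return False  # character outside the toy grammar
    return True


def next_state(state: str, token: str) -> Optional[str]:
    """One parser check for a single sampled token (cf. get_next_state)."""
    candidate = state + token
    return candidate if is_valid_prefix(candidate) else None


def next_instruction(state: str, vocabulary: List[str]) -> List[int]:
    """One parser check per vocabulary entry (cf. get_next_instruction)."""
    return [i for i, tok in enumerate(vocabulary) if is_valid_prefix(state + tok)]


toy_vocab = ["(", ")", "()", "))", "x"]
print(next_instruction("(", toy_vocab))  # [0, 1, 2]
print(next_state("(", ")"))  # ()
```

The work per generation step is roughly (vocabulary size) × (cost of one check) for `next_instruction` versus a single check for `next_state`, which matches the profile's split.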
Future Work
(TODO: Move these to issues)
Improvements
- Context-sensitive features such as Python's tree parser
  - Currently the tree parser isn't supported (dottxt-ai#592)
- Allow CFG in `outlines.serve`
- Remove `Guide.is_final_state`
  - `is_final_state` is ambiguous (dottxt-ai#885); in a separate PR we should remove `is_final_state`

Clean Up Dead Code
- Remove `StopAtEosFSM`, `RegexFSM`
- `StopAtEOSGuide` is useful anywhere

Bug Fixes
- Ensure the parser allows ambiguous terminals (e.g. `?start: /ab*/ /bc?/`)
- Improve performance
- Incorrectly over-constrained
  - `arithmetic_lots_of_ops.arithmetic.test`: the guide doesn't allow generation of the EOS token at the end

TODO
- `outlines.generate.cfg`
- `CFGLogitsProcessor`
- `test_generate.py`

Notify these threads:
- `CFGFSM` LALR(1) dottxt-ai/outlines#588

Separate PR
- `outlines.grammars.sql_select` and tests `sql_select_select_minimal_lalr1.sql.test`
- dottxt-ai#636
- dottxt-ai#633
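The ambiguous-terminal item under Bug Fixes (`?start: /ab*/ /bc?/`) can be checked by brute force with Python's `re` module. This sketch treats the rule as the plain concatenation of its two terminals (an assumption: it ignores whitespace handling and does not model lark's lexer) and contrasts every valid split with a greedy longest-match lexer. Because `/ab*/` swallows every `b`, greedy lexing leaves nothing usable for `/bc?/`, even though splits such as `a` + `bc` are valid. It is an illustration of the failure mode, not how lark or outlines resolves it.

```python
import re
from typing import List, Optional, Tuple

FIRST = re.compile(r"ab*")
SECOND = re.compile(r"bc?")


def valid_splits(text: str) -> List[Tuple[str, str]]:
    """All ways to split `text` into a FIRST match followed by a SECOND match."""
    return [
        (text[:i], text[i:])
        for i in range(1, len(text))
        if FIRST.fullmatch(text[:i]) and SECOND.fullmatch(text[i:])
    ]


def greedy_split(text: str) -> Optional[Tuple[str, str]]:
    """Longest-match lexing: take the longest FIRST, then require SECOND for the rest."""
    m = FIRST.match(text)
    if m is None:
        return None
    head, rest = m.group(), text[m.end():]
    return (head, rest) if SECOND.fullmatch(rest) else None


print(valid_splits("abc"))  # [('a', 'bc')]
print(greedy_split("abc"))  # None
print(valid_splits("abb"))  # [('ab', 'b')]
```

A guide that only followed the greedy tokenization would wrongly reject every string in this language, which is why the parser has to keep shorter matches of the first terminal as live candidates.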