[TOC]

## 1 Overview

> This article applies to `Java 8` and earlier versions, referred to below simply as `Java`.

## 2 Java lexer rules (lexer grammar)

I will introduce the lexer rules (lexer grammar) in **bottom-up** order.

### 2.1 Keywords

The lexer rules for Java keywords and reserved words are:

<pre style="font-family: 'Monaco'">
ABSTRACT: 'abstract';
ASSERT: 'assert';
BOOLEAN: 'boolean';
BREAK: 'break';
BYTE: 'byte';
CASE: 'case';
CATCH: 'catch';
CHAR: 'char';
CLASS: 'class';
CONST: 'const';
CONTINUE: 'continue';
DEFAULT: 'default';
DO: 'do';
DOUBLE: 'double';
ELSE: 'else';
ENUM: 'enum';
EXTENDS: 'extends';
FINAL: 'final';
FINALLY: 'finally';
FLOAT: 'float';
FOR: 'for';
IF: 'if';
GOTO: 'goto';
IMPLEMENTS: 'implements';
IMPORT: 'import';
INSTANCEOF: 'instanceof';
INT: 'int';
INTERFACE: 'interface';
LONG: 'long';
NATIVE: 'native';
NEW: 'new';
PACKAGE: 'package';
PRIVATE: 'private';
PROTECTED: 'protected';
PUBLIC: 'public';
RETURN: 'return';
SHORT: 'short';
STATIC: 'static';
STRICTFP: 'strictfp';
SUPER: 'super';
SWITCH: 'switch';
SYNCHRONIZED: 'synchronized';
THIS: 'this';
THROW: 'throw';
THROWS: 'throws';
TRANSIENT: 'transient';
TRY: 'try';
VOID: 'void';
VOLATILE: 'volatile';
WHILE: 'while';
</pre>

### 2.2 Fragment rules

Fragment rules only serve as building blocks for other lexer rules. They do not produce tokens of their own, so they take no part in parsing the parser rules.

----

Digits (Digits) and exponent part (ExponentPart):

<pre style="font-family: 'Monaco'">
fragment Digits
    : [0-9] ([0-9_]* [0-9])?
    ;

fragment ExponentPart
    : [eE] [+-]? Digits
    ;
</pre>

Letter (Letter) and letter or digit (LetterOrDigit):

<pre style="font-family: 'Monaco'">
fragment Letter
    : [a-zA-Z$_]
    | ~[\u0000-\u007F\uD800-\uDBFF]
    | [\uD800-\uDBFF] [\uDC00-\uDFFF]
    ;

fragment LetterOrDigit
    : Letter
    | [0-9]
    ;
</pre>

Hexadecimal digits (HexDigits, HexDigit) and escape sequence (EscapeSequence):

<pre style="font-family: 'Monaco'">
fragment HexDigit
    : [0-9a-fA-F]
    ;

fragment HexDigits
    : HexDigit ((HexDigit | '_')* HexDigit)?
    ;

fragment EscapeSequence
    : '\\' [btnfr"'\\]
    | '\\' ([0-3]? [0-7])? [0-7]
    | '\\' 'u'+ HexDigit HexDigit HexDigit HexDigit
    ;
</pre>

### 2.3 Literals

Decimal literal:

<pre style="font-family: 'Monaco'">
DECIMAL_LITERAL: ('0' | [1-9] (Digits?
    | '_'+ Digits)) [lL]?;
</pre>

Hexadecimal literal:

<pre style="font-family: 'Monaco'">
HEX_LITERAL: '0' [xX] [0-9a-fA-F] ([0-9a-fA-F_]* [0-9a-fA-F])? [lL]?;
</pre>

Octal literal:

<pre style="font-family: 'Monaco'">
OCT_LITERAL: '0' '_'* [0-7] ([0-7_]* [0-7])? [lL]?;
</pre>

Binary literal:

<pre style="font-family: 'Monaco'">
BINARY_LITERAL: '0' [bB] [01] ([01_]* [01])? [lL]?;
</pre>

Floating-point literal:

<pre style="font-family: 'Monaco'">
FLOAT_LITERAL
    : (Digits '.' Digits? | '.' Digits) ExponentPart? [fFdD]?
    | Digits (ExponentPart [fFdD]? | [fFdD])
    ;
</pre>

Hexadecimal floating-point literal:

<pre style="font-family: 'Monaco'">
HEX_FLOAT_LITERAL: '0' [xX] (HexDigits '.'? | HexDigits? '.' HexDigits) [pP] [+-]? Digits [fFdD]?;
</pre>

Boolean literal:

<pre style="font-family: 'Monaco'">
BOOL_LITERAL
    : 'true'
    | 'false'
    ;
</pre>

Character literal:

<pre style="font-family: 'Monaco'">
CHAR_LITERAL: '\'' (~['\\\r\n] | EscapeSequence) '\'';
</pre>

String literal:

<pre style="font-family: 'Monaco'">
STRING_LITERAL: '"' (~["\\\r\n] | EscapeSequence)* '"';
</pre>

Null literal:

<pre style="font-family: 'Monaco'">
NULL_LITERAL: 'null';
</pre>

### 2.4 Separators

The separators are defined as follows:

<pre style="font-family: 'Monaco'">
LPAREN: '(';
RPAREN: ')';
LBRACE: '{';
RBRACE: '}';
LBRACK: '[';
RBRACK: ']';
SEMI: ';';
COMMA: ',';
DOT: '.';
</pre>

### 2.5 Operators

The operators are defined as follows:

<pre style="font-family: 'Monaco'">
ASSIGN: '=';
LT: '<';
GT: '>';
BANG: '!';
EQUAL: '==';
LE: '<=';
GE: '>=';
NOTEQUAL: '!=';
AND: '&&';
OR: '||';
BITAND: '&';
BITOR: '|';
ADD: '+';
SUB: '-';
MUL: '*';
DIV: '/';
INC: '++';
DEC: '--';
CARET: '^';
MOD: '%';
TILDE: '~';
QUESTION: '?';
COLON: ':';

// Compound assignment operators
ADD_ASSIGN: '+=';
SUB_ASSIGN: '-=';
MUL_ASSIGN: '*=';
DIV_ASSIGN: '/=';
AND_ASSIGN: '&=';
OR_ASSIGN: '|=';
XOR_ASSIGN: '^=';
MOD_ASSIGN: '%=';
LSHIFT_ASSIGN: '<<=';
RSHIFT_ASSIGN: '>>=';
URSHIFT_ASSIGN: '>>>=';

// Tokens introduced in Java 8
ARROW: '->';
COLONCOLON: '::';

// Additional symbols not defined in the lexical specification
AT: '@';
ELLIPSIS: '...';

// Whitespace and comments
WS: [ \t\r\n\u000C]+ -> channel(HIDDEN);
COMMENT: '/*' .*?
    '*/' -> channel(HIDDEN);
LINE_COMMENT: '//' ~[\r\n]* -> channel(HIDDEN);

// Identifiers
IDENTIFIER: Letter LetterOrDigit*;
</pre>

Those are the Java lexer rules. The parser rules, which build on them, come next.

## 3 Java parser rules (parser grammar)

I will list the parser rules (parser grammar) in **breadth-first search** order. Sorted that way, there are 11 levels in total; the parser rules at each level are:

<pre style="font-family: 'Monaco'">
1
    compilationUnit
2
    packageDeclaration importDeclaration typeDeclaration
3
    annotation qualifiedName classOrInterfaceModifier classDeclaration
    enumDeclaration interfaceDeclaration annotationTypeDeclaration
4
    altAnnotationQualifiedName elementValuePairs elementValue typeParameters
    typeType typeList classBody enumConstants enumBodyDeclarations
    interfaceBody annotationTypeBody
5
    elementValuePair expression elementValueArrayInitializer typeParameter
    classOrInterfaceType primitiveType classBodyDeclaration enumConstant
    interfaceBodyDeclaration annotationTypeElementDeclaration
6
    elementValue primary methodCall nonWildcardTypeArguments innerCreator
    superSuffix explicitGenericInvocation creator lambdaExpression
    typeArguments classType typeBound block modifier memberDeclaration
    arguments interfaceMemberDeclaration annotationTypeElementRest
7
    literal typeTypeOrVoid explicitGenericInvocationSuffix expressionList
    nonWildcardTypeArgumentsOrDiamond createdName classCreatorRest
    arrayCreatorRest lambdaParameters lambdaBody typeArgument blockStatement
    methodDeclaration genericMethodDeclaration fieldDeclaration
    constructorDeclaration genericConstructorDeclaration constDeclaration
    interfaceMethodDeclaration genericInterfaceMethodDeclaration
    annotationMethodOrConstantRest
8
    integerLiteral floatLiteral typeArgumentsOrDiamond arrayInitializer
    formalParameterList localVariableDeclaration statement
    localTypeDeclaration formalParameters qualifiedNameList methodBody
    variableDeclarators constantDeclarator interfaceMethodModifier
    annotationMethodRest annotationConstantRest
9
    formalParameter lastFormalParameter variableModifier parExpression
    catchClause forControl finallyBlock resourceSpecification
    switchBlockStatementGroup switchLabel variableDeclarator
    variableInitializer defaultValue
10
    variableDeclaratorId catchType enhancedForControl forInit resources
11
    resource
</pre>

### 3.1 Level 1

<pre style="font-family: 'Monaco'">
/*
1
compilationUnit
*/

compilationUnit
    : packageDeclaration? importDeclaration* typeDeclaration* EOF
    ;
</pre>

### 3.2 Level 2

<pre style="font-family: 'Monaco'">
/*
2
packageDeclaration importDeclaration typeDeclaration
*/

packageDeclaration
    : annotation* PACKAGE qualifiedName ';'
    ;

importDeclaration
    : IMPORT STATIC? qualifiedName ('.' '*')? ';'
    ;

typeDeclaration
    : classOrInterfaceModifier*
      (classDeclaration | enumDeclaration | interfaceDeclaration | annotationTypeDeclaration)
    | ';'
    ;
</pre>

### 3.3 Level 3

<pre style="font-family: 'Monaco'">
/*
3
annotation qualifiedName classOrInterfaceModifier classDeclaration
enumDeclaration interfaceDeclaration annotationTypeDeclaration
*/

annotation
    : ('@' qualifiedName | altAnnotationQualifiedName) ('(' ( elementValuePairs | elementValue )? ')')?
    ;

qualifiedName
    : IDENTIFIER ('.' IDENTIFIER)*
    ;

classOrInterfaceModifier
    : annotation
    | PUBLIC
    | PROTECTED
    | PRIVATE
    | STATIC
    | ABSTRACT
    | FINAL
    | STRICTFP
    ;

classDeclaration
    : CLASS IDENTIFIER typeParameters?
      (EXTENDS typeType)?
      (IMPLEMENTS typeList)?
      classBody
    ;

enumDeclaration
    : ENUM IDENTIFIER (IMPLEMENTS typeList)? '{' enumConstants? ','? enumBodyDeclarations? '}'
    ;

interfaceDeclaration
    : INTERFACE IDENTIFIER typeParameters? (EXTENDS typeList)?
      interfaceBody
    ;

annotationTypeDeclaration
    : '@' INTERFACE IDENTIFIER annotationTypeBody
    ;
</pre>

### 3.4 Level 4

<pre style="font-family: 'Monaco'">
/*
4
altAnnotationQualifiedName elementValuePairs elementValue typeParameters
typeType typeList classBody enumConstants enumBodyDeclarations
interfaceBody annotationTypeBody
*/

altAnnotationQualifiedName
    : (IDENTIFIER DOT)* '@' IDENTIFIER
    ;

elementValuePairs
    : elementValuePair (',' elementValuePair)*
    ;

typeParameters
    : '<' typeParameter (',' typeParameter)* '>'
    ;

typeType
    : annotation? (classOrInterfaceType | primitiveType) ('[' ']')*
    ;

typeList
    : typeType (',' typeType)*
    ;

classBody
    : '{' classBodyDeclaration* '}'
    ;

enumConstants
    : enumConstant (',' enumConstant)*
    ;

enumBodyDeclarations
    : ';' classBodyDeclaration*
    ;

interfaceBody
    : '{' interfaceBodyDeclaration* '}'
    ;

annotationTypeBody
    : '{' (annotationTypeElementDeclaration)* '}'
    ;
</pre>

### 3.5 Level 5

<pre style="font-family: 'Monaco'">
/*
5
elementValuePair expression elementValueArrayInitializer typeParameter
classOrInterfaceType primitiveType classBodyDeclaration enumConstant
interfaceBodyDeclaration annotationTypeElementDeclaration
*/

elementValuePair
    : IDENTIFIER '=' elementValue
    ;

expression
    : primary
    | expression bop='.'
      ( IDENTIFIER
      | methodCall
      | THIS
      | NEW nonWildcardTypeArguments?
        innerCreator
      | SUPER superSuffix
      | explicitGenericInvocation
      )
    | expression '[' expression ']'
    | methodCall
    | NEW creator
    | '(' typeType ')' expression
    | expression postfix=('++' | '--')
    | prefix=('+'|'-'|'++'|'--') expression
    | prefix=('~'|'!') expression
    | expression bop=('*'|'/'|'%') expression
    | expression bop=('+'|'-') expression
    | expression ('<' '<' | '>' '>' '>' | '>' '>') expression
    | expression bop=('<=' | '>=' | '>' | '<') expression
    | expression bop=INSTANCEOF typeType
    | expression bop=('==' | '!=') expression
    | expression bop='&' expression
    | expression bop='^' expression
    | expression bop='|' expression
    | expression bop='&&' expression
    | expression bop='||' expression
    | <assoc=right> expression bop='?' expression ':' expression
    | <assoc=right> expression
      bop=('=' | '+=' | '-=' | '*=' | '/=' | '&=' | '|=' | '^=' | '>>=' | '>>>=' | '<<=' | '%=')
      expression
    | lambdaExpression // Java8

    // Java 8 methodReference
    | expression '::' typeArguments? IDENTIFIER
    | typeType '::' (typeArguments? IDENTIFIER | NEW)
    | classType '::' typeArguments? NEW
    ;

elementValueArrayInitializer
    : '{' (elementValue (',' elementValue)*)? (',')? '}'
    ;

typeParameter
    : annotation* IDENTIFIER (EXTENDS typeBound)?
    ;

classOrInterfaceType
    : IDENTIFIER typeArguments? ('.' IDENTIFIER typeArguments?)*
    ;

primitiveType
    : BOOLEAN
    | CHAR
    | BYTE
    | SHORT
    | INT
    | LONG
    | FLOAT
    | DOUBLE
    ;

classBodyDeclaration
    : ';'
    | STATIC? block
    | modifier* memberDeclaration
    ;

enumConstant
    : annotation* IDENTIFIER arguments? classBody?
    ;

interfaceBodyDeclaration
    : modifier* interfaceMemberDeclaration
    | ';'
    ;

annotationTypeElementDeclaration
    : modifier* annotationTypeElementRest
    | ';'
    ;
</pre>

### 3.6 Level 6

<pre style="font-family: 'Monaco'">
/*
6
elementValue primary methodCall nonWildcardTypeArguments innerCreator
superSuffix explicitGenericInvocation creator lambdaExpression
typeArguments classType typeBound block modifier memberDeclaration
arguments interfaceMemberDeclaration annotationTypeElementRest
*/

elementValue
    : expression
    | annotation
    | elementValueArrayInitializer
    ;

primary
    : '(' expression ')'
    | THIS
    | SUPER
    | literal
    | IDENTIFIER
    | typeTypeOrVoid '.' CLASS
    | nonWildcardTypeArguments (explicitGenericInvocationSuffix | THIS arguments)
    ;

methodCall
    : IDENTIFIER '(' expressionList? ')'
    | THIS '(' expressionList? ')'
    | SUPER '(' expressionList? ')'
    ;

nonWildcardTypeArguments
    : '<' typeList '>'
    ;

innerCreator
    : IDENTIFIER nonWildcardTypeArgumentsOrDiamond? classCreatorRest
    ;

superSuffix
    : arguments
    | '.' IDENTIFIER arguments?
    ;

explicitGenericInvocation
    : nonWildcardTypeArguments explicitGenericInvocationSuffix
    ;

creator
    : nonWildcardTypeArguments createdName classCreatorRest
    | createdName (arrayCreatorRest | classCreatorRest)
    ;

// Java8
lambdaExpression
    : lambdaParameters '->' lambdaBody
    ;

typeArguments
    : '<' typeArgument (',' typeArgument)* '>'
    ;

classType
    : (classOrInterfaceType '.')? annotation* IDENTIFIER typeArguments?
    ;

typeBound
    : typeType ('&' typeType)*
    ;

block
    : '{' blockStatement* '}'
    ;

modifier
    : classOrInterfaceModifier
    | NATIVE
    | SYNCHRONIZED
    | TRANSIENT
    | VOLATILE
    ;

memberDeclaration
    : methodDeclaration
    | genericMethodDeclaration
    | fieldDeclaration
    | constructorDeclaration
    | genericConstructorDeclaration
    | interfaceDeclaration
    | annotationTypeDeclaration
    | classDeclaration
    | enumDeclaration
    ;

arguments
    : '(' expressionList?
      ')'
    ;

interfaceMemberDeclaration
    : constDeclaration
    | interfaceMethodDeclaration
    | genericInterfaceMethodDeclaration
    | interfaceDeclaration
    | annotationTypeDeclaration
    | classDeclaration
    | enumDeclaration
    ;

annotationTypeElementRest
    : typeType annotationMethodOrConstantRest ';'
    | classDeclaration ';'?
    | interfaceDeclaration ';'?
    | enumDeclaration ';'?
    | annotationTypeDeclaration ';'?
    ;
</pre>

### 3.7 Level 7

<pre style="font-family: 'Monaco'">
/*
7
literal typeTypeOrVoid explicitGenericInvocationSuffix expressionList
nonWildcardTypeArgumentsOrDiamond createdName classCreatorRest
arrayCreatorRest lambdaParameters lambdaBody typeArgument blockStatement
methodDeclaration genericMethodDeclaration fieldDeclaration
constructorDeclaration genericConstructorDeclaration constDeclaration
interfaceMethodDeclaration genericInterfaceMethodDeclaration
annotationMethodOrConstantRest
*/

literal
    : integerLiteral
    | floatLiteral
    | CHAR_LITERAL
    | STRING_LITERAL
    | BOOL_LITERAL
    | NULL_LITERAL
    ;

typeTypeOrVoid
    : typeType
    | VOID
    ;

explicitGenericInvocationSuffix
    : SUPER superSuffix
    | IDENTIFIER arguments
    ;

expressionList
    : expression (',' expression)*
    ;

nonWildcardTypeArgumentsOrDiamond
    : '<' '>'
    | nonWildcardTypeArguments
    ;

createdName
    : IDENTIFIER typeArgumentsOrDiamond? ('.' IDENTIFIER typeArgumentsOrDiamond?)*
    | primitiveType
    ;

classCreatorRest
    : arguments classBody?
    ;

arrayCreatorRest
    : '[' (']' ('[' ']')* arrayInitializer | expression ']' ('[' expression ']')* ('[' ']')*)
    ;

// Java8
lambdaParameters
    : IDENTIFIER
    | '(' formalParameterList? ')'
    | '(' IDENTIFIER (',' IDENTIFIER)* ')'
    ;

// Java8
lambdaBody
    : expression
    | block
    ;

typeArgument
    : typeType
    | '?' ((EXTENDS | SUPER) typeType)?
    ;

blockStatement
    : localVariableDeclaration ';'
    | statement
    | localTypeDeclaration
    ;

methodDeclaration
    : typeTypeOrVoid IDENTIFIER formalParameters ('[' ']')*
      (THROWS qualifiedNameList)?
      methodBody
    ;

genericMethodDeclaration
    : typeParameters methodDeclaration
    ;

fieldDeclaration
    : typeType variableDeclarators ';'
    ;

constructorDeclaration
    : IDENTIFIER formalParameters (THROWS qualifiedNameList)? constructorBody=block
    ;

genericConstructorDeclaration
    : typeParameters constructorDeclaration
    ;

constDeclaration
    : typeType constantDeclarator (',' constantDeclarator)* ';'
    ;

interfaceMethodDeclaration
    : interfaceMethodModifier* (typeTypeOrVoid | typeParameters annotation* typeTypeOrVoid)
      IDENTIFIER formalParameters ('[' ']')* (THROWS qualifiedNameList)? methodBody
    ;

genericInterfaceMethodDeclaration
    : typeParameters interfaceMethodDeclaration
    ;

annotationMethodOrConstantRest
    : annotationMethodRest
    | annotationConstantRest
    ;
</pre>

### 3.8 Level 8

<pre style="font-family: 'Monaco'">
/*
8
integerLiteral floatLiteral typeArgumentsOrDiamond arrayInitializer
formalParameterList localVariableDeclaration statement
localTypeDeclaration formalParameters qualifiedNameList methodBody
variableDeclarators constantDeclarator interfaceMethodModifier
annotationMethodRest annotationConstantRest
*/

integerLiteral
    : DECIMAL_LITERAL
    | HEX_LITERAL
    | OCT_LITERAL
    | BINARY_LITERAL
    ;

floatLiteral
    : FLOAT_LITERAL
    | HEX_FLOAT_LITERAL
    ;

typeArgumentsOrDiamond
    : '<' '>'
    | typeArguments
    ;

arrayInitializer
    : '{' (variableInitializer (',' variableInitializer)* (',')? )? '}'
    ;

formalParameterList
    : formalParameter (',' formalParameter)* (',' lastFormalParameter)?
    | lastFormalParameter
    ;

localVariableDeclaration
    : variableModifier* typeType variableDeclarators
    ;

statement
    : blockLabel=block
    | ASSERT expression (':' expression)? ';'
    | IF parExpression statement (ELSE statement)?
    | WHILE parExpression statement
    | DO statement WHILE parExpression ';'
    | FOR '(' forControl ')' statement
    | TRY block (catchClause+ finallyBlock? | finallyBlock)
    | TRY resourceSpecification block catchClause* finallyBlock?
    | SWITCH parExpression '{' switchBlockStatementGroup* switchLabel* '}'
    | SYNCHRONIZED parExpression block
    | RETURN expression? ';'
    | THROW expression ';'
    | BREAK IDENTIFIER? ';'
    | CONTINUE IDENTIFIER? ';'
    | SEMI
    | statementExpression=expression ';'
    | identifierLabel=IDENTIFIER ':' statement
    ;

localTypeDeclaration
    : classOrInterfaceModifier*
      (classDeclaration | interfaceDeclaration)
    | ';'
    ;

formalParameters
    : '(' formalParameterList? ')'
    ;

qualifiedNameList
    : qualifiedName (',' qualifiedName)*
    ;

methodBody
    : block
    | ';'
    ;

variableDeclarators
    : variableDeclarator (',' variableDeclarator)*
    ;

constantDeclarator
    : IDENTIFIER ('[' ']')* '=' variableInitializer
    ;

// Java8
interfaceMethodModifier
    : annotation
    | PUBLIC
    | ABSTRACT
    | DEFAULT
    | STATIC
    | STRICTFP
    ;

annotationMethodRest
    : IDENTIFIER '(' ')' defaultValue?
    ;

annotationConstantRest
    : variableDeclarators
    ;
</pre>

### 3.9 Level 9

<pre style="font-family: 'Monaco'">
/*
9
formalParameter lastFormalParameter variableModifier parExpression
catchClause forControl finallyBlock resourceSpecification
switchBlockStatementGroup switchLabel variableDeclarator
variableInitializer defaultValue
*/

formalParameter
    : variableModifier* typeType variableDeclaratorId
    ;

lastFormalParameter
    : variableModifier* typeType '...' variableDeclaratorId
    ;

variableModifier
    : annotation
    | FINAL
    ;

parExpression
    : '(' expression ')'
    ;

catchClause
    : CATCH '(' variableModifier* catchType IDENTIFIER ')' block
    ;

forControl
    : enhancedForControl
    | forInit? ';' expression? ';' forUpdate=expressionList?
    ;

finallyBlock
    : FINALLY block
    ;

resourceSpecification
    : '(' resources ';'? ')'
    ;

switchBlockStatementGroup
    : switchLabel+ blockStatement+
    ;

switchLabel
    : CASE (constantExpression=expression | enumConstantName=IDENTIFIER) ':'
    | DEFAULT ':'
    ;

variableDeclarator
    : variableDeclaratorId ('=' variableInitializer)?
    ;

variableInitializer
    : arrayInitializer
    | expression
    ;

defaultValue
    : DEFAULT elementValue
    ;
</pre>

### 3.10 Level 10

<pre style="font-family: 'Monaco'">
/*
10
variableDeclaratorId catchType enhancedForControl forInit resources
*/

variableDeclaratorId
    : IDENTIFIER ('[' ']')*
    ;

catchType
    : qualifiedName ('|' qualifiedName)*
    ;

enhancedForControl
    : variableModifier* typeType variableDeclaratorId ':' expression
    ;

forInit
    : localVariableDeclaration
    | expressionList
    ;

resources
    : resource (';' resource)*
    ;
</pre>

### 3.11 Level 11

<pre style="font-family: 'Monaco'">
/*
11
resource
*/

resource
    : variableModifier* classOrInterfaceType variableDeclaratorId '=' expression
    ;
</pre>
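
To make the deepest part of the grammar concrete, here is a small Java 8 sketch of a try-with-resources statement, with comments mapping each piece to the `resourceSpecification`, `resources` and `resource` rules from levels 9 to 11. The class name `ResourceExample` and the method `firstLine` are hypothetical names chosen for illustration; they do not come from the grammar itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ResourceExample {

    // Returns the first line of the given text, or null if the text is empty.
    static String firstLine(String text) throws IOException {
        // statement: TRY resourceSpecification block catchClause* finallyBlock?
        // resourceSpecification: '(' resources ';'? ')'
        try (
            // resource: variableModifier* classOrInterfaceType variableDeclaratorId '=' expression
            // Here the variableModifier is FINAL.
            final StringReader source = new StringReader(text);
            // resources: resource (';' resource)*  -- a second resource after ';'
            BufferedReader reader = new BufferedReader(source);  // trailing ';' is optional
        ) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstLine("hello\nworld")); // prints "hello"
    }
}
```

Note that the `resource` rule only accepts `classOrInterfaceType`, so a primitive type can never be declared as a resource, which matches the requirement that a resource be closeable.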