Semantic parsing of Air Traffic Control (ATC) commands using CCG.
You may want to start with some examples. Here, the CCG (Combinatory Categorial Grammar) parser from NLTK is used to parse hundreds of ATC commands generated by ChatGPT.
Column A. A set of manually selected short phrases that represent important phraseology in the ATC domain.
Column C. Each phrase from column A was given to ChatGPT with a request to generate realistic ATC communication examples related to that phrase. Please note that ChatGPT punctuates its output, and this punctuation splits each communication into semantically distinct segments, but the punctuation was not used in the parsing process.
Column D. The CCG parser was used to parse all communications generated by ChatGPT. The results are presented here as JSON strings. You can use the free Online JSON Viewer to inspect any of them: copy a JSON string from this column, paste it into the Text window of the viewer, and see the result in the Viewer window.
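If you prefer to stay in Python rather than use an online viewer, the standard json module can re-indent any of these strings. The snippet below is only a minimal sketch: the input string is a shortened illustration in the style of the column D results, and the helper name pretty is an assumption, not part of ATC_parsing.

```python
import json

# Shortened, illustrative JSON string in the style of the column D results.
s_json = '{"CALLSIGN_1":"DAL456", "FLEVEL_1":{"COMPARISONOR_1":"or above","FLEVEL_2":"FL330"}}'

def pretty(s):
    """Return the JSON string re-indented so the nesting is easy to read."""
    return json.dumps(json.loads(s), indent=2)

print(pretty(s_json))
```

The same call works on a full result string copied from column D.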
To install the package, run
pip install .
in the folder that contains the setup.py file.
import ATC_parsing as atc
# generate lexicon and parser
dData = {}
atc.make_lexicon(dData)
# ATC commands to parse
a_commands = [
'Cirrus 82AB, Runway 27 is shortened. Takeoff from intersection Bravo, available TORA is 4,500 feet. Runway 27, cleared for takeoff.',
'DAL456, cross 40 miles east of PGS at or above FL330',
]
# Parsing
for command in a_commands:
    print('\ncommand:\t'+command)
    number_of_steps = 3
    # native presentation
    logicalForm = atc.parsing(command, number_of_steps, dData)
    print('\nLogical Form:\t'+logicalForm)
    # JSON presentation
    sJSON = atc.logicalForm2JSON(logicalForm)
    print('\nJSON:\t'+sJSON)
And these are the results:
command: Cirrus 82AB, Runway 27 is shortened. Takeoff from intersection Bravo, available TORA is 4,500 feet. Runway 27, cleared for takeoff.
Logical Form: _CALLSIGN_(_AIRCRAFT_(*Cirrus*),_CALLSIGN_(*82AB*)); _RUNWAY_(_RUNWAY_(_RUNWAY_(_RUNWAY_(*Runway*),_INTNUMBER_(*27*))),_IS_(*is*),_STATUS_(*shortened*)); _DEPARTURE_(_DEPARTURE_(*Takeoff*),_FROM_(*from*),_TAXIWAY_(_TAXIWAY_(*intersection*),_PHONETICALPHABET_(*Bravo*))); _DECLAREDDISTANCE_(_DECLAREDDISTANCE_(*available TORA*),_IS_(*is*),_DECLAREDDISTANCE_(*4500 feet*)); _RUNWAY_(_RUNWAY_(_RUNWAY_(_RUNWAY_(*Runway*),_INTNUMBER_(*27*))),_CLEARED_(_CLEARED_(_CLEARED_(*cleared*),_FOR_(*for*),_DEPARTURE_(*takeoff*))));
JSON: {"CALLSIGN_1":{"AIRCRAFT_1":"Cirrus","CALLSIGN_2":"82AB"}, "RUNWAY_1":{"RUNWAY_2":{"RUNWAY_3":"Runway","INTNUMBER_1":"27"},"IS_1":"is","STATUS_1":"shortened"}, "DEPARTURE_1":{"DEPARTURE_2":"Takeoff","FROM_1":"from","TAXIWAY_1":{"TAXIWAY_2":"intersection","PHONETICALPHABET_1":"Bravo"}}, "DECLAREDDISTANCE_1":{"DECLAREDDISTANCE_2":"available TORA","IS_2":"is","DECLAREDDISTANCE_3":"4500 feet"}, "RUNWAY_4":{"RUNWAY_5":{"RUNWAY_6":"Runway","INTNUMBER_2":"27"},"CLEARED_1":{"CLEARED_2":"cleared","FOR_1":"for","DEPARTURE_3":"takeoff"}}}
command: DAL456, cross 40 miles east of PGS at or above FL330
Logical Form: _CALLSIGN_(*DAL456*); _NAVIGATION_(_NAVIGATION_(_NAVIGATION_(_NAVIGATION_(_NAVIGATION_(_NAVIGATION_(*cross*),_MEASURE_(_MEASURE_(*40*),_MEASURE_(_MEASURE_(*miles*),_DIRECTIONMAGNETIC_(*east*))))),_OF_(*of*),_FIX_(*PGS*))),_AT_(*at*),_FLEVEL_(_FLEVEL_(_COMPARISONOR_(*or above*),_FLEVEL_(*FL330*))));
JSON: {"CALLSIGN_1":"DAL456", "NAVIGATION_1":{"NAVIGATION_2":{"NAVIGATION_3":{"NAVIGATION_4":{"NAVIGATION_5":{"NAVIGATION_6":"cross","MEASURE_1":{"MEASURE_2":"40","MEASURE_3":{"MEASURE_4":"miles","DIRECTIONMAGNETIC_1":"east"}}}},"OF_1":"of","FIX_1":"PGS"}},"AT_1":"at","FLEVEL_1":{"COMPARISONOR_1":"or above","FLEVEL_2":"FL330"}}}
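The JSON form is convenient for post-processing. As a rough sketch, the code below walks the nested result for the DAL456 command shown above and lists each leaf token with its innermost label. It assumes that the numeric suffixes in the keys (NAVIGATION_1, NAVIGATION_2, ...) only serve to keep keys unique and can be stripped for display; the helper name leaves is hypothetical and not part of ATC_parsing.

```python
import json
import re

# JSON result for the DAL456 command, copied from the output above.
s_json = (
    '{"CALLSIGN_1":"DAL456", "NAVIGATION_1":{"NAVIGATION_2":{"NAVIGATION_3":'
    '{"NAVIGATION_4":{"NAVIGATION_5":{"NAVIGATION_6":"cross","MEASURE_1":'
    '{"MEASURE_2":"40","MEASURE_3":{"MEASURE_4":"miles",'
    '"DIRECTIONMAGNETIC_1":"east"}}}},"OF_1":"of","FIX_1":"PGS"}},'
    '"AT_1":"at","FLEVEL_1":{"COMPARISONOR_1":"or above","FLEVEL_2":"FL330"}}}'
)

def leaves(node, path=()):
    """Yield (label_path, token) for every leaf, in the order of the JSON."""
    for key, value in node.items():
        # Assumption: the trailing _<number> is only a uniqueness suffix.
        label = re.sub(r"_\d+$", "", key)
        if isinstance(value, dict):
            yield from leaves(value, path + (label,))
        else:
            yield path + (label,), value

# Keep only the innermost label of each token.
tokens = [(path[-1], token) for path, token in leaves(json.loads(s_json))]
for label, token in tokens:
    print(f"{label:18}{token}")
```

Joining the tokens in order gives back the command text without its punctuation.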
Please note that parsing quality depends on the lexicon, which was developed specifically for Air Traffic Control commands.
The folder
...\ATC-parsing\ATC_parsing\data\
contains files that you may need to update.
To take new places, waypoints, fixes, or airline names into account, we should update ‘regex.txt’. We may call this process localization.
New ATC phraseology may also appear from time to time. In that case, ‘regex.txt’ should be updated as well.
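The format of ‘regex.txt’ is not described here, so the following does not reproduce it. It is only a sketch of the localization idea using Python's re module: a table of label-to-pattern entries is extended so that a new airline designator and a new fix are recognized. The table, the tag helper, the SWA designator, and the fix name ABCDE are all hypothetical illustrations.

```python
import re

# Hypothetical label -> pattern table; NOT the actual contents of regex.txt.
patterns = {
    "CALLSIGN": r"(?:DAL|UAL)\d{1,4}",
    "FLEVEL": r"FL\d{2,3}",
    "FIX": r"PGS",
}

def tag(token, table):
    """Return the first label whose pattern matches the whole token, else None."""
    for label, pattern in table.items():
        if re.fullmatch(pattern, token):
            return label
    return None

# Localization step: allow one more airline designator and one more fix.
localized = dict(patterns)
localized["CALLSIGN"] = r"(?:DAL|UAL|SWA)\d{1,4}"
localized["FIX"] = r"PGS|ABCDE"

for token in ["DAL456", "SWA123", "ABCDE", "FL330"]:
    print(token, tag(token, patterns), tag(token, localized))
```

Before the update, SWA123 and ABCDE get no label; afterwards they are tagged like the callsigns and fixes already known.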
If we see new command patterns that can’t be parsed correctly, we should update the other files; lexicon_complex.txt is the most important of them.
You may try to update these files yourself, especially if you are familiar with CCG parsing in NLTK. Either way, I would be glad to hear about any problems with the parsing of a specific command.
Please use GitHub issues or mail me directly: atc.parsing@gmail.com
Thank you.