srl.py 64 KB


  1. #! /usr/bin/python
  2. # -*- coding: utf-8 -*-
  3. """
  4. This module defines classes for semantic role labeling and predicate
  5. argument structure.
  6. Version 1.6 (20-Jun-2014)
  7. - getPropositions() and getPredicates() are added to SRL.
  8. - getForm() is added to DepArg and ConstArg.
  9. - getStr() and __str__ are added to Argument.
  10. - All [pv]flgCollapsAdjunct are fixed to [pv]flgCollapseAdjunct
  11. - Proposition.getStr() is updated.
  12. Version 1.5 (09-May-2014)
  13. - SRLComp.frameMatchPreds() is updated to optionally ignore frame match,
  14. i.e. two predicates will match only if they are aligned regardless
  15. of their frame match.
  16. - BiSRLComp is removed (not developed).
  17. Version 1.4 (11-Apr-2014)
  18. - Predicate type is added to SRL as an attribute to allow to load either
  19. verb or noun predicates or all predicate types. _removeNounPreds() and
  20. _removeVerbPreds() are added to SRL for this purpose.
  21. - getPredicate() is added to Proposition.
  22. - isVerbPred() is added to DepPred and ConstPred.
  23. - Constraint violation handling argument of SRL.loadFromCoNLL2009()
  24. is removed and now handled via annotation option argument.
  25. - The default values for predicate alignment resolutions are changed
  26. from 1 to 0 to perform no action.
  27. Version 1.3 (07-Apr-2014)
  28. - SRLComp is debugged and tested.
  29. - DepSRLProject.waProject() is changed to load the target dependency
  30. tree with the projected SRL.
  31. - Bug is fixed in Predicate.compareFrameWith() which used lemma for sense
  32. comparison level.
  33. - getArguments() is added to Proposition and SRL.
  34. Version 1.2 (20-Mar-2014)
  35. - In DepSRLProject, non and one-to-many predicates and arguments are
  36. collected and reported. The statistics are computed by counting them
  37. instead of keeping the counts. Many-to-one is still handled by counts.
  38. _initReport() is added for this purpose.
  39. - position, getNode() and getForm() are added to DepPred.
  40. - getForm() is added to ConstPred.
  41. Version 1.1 (05-Mar-2014)
  42. - SRLProject and DepSRLProject are added.
  43. - addProposition() is added to SRL.
  44. - The initialization value of propositions in SRL.__init__() is changed
  45. from none to [].
  46. - Constructor of Proposition is edited to have a default value for
  47. plArguments in the method's argument.
  48. - addArguments() is added to Proposition.
  49. - "ignore" is added to OnConstraintViolation in SRL.checkConstraints().
  50. Version 1.0 (19-Feb-2014)
  51. - getNode() is added to DepArg and ConstArg.
  52. - Constraints #5 and #6 are added to ConstArg._checkElevationConstraints().
  53. - ConstArg now optionally loads itself into the corresponding constituency
  54. node during creation.
  55. - Conversion to constituency involves loading the annotation to the
  56. corresponding constituency tree by default. This however can be overridden
  57. by setting the corresponding method argument. This early loading helps
  58. apply global constraints when elevating argument nodes in the tree
  59. during conversion from dependency. It should therefore stay enabled unless
  60. disabling is required or a conversion method other than elevation is used.
  61. - getPOSTag() is added to DepPred and DepArg.
  62. - position is added to DepArg.
  63. - getLabel() is added to Predicate.
  64. Version 0.9 (17-Feb-2014)
  65. - Proposition.compareStructWith() is renamed to isStructEqualTo() and
  66. predicate frame comparison is removed (i.e. proposition structure
  67. consists of only its argument roles).
  68. - SRLComp and BiSRLComp are added.
  69. - Predicate.compareFrameWith()
  70. - Argument.getLabel() is added.
  71. - getSynTag() is added to ConstPred, ConstArg.
  72. - pAnnotationOptions is added to SRL.loadFromCoNLL2009().
  73. - Dependency node is added to DepPred and DepArg. The constructors and
  74. calling methods such as loadFromCoNLL2009() are edited accordingly.
  75. Version 0.8 (11-Feb-2014)
  76. - span is moved to Arguments and DepArg and ConstArg were changed
  77. accordingly.
  78. - Argument.size() is added.
  79. - ConstArg.elevate() is edited to update span after elevation.
  80. - Bug in _checkElevationConstraints() is fixed to consider collapsed
  81. terminal cases as pre-terminal nodes.
  82. - A sanity check is added to SRL.depToConst() to take care of dependency
  83. SRL and constituency tree mismatch (e.g. non-parsed sentence for which
  84. there however is a dependency SRL).
  85. Version 0.7 (07-Feb-2014)
  86. - Proposition.argCount() is changed from property to method and renamed
  87. to getArgCount() to take adjunct collapse option as argument.
  88. - Bug is fixed in Argument.getAdjunctType() and Argument.isAdjunct().
  89. Version 0.6 (06-Feb-2014)
  90. - propCount(), getAvgArgCount() are added to SRL.
  91. - argCount(), compareStructWith() are added to Proposition
  92. - compareFrameWith() is added to Predicate
  93. - isAdjunct(), compareRoleWith() are added to Argument.
  94. - __str__() and getStr() are added to SRL and Proposition
  95. Version 0.5 (29-Jan-2014)
  96. - Constraints #3, #4 are added in ConstArg._checkElevationConstraints().
  97. - getNode() is added to ConstPred.
  98. - Proposition is added as an attribute to Argument.
  99. - Argument.getPredicate()
  100. Version 0.4 (27-Jan-2014)
  101. - DepArg and ConstArg are derived from Argument and dependency- and
  102. constituency-based arguments are separated.
  103. - DepPred and ConstPred are derived from Predicate and dependency- and
  104. constituency-based predicates are separated.
  105. - language is added to SRL and Argument.
  106. - getAdjunctType() is added to Argument.
  107. - Language-specific constraints are added to ConstArg.
  108. Version 0.3 (20-Jan-2014)
  109. - SRL.type is added to specify the type of SRL formalism: dependency or
  110. constituency
  111. - isConstituency(), isDependency(), convertToConst(), depToConst()
  112. are added to SRL.
  113. Version 0.2 (27-Dec-2013)
  114. - SRL.loadFromCoNLL2009() is edited to allow for constraints check.
  115. - SRL.checkConstraints() is added.
  116. Version 0.1 (23-Dec-2013)
  117. - SRL, Proposition, Predicate, Argument are defined.
  118. """
  119. import sys
  120. import dloader, parse
  121. from utils import util
  122. class SRL:
  123. '''
  124. Class for semantic role labeling of a sentence.
  125. It is designed to label a sentence or load the labels from various
  126. formats e.g. CoNLL 2009.
  127. '''
  128. def __init__(self, pLanguage, pType = ''):
  129. '''
  130. Creates an SRL object
  131. '''
  132. # type of SRL formalism: (d)ependency or (c)onstituency
  133. self.type = pType
  134. self.language = pLanguage
  135. # list of propositions in the sentence
  136. self.propositions = []
  137. def isConstituency(self):
  138. '''
  139. Returns true if the SRL formalism is constituency
  140. '''
  141. return self.type.lower().startswith('c')
  142. def isDependency(self):
  143. '''
  144. Returns true if the SRL formalism is dependency
  145. '''
  146. return self.type.lower().startswith('d')
  147. def addProposition(self, pProposition):
  148. '''
  149. Adds a proposition
  150. '''
  151. self.propositions.append(pProposition)
  152. def loadFromCoNLL2009(self, pCoNLL2009Sent, pdAnnotationOptions = {}):
  153. '''
  154. Loads semantic role labeling from a CoNLL 2009 sentence.
  155. pdAnnotationOptions include the following options:
  156. - on-srl-constraint-violation can be:
  157. - exception: raise exception in case of any linguistic constraint
  158. violation
  159. - fix : fix linguistic constraint violations
  160. - ignore : ignore the problem but report
  161. - gold-or-predicted: can be gold or pred specifying which annotation
  162. should be loaded
  163. - predicate-type: can be (v)erb or (n)oun or (a)ll
  164. '''
  165. # SRL CoNLL 2009 format is in dependency formalism
  166. self.type = 'dependency'
  167. vDepTree = pCoNLL2009Sent.getDepTree(pdAnnotationOptions = pdAnnotationOptions)
  168. # creating proposition placeholders
  169. self.propositions = [Proposition(None, None) for i in range(pCoNLL2009Sent.predicateCount)]
  170. vCurrPredNo = 0
  171. for i, vToken in enumerate(pCoNLL2009Sent.conllTokens, start = 1):
  172. # 1. creating arguments and adding to corresponding propositions
  173. for vArgLabel, vPredNo in vToken.args:
  174. self.propositions[vPredNo - 1].arguments.append(DepArg(vArgLabel, i, vDepTree.getNode(i), self.propositions[vPredNo - 1], self.language))
  175. # 2. creating predicates and adding to corresponding proposition
  176. if vToken.fillPred:
  177. self.propositions[vCurrPredNo].predicate = DepPred(vToken.pred, i, vDepTree.getNode(i))
  178. vCurrPredNo += 1
  179. # applying predicate type
  180. if "predicate-type" in pdAnnotationOptions:
  181. if pdAnnotationOptions["predicate-type"].lower().startswith('v'):
  182. self._removeNounPreds()
  183. elif pdAnnotationOptions["predicate-type"].lower().startswith('n'):
  184. self._removeVerbPreds()
  185. elif not pdAnnotationOptions["predicate-type"].lower().startswith('a'):
  186. raise Exception("Unknown predicate type in annotation options: %s" % pdAnnotationOptions["predicate-type"])
  187. # checking for linguistic constraints
  188. if "on-srl-constraint-violation" in pdAnnotationOptions:
  189. vOnConstraintViolation = pdAnnotationOptions["on-srl-constraint-violation"]
  190. else:
  191. vOnConstraintViolation = "exception"
  192. return self.checkConstraints(vOnConstraintViolation)
  193. def _removeNounPreds(self):
  194. '''
  195. Removes propositions of noun predicates, i.e. keeps only propositions
  196. of verb predicates
  197. '''
  198. self.propositions = [p for p in self.propositions if p.getPredicate().isVerbPred()]
  199. def _removeVerbPreds(self):
  200. '''
  201. Removes propositions of verb predicates, i.e. keeps only propositions
  202. of noun predicates
  203. '''
  204. self.propositions = [p for p in self.propositions if not p.getPredicate().isVerbPred()]
  205. def convertToConst(self, pConstTree, pConversionMethod, pflgLoadIntoTree = True):
  206. '''
  207. Creates a new SRL object in constituency formalism by converting
  208. from the current formalism given the constituency tree
  209. pflgLoadIntoTree determines whether the annotation must be loaded
  210. into the corresponding constituency tree after conversion or not.
  211. '''
  212. if self.isDependency():
  213. return self.depToConst(pConstTree, pConversionMethod, pflgLoadIntoTree)
  214. else:
  215. raise Exception("Conversion from %s is not implemented yet" % self.type)
  216. def depToConst(self, pConstTree, pConversionMethod, pflgLoadIntoTree):
  217. '''
  218. Creates a new SRL object in constituency formalism by converting
  219. from current dependency formalism given the constituency tree
  220. pflgLoadIntoTree determines whether the annotation must be loaded
  221. into the corresponding constituency tree after conversion or not.
  222. '''
  223. vConstSRL = SRL(pLanguage = self.language)
  224. vConstSRL.type = "constituency"
  225. vConstSRL.propositions = []
  226. # sanity check; useful for e.g. non-parsed trees having dep SRL
  227. vSentLength = pConstTree.getSentLength()
  228. if self.propCount > vSentLength or self.propCount == vSentLength == 1:
  229. sys.stderr.write('Number of propositions (%s) is bigger than sentence length (%s) according to constituency tree! Skipped.\n' % (self.propCount, vSentLength))
  230. return vConstSRL
  231. # NOTE: we don't use deepcopy()
  232. for vProp in self.propositions:
  233. vConstSRL.propositions.append(vProp.depToConst(pConstTree, pConversionMethod, pflgLoadIntoTree))
  234. return vConstSRL
  235. def checkConstraints(self, pOnConstraintViolation):
  236. '''
  237. Checks for a set of linguistic constraints
  238. '''
  239. vlErrStr = [] # violation errors caught
  240. vlArgToRemove = [] # violating arguments to remove
  241. # 1. predicate-argument overlap
  242. for vProp in self.propositions:
  243. for vArg in vProp.arguments:
  244. if vArg.span[0] <= vProp.predicate.position <= vArg.span[1]:
  245. vlErrStr.append("Predicate-argument overlap: %s and (%s, %s)" % (vProp.predicate.position, vArg.span[0], vArg.span[1]))
  246. if pOnConstraintViolation.lower() == "exception":
  247. raise Exception(vlErrStr[-1])
  248. elif pOnConstraintViolation.lower() == "ignore":
  249. vlErrStr[-1] = "(Ignored) " + vlErrStr[-1]
  250. continue
  251. elif pOnConstraintViolation.lower() == "fix":
  252. ## We remove the argument that overlaps with its own
  253. ## predicate.
  254. vlArgToRemove.append(vArg)
  255. vlErrStr[-1] = "(Fixed) " + vlErrStr[-1]
  256. # removing violating arguments
  257. for vArg in vlArgToRemove:
  258. vProp.arguments.remove(vArg)
  259. vlArgToRemove = []
  260. # TODO: add other constraints
  261. return vlErrStr
  262. @property
  263. def propCount(self):
  264. '''
  265. Returns the number of propositions
  266. '''
  267. return len(self.propositions)
  268. @property
  269. def globalArgCount(self):
  270. '''
  271. Returns the number of arguments across all predicates
  272. '''
  273. vArgCount = 0
  274. for vProp in self.propositions:
  275. vArgCount += vProp.getArgCount()
  276. return vArgCount
  277. def getAvgArgCount(self):
  278. '''
  279. Returns the average number of arguments per proposition
  280. '''
  281. if self.propCount == 0:
  282. return 0
  283. else:
  284. return sum([p.getArgCount() for p in self.propositions]) * 1.0 / self.propCount
  285. def __str__(self):
  286. '''
  287. String representation of the class instance
  288. '''
  289. return self.getStr()
  290. def getStr(self):
  291. '''
  292. Returns string representation of the class instance
  293. '''
  294. return "\n".join([p.getStr() for p in self.propositions])
  295. def getArguments(self):
  296. '''
  297. Returns all arguments of the SRL
  298. '''
  299. vlArgs = []
  300. for vProp in self.propositions:
  301. vlArgs += vProp.getArguments()
  302. return vlArgs
  303. def getPropositions(self):
  304. '''
  305. Returns all the propositions
  306. '''
  307. return self.propositions
  308. def getPredicates(self):
  309. '''
  310. Returns predicates of all propositions
  311. '''
  312. return [p.getPredicate() for p in self.getPropositions()]
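# Illustrative sketch (not part of srl.py): the predicate-argument overlap
# check performed by SRL.checkConstraints() and its three violation policies,
# reduced to plain integers and (start, end) tuples. The names find_overlaps
# and apply_policy are hypothetical, and ValueError stands in for the generic
# Exception raised by the module.

```python
def find_overlaps(predicate_position, argument_spans):
    """Return the (start, end) spans that contain the predicate position."""
    return [span for span in argument_spans
            if span[0] <= predicate_position <= span[1]]


def apply_policy(predicate_position, argument_spans, policy="exception"):
    """Apply one of the policies 'exception', 'ignore' or 'fix'.

    Returns (kept_spans, violations). Under 'fix' the overlapping spans are
    dropped; under 'ignore' they are only reported.
    """
    violations = find_overlaps(predicate_position, argument_spans)
    policy = policy.lower()
    if violations and policy == "exception":
        raise ValueError("Predicate-argument overlap: %s and (%s, %s)"
                         % (predicate_position,
                            violations[0][0], violations[0][1]))
    if policy == "fix":
        kept = [s for s in argument_spans if s not in violations]
    else:
        kept = list(argument_spans)
    return kept, violations


# Toy data: the predicate sits at token 4, inside the span (3, 5).
spans = [(1, 2), (3, 5), (6, 6)]
fixed, fixed_violations = apply_policy(4, spans, policy="fix")
ignored, ignored_violations = apply_policy(4, spans, policy="ignore")
```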
  313. class Proposition:
  314. '''
  315. Class for proposition
  316. A proposition consists of a predicate and its arguments.
  317. '''
  318. def __init__(self, pPredicate, plArguments = None):
  319. '''
  320. Creates a proposition object
  321. '''
  322. self.predicate = pPredicate
  323. if plArguments is None:
  324. self.arguments = []
  325. else:
  326. self.arguments = plArguments
  327. def getPredicate(self):
  328. '''
  329. Returns the predicate
  330. '''
  331. return self.predicate
  332. def getArguments(self):
  333. '''
  334. Returns arguments
  335. '''
  336. return self.arguments
  337. def addArguments(self, plArguments):
  338. '''
  339. Adds given arguments to proposition
  340. '''
  341. self.arguments += plArguments
  342. def depToConst(self, pConstTree, pConversionMethod, pflgLoadIntoTree):
  343. '''
  344. Creates a new proposition object by converting the predicate and
  345. arguments of this proposition to constituency formalism given
  346. the constituency tree.
  347. In dependency formalism, an argument role is assigned to a dependency
  348. node which corresponds to a token position. In constituency formalism,
  349. it is assigned to a phrase which corresponds to a constituency node.
  350. Since a dependency-based argument cannot deterministically be mapped
  351. to a constituency node, simplifying assumptions are made:
  352. - preterminal: assign argument role to the preterminal at the
  353. argument position
  354. - elevate: assign argument role to the highest possible parent
  355. node covering the argument position. The possibility
  356. is determined by checking against a set of constraints.
  357. pflgLoadIntoTree determines whether the annotation must be loaded
  358. into the corresponding constituency tree after conversion or not.
  359. '''
  360. # creating and converting predicate
  361. vNewPred = self.predicate.depToConst(pConstTree, pflgLoadIntoTree)
  362. # creating proposition
  363. vNewProp = Proposition(pPredicate = vNewPred, plArguments = [])
  364. # creating, converting and adding arguments
  365. for vArg in self.arguments:
  366. vNewProp.arguments.append(vArg.depToConst(pConstTree, vNewProp, pflgLoadIntoTree))
  367. ## applying conversion method
  368. ## NOTE: For elevation to be fully applied, the SRL must be
  369. ## loaded into the tree while converting (pflgLoadIntoTree).
  370. ## Otherwise, some elevation constraints will not work (e.g. #5)
  371. if pConversionMethod.lower() == "elevate":
  372. for vArg in vNewProp.arguments:
  373. vArg.elevate()
  374. return vNewProp
  375. def getArgCount(self, pflgCollapseAdjunct = False):
  376. '''
  377. Returns the number of arguments
  378. If pflgCollapseAdjunct is set to true, the adjunct roles are collapsed
  379. into a single role.
  380. '''
  381. if pflgCollapseAdjunct and len([a for a in self.arguments if a.isAdjunct()]) > 0:
  382. return len([a for a in self.arguments if not a.isAdjunct()]) + 1
  383. else:
  384. return len(self.arguments)
  385. def __str__(self):
  386. '''
  387. String representation of the class instance
  388. '''
  389. return self.getStr()
  390. def getStr(self):
  391. '''
  392. Returns string representation of the class instance
  393. '''
  394. return "%s (%s): %s" % (self.predicate.getLabel(pLevel = "frameset"), self.predicate.getForm(), ' '.join(["%s (%s)" % (a.label, a.getForm()) for a in self.arguments]))
  395. def isStructEqualTo(self, pAnotherProp, pflgCollapseAdjunct):
  396. '''
  397. VERIFY AND MOVE TO SRLCOMP
  398. Returns true if the structure of this proposition is the same as that of
  399. another proposition.
  400. Structure of a proposition consists of its arguments role labels.
  401. (Previously consisted of its predicate's label as well.)
  402. pflgCollapseAdjunct specifies the comparison method at argument
  403. level. See Argument class methods for details.
  404. '''
  405. if self.getArgCount() != pAnotherProp.getArgCount():
  406. return False
  407. else:
  408. vlArgs2 = pAnotherProp.arguments[ : ]
  409. for vArg1 in self.arguments:
  410. vArg2Itr = 0
  411. vMatchFound = False
  412. for vArg2 in vlArgs2:
  413. if vArg1.compareRoleWith(vArg2, pflgCollapseAdjunct) == True:
  414. del vlArgs2[vArg2Itr]
  415. vMatchFound = True
  416. break
  417. else:
  418. vArg2Itr += 1
  419. continue
  420. if vMatchFound == False:
  421. return False
  422. return True
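# Illustrative sketch (not part of srl.py): how Proposition.getArgCount() and
# Proposition.isStructEqualTo() behave, restated over plain lists of role
# labels. The function names are hypothetical. As in the original, the length
# pre-check uses the uncollapsed argument count, and each label of the second
# proposition can be matched at most once.

```python
def is_adjunct(label):
    """Adjuncts are assumed to be labeled *AM-Type, e.g. AM-TMP, R-AM-TMP."""
    return "AM-" in label


def arg_count(labels, collapse_adjunct=False):
    """Count arguments; all adjuncts count as one role when collapsed."""
    adjuncts = [l for l in labels if is_adjunct(l)]
    if collapse_adjunct and adjuncts:
        return len(labels) - len(adjuncts) + 1
    return len(labels)


def same_role(a, b, collapse_adjunct=False):
    """Two roles match if equal, or if both are adjuncts when collapsing."""
    return a == b or (collapse_adjunct and is_adjunct(a) and is_adjunct(b))


def is_struct_equal(labels1, labels2, collapse_adjunct=False):
    """Greedy one-to-one matching of role labels between two propositions."""
    if len(labels1) != len(labels2):
        return False
    remaining = list(labels2)  # copy, so the caller's list is untouched
    for a in labels1:
        for i, b in enumerate(remaining):
            if same_role(a, b, collapse_adjunct):
                del remaining[i]
                break
        else:
            return False  # no unmatched label left for a
    return True
```

# For example, ["A0", "AM-TMP"] and ["AM-LOC", "A0"] differ under exact
# matching but are structurally equal once adjuncts are collapsed.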
  423. class Predicate:
  424. '''
  425. Class for predicate
  426. '''
  427. def __init__(self, pPredLabel):
  428. '''
  429. Creates a predicate object
  430. '''
  431. vSensePos = pPredLabel.rfind('.')
  432. self.frameset = pPredLabel
  433. self.lemma = pPredLabel[ : vSensePos]
  434. self.sense = pPredLabel[vSensePos + 1 : ]
  435. def compareFrameWith(self, pAnotherPred, pLevel = "frameset"):
  436. '''
  437. MOVE TO SRLComp
  438. Returns true if this predicate has the same frameset or lemma as
  439. a given predicate
  440. pLevel specifies whether the comparison should be done at frameset,
  441. lemma, or sense level. Sense level may not make sense but it is
  442. for compatibility with CoNLL2009 scorer.
  443. '''
  444. if pLevel == "frameset":
  445. if self.frameset == pAnotherPred.frameset:
  446. return True
  447. else:
  448. return False
  449. elif pLevel == "lemma":
  450. if self.lemma == pAnotherPred.lemma:
  451. return True
  452. else:
  453. return False
  454. elif pLevel == "sense":
  455. if self.sense == pAnotherPred.sense:
  456. return True
  457. else:
  458. return False
  459. def getLabel(self, pLevel = "frameset"):
  460. '''
  461. Returns predicate label at given level
  462. '''
  463. vLevel = pLevel.lower()
  464. if vLevel == "frameset":
  465. return self.frameset
  466. elif vLevel == "lemma":
  467. return self.lemma
  468. elif vLevel == "sense":
  469. return self.sense
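# Illustrative sketch (not part of srl.py): how a predicate label such as
# "go.01" is split into lemma and sense at the last '.', and how the three
# comparison levels relate. split_frameset and frames_match are hypothetical
# names. Unlike Predicate.compareFrameWith(), which falls through for an
# unknown level, this sketch raises ValueError.

```python
def split_frameset(label):
    """Split 'lemma.sense' at the last dot, mirroring Predicate.__init__().

    Caveat: rfind() returns -1 when there is no dot, so a label without a
    sense suffix loses its last character in the lemma part.
    """
    sense_pos = label.rfind('.')
    return label[:sense_pos], label[sense_pos + 1:]


def frames_match(label1, label2, level="frameset"):
    """Compare two predicate labels at frameset, lemma or sense level."""
    lemma1, sense1 = split_frameset(label1)
    lemma2, sense2 = split_frameset(label2)
    if level == "frameset":
        return label1 == label2
    if level == "lemma":
        return lemma1 == lemma2
    if level == "sense":
        return sense1 == sense2
    raise ValueError("Unknown comparison level: %s" % level)
```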
  470. class DepPred(Predicate):
  471. '''
  472. Class for dependency-based predicate
  473. '''
  474. def __init__(self, pPredLabel, pTokenPos, pDepNode):
  475. '''
  476. Creates a dependency-based predicate object
  477. In dependency-based SRL, the predicate is identified by the
  478. position of the token to which it is assigned.
  479. '''
  480. Predicate.__init__(self, pPredLabel)
  481. self.position = pTokenPos
  482. self.node = pDepNode
  483. def depToConst(self, pConstTree, pflgLoadIntoTree):
  484. '''
  485. Creates a new constituency predicate object by converting from
  486. current dependency formalism given the constituency tree
  487. It assigns the predicate to the preterminal node that spans predicate
  488. token.
  489. pflgLoadIntoTree determines whether the annotation must be loaded
  490. into the corresponding constituency tree node after conversion or
  491. not.
  492. '''
  493. return ConstPred(self.frameset, pConstTree.getPreTerminalNodeAt(self.position), pflgLoadIntoTree)
  494. def getDepRel(self):
  495. '''
  496. Returns dependency relation of predicate node
  497. '''
  498. return self.node.getDepRel()
  499. @property
  500. def position(self):
  501. '''
  502. Returns the token position of the predicate in the sentence
  503. It assumes that the predicate is assigned to a preterminal that
  504. spans only a token.
  505. '''
  506. return self.node.getTokenSpan()[0]
  507. def getNode(self):
  508. '''
  509. Returns the dependency node of the predicate
  510. '''
  511. return self.node
  512. def getPOSTag(self):
  513. '''
  514. Returns the POS tag of the predicate
  515. '''
  516. return self.node.getPOSTag()
  517. def getForm(self):
  518. '''
  519. Returns the surface form of the predicate
  520. '''
  521. return self.node.getForm()
  522. def isVerbPred(self):
  523. '''
  524. Returns true if the predicate is a verb
  525. A predicate is recognized as a verb predicate if its POS tag starts
  526. with 'v'.
  527. '''
  528. return self.getPOSTag().lower().startswith('v')
  529. class ConstPred(Predicate):
  530. '''
  531. Class for constituency-based predicate
  532. '''
  533. def __init__(self, pPredLabel, pConstNode, pflgLoadIntoTree):
  534. '''
  535. Creates a constituency-based predicate object
  536. In constituency-based SRL, the predicate is identified by the
  537. constituency node to which it is assigned.
  538. '''
  539. Predicate.__init__(self, pPredLabel)
  540. self.setNode(pConstNode, pflgLoadIntoTree)
  541. def setNode(self, pNode, pflgLoadIntoTree):
  542. '''
  543. Sets the constituency node of the predicate
  544. pflgLoadIntoTree determines whether the annotation must be loaded
  545. into the corresponding constituency tree node or not.
  546. NOTE: If the node is not empty, it will be overridden. In that case
  547. the predicate-role annotation in the current node will remain
  548. intact and cause inconsistency. Therefore, for such cases, use
  549. changeNode() instead.
  550. '''
  551. self.node = pNode
  552. if pflgLoadIntoTree:
  553. self.node.addPredRole((None, self.frameset))
  554. @property
  555. def position(self):
  556. '''
  557. Returns the token position of the predicate in the sentence
  558. It assumes that the predicate is assigned to a preterminal that
  559. spans only a token.
  560. '''
  561. return self.node.getTokenSpan()[0]
  562. def getNode(self):
  563. '''
  564. Returns the constituency node of the predicate
  565. '''
  566. return self.node
  567. def getSynTag(self):
  568. '''
  569. Returns syntactic tag of predicate node
  570. '''
  571. return self.node.getSynTag()
  572. def getForm(self):
  573. '''
  574. Returns the surface form of the predicate
  575. '''
  576. return self.node.surface
  577. def isVerbPred(self):
  578. '''
  579. Returns true if the predicate is a verb
  580. A predicate is recognized as a verb predicate if the syntactic tag of
  581. its node starts with 'v'.
  582. '''
  583. return self.getSynTag().lower().startswith('v')
  584. class Argument:
  585. '''
  586. Class for argument
  587. '''
  588. def __init__(self, pArgLabel, pTokenSpan, pProposition, pLanguage):
  589. '''
  590. Creates an argument object
  591. '''
  592. self.label = pArgLabel
  593. self.span = pTokenSpan
  594. self.language = pLanguage
  595. self.proposition = pProposition
  596. def getAdjunctType(self):
  597. '''
  598. Returns the type of adjunct
  599. It assumes that adjuncts are labeled in *AM-Type format e.g. AM-TMP
  600. or R-AM-TMP.
  601. '''
  602. if self.isAdjunct():
  603. vPos = self.label.find('AM-')
  604. if vPos == -1:
  605. raise Exception("%s is not adjunct!" % self.label)
  606. else:
  607. return self.label[vPos + 3 : ]
  608. else:
  609. raise Exception("%s is not adjunct!" % self.label)
  610. def isAdjunct(self):
  611. '''
  612. Returns true if the argument is an adjunct
  613. It assumes that adjuncts are labeled in *AM-Type format.
  614. '''
  615. return self.label.find('AM-') != -1
  616. def getPredicate(self):
  617. '''
  618. Returns the predicate of the argument
  619. '''
  620. return self.proposition.predicate
  621. def compareRoleWith(self, pAnotherArg, pflgCollapseAdjunct = False):
  622. '''
  623. MOVE TO SRLComp
  624. Returns true if this argument has the same role label as a given
  625. argument
  626. If pflgCollapseAdjunct is set to true, the adjunct argument labels
  627. will be collapsed to a single label.
  628. '''
  629. if pflgCollapseAdjunct:
  630. if (self.label == pAnotherArg.label) or (self.isAdjunct() and pAnotherArg.isAdjunct()):
  631. return True
  632. else:
  633. return False
  634. else:
  635. if self.label == pAnotherArg.label:
  636. return True
  637. else:
  638. return False
  639. @property
  640. def size(self):
  641. '''
  642. Returns the size of token span of the argument in the sentence
  643. '''
  644. return self.span[1] - self.span[0] + 1
  645. def getLabel(self, pflgCollapseAdjunct = False):
  646. '''
  647. Returns argument label
  648. '''
  649. if pflgCollapseAdjunct and self.isAdjunct():
  650. # R-AM and C-AM taken into account
  651. vPos = self.label.rfind('-')
  652. return self.label[ : vPos]
  653. else:
  654. return self.label
  655. def __str__(self):
  656. '''
  657. String representation of the class instance
  658. '''
  659. return self.getStr()
  660. def getStr(self):
  661. '''
  662. Returns string representation of the class instance
  663. '''
  664. return "%s (%s)" % (self.getLabel(), self.getForm())
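# Illustrative sketch (not part of srl.py): the label conventions that
# Argument.isAdjunct(), getAdjunctType(), getLabel() and size rely on, shown
# as free functions over strings and (start, end) tuples. The function names
# are hypothetical, and ValueError stands in for the generic Exception.

```python
def is_adjunct(label):
    """Adjuncts are assumed to be labeled *AM-Type, e.g. AM-TMP, R-AM-TMP."""
    return label.find('AM-') != -1


def adjunct_type(label):
    """Return the part after 'AM-', e.g. 'TMP' for 'R-AM-TMP'."""
    pos = label.find('AM-')
    if pos == -1:
        raise ValueError("%s is not adjunct!" % label)
    return label[pos + 3:]


def collapsed_label(label):
    """Strip the adjunct type but keep any R- or C- prefix."""
    if is_adjunct(label):
        return label[:label.rfind('-')]
    return label


def span_size(span):
    """Number of tokens covered by an inclusive (start, end) span."""
    return span[1] - span[0] + 1
```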
  665. class DepArg(Argument):
  666. '''
  667. Class for dependency-based argument
  668. '''
  669. def __init__(self, pArgLabel, pTokenPos, pDepNode, pProposition, pLanguage):
  670. '''
  671. Creates a dependency-based argument
  672. In dependency-based arguments, the argument is identified by the
  673. position of the token to which it is assigned.
  674. '''
  675. Argument.__init__(self, pArgLabel, (pTokenPos, pTokenPos), pProposition, pLanguage)
  676. self.node = pDepNode
  677. def depToConst(self, pConstTree, pProposition, pflgLoadIntoTree):
  678. '''
  679. Creates a new constituency argument object by converting from
  680. current dependency formalism given the constituency tree.
  681. pflgLoadIntoTree determines whether the annotation must be loaded
  682. into the corresponding constituency tree node after conversion or
  683. not.
  684. '''
  685. return ConstArg(self.label, pConstTree.getPreTerminalNodeAt(self.span[0]), pProposition, self.language, pflgLoadIntoTree)
  686. def getNode(self):
  687. '''
  688. Returns the dependency node of argument
  689. '''
  690. return self.node
  691. def getDepRel(self):
  692. '''
  693. Returns dependency relation of argument node
  694. '''
  695. return self.node.getDepRel()
  696. def getPOSTag(self):
  697. '''
  698. Returns the POS tag of the argument
  699. '''
  700. return self.node.getPOSTag()
  701. @property
  702. def position(self):
  703. '''
  704. Returns the position of the argument role filler in the sentence
  705. '''
  706. return self.span[0]
  707. def getForm(self):
  708. '''
  709. Returns the surface form of the argument
  710. '''
  711. return self.getNode().getForm()
  712. class ConstArg(Argument):
  713. '''
  714. Class for constituency-based argument
  715. '''
  716. def __init__(self, pArgLabel, pConstNode, pProposition, pLanguage, pflgLoadIntoTree):
  717. '''
  718. Creates a constituency-based argument
  719. In constituency-based arguments, the argument is identified by
  720. the constituent node to which it is assigned.
  721. '''
  722. Argument.__init__(self, pArgLabel, pConstNode.getTokenSpan(), pProposition, pLanguage)
  723. self.setNode(pConstNode, pflgLoadIntoTree)
  724. def setNode(self, pNode, pflgLoadIntoTree):
  725. '''
  726. Sets the constituency node of argument
  727. pflgLoadIntoTree determines whether the annotation must be loaded
  728. into the corresponding constituency tree node or not.
  729. NOTE: If the node is not empty, it will be overridden. In that case
  730. the predicate-role annotation in the current node will remain
  731. intact and will cause inconsistency. Therefore, for such cases, use
  732. changeNode() instead.
  733. '''
  734. self.node = pNode
  735. if pflgLoadIntoTree:
  736. self.node.addPredRole((self.getPredicate().getNode(), self.getLabel()))
  737. def getNode(self):
  738. '''
  739. Returns the constituency node of argument
  740. '''
  741. return self.node
  742. def changeNode(self, pNewNode):
  743. '''
  744. Changes the constituency node of argument to a new node
  745. It checks the current constituency node and if it contains the
  746. predicate-role annotation of this argument, it will be moved into
  747. the new node. Otherwise, it won't change the predicate-role annotation
  748. of the nodes.
  749. '''
  750. vPredRole = (self.getPredicate().getNode(), self.getLabel())
  751. if self.getNode().hasPredRole(vPredRole):
  752. self.getNode().removePredRole(vPredRole)
  753. self.setNode(pNewNode, pflgLoadIntoTree = True)
  754. else:
  755. self.setNode(pNewNode, pflgLoadIntoTree = False)
  756. def getSynTag(self):
  757. '''
  758. Returns syntactic tag of argument node
  759. '''
  760. return self.node.getSynTag()
  761. def elevate(self):
  762. '''
  763. Elevates the argument node to the highest possible parent node.
  764. The possibility is determined by checking against a set of constraints.
  765. '''
  766. if self.node.isRoot():
  767. return
  768. else:
  769. vParent = self.node.getParent()
  770. ## The parent node spanning (exactly) the same span is a valid level
  771. ## to elevate the node to.
  772. if vParent.getTokenSpan() == self.span:
  773. self.changeNode(vParent)
  774. self.elevate()
  775. elif self._checkElevationConstraints():
  776. self.changeNode(vParent)
  777. self.span = self.node.getTokenSpan()
  778. self.elevate()
  779. def _checkElevationConstraints(self):
  780. '''
  781. Returns true if elevating the argument node to its parent does
  782. not violate a set of constraints
  783. '''
  784. # NOTE: constraint numbering is chronological
  785. # common
  786. # 3. Root node cannot take a role.
  787. if self.getNode().getParent().isRoot():
  788. return False
  789. # 2. adjunct types that are normally assigned to single tokens
  790. if self.isAdjunct() and self.getAdjunctType() in ["NEG", "MOD", "DIS"]:
  791. return False
  792. # 6. Argument of a node with these tags cannot be elevated
  793. if self.getSynTag() in ["POS"]:
  794. return False
  795. # 4. argument node cannot dominate its predicate node
  796. if self.getNode().getParent().ifDominates(self.getPredicate().getNode()):
  797. return False
  798. ## 5. argument node cannot dominate another argument node of its
  799. ## predicate
  800. if len([vSibling for vSibling in self.getNode().getSibling() if vSibling.isArgumentOf(self.getPredicate().getNode())]) > 0:
  801. return False
  802. # 1. parent of a pre-terminal can usually take the role
  803. if self.getNode().isPreTerminal() or self.getNode().isCollapsedTerminal():
  804. return True
  805. ## TODO: do something for argument overlap for a proposition
  806. ## TODO: add common constraints
  807. # language-specific
  808. if self.language.lower().startswith("en"):
  809. return self._checkElevationConstraintsEN()
  810. elif self.language.lower().startswith("fr"):
  811. return self._checkElevationConstraintsFR()
  812. def _checkElevationConstraintsEN(self):
  813. '''
  814. Checks English-specific argument elevation constraints
  815. '''
  816. # TODO: add constraints
  817. return False
  818. def _checkElevationConstraintsFR(self):
  819. '''
  820. Checks French-specific argument elevation constraints
  821. '''
  822. # TODO: add constraints
  823. return False
  824. def getForm(self):
  825. '''
  826. Returns the surface form of the argument
  827. '''
  828. return self.getNode().surface
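## Illustrative sketch (not part of srl.py): the recursion in elevate() above,
## reduced to a toy tree. ToyNode and the can_elevate callback are hypothetical
## stand-ins for the module's constituency node and for
## _checkElevationConstraints(); only the control flow is meant to match.

```python
class ToyNode(object):
    """Hypothetical stand-in for a constituency node."""
    def __init__(self, label, span, parent=None):
        self.label = label
        self.span = span        # (first token, last token), inclusive
        self.parent = parent


def elevate(node, can_elevate=lambda node: False):
    """Return the highest ancestor that the role may be moved to.

    A parent covering exactly the same span is always accepted;
    otherwise the can_elevate callback decides, as the constraint
    check does in the original method.
    """
    while node.parent is not None:
        parent = node.parent
        if parent.span == node.span or can_elevate(node):
            node = parent
        else:
            break
    return node


# Toy tree for "dogs bark": (S (NP (NN dogs)) (VP (VBP bark)))
s = ToyNode("S", (0, 1))
np = ToyNode("NP", (0, 0), parent=s)
nn = ToyNode("NN", (0, 0), parent=np)

print(elevate(nn).label)   # NP
```

## With the default callback the role stops at NP, because S covers a wider
## span and no constraint check allows the move.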
  829. ## Comparisons #########################################################
  830. class SRLComp:
  831. '''
  832. Class for comparing two SRLs of a sentence
  833. '''
  834. def __init__(self, pPredLabelMatchLevel = "frameset", pflgCollapseAdjunct = False):
  835. '''
  836. Constructor
  837. Predicate label matching level is one of the following:
  838. - frameset : the entire frameset labels match (e.g. go.01 vs. go.01)
  839. - lemma : only the lemmas match (e.g. go.01 vs. go.02)
  840. - sense : only predicate sense (e.g. go.01 vs. eat.01);
  841. This may not make sense but it is for compatibility
  842. with CoNLL2009 scorer.
  843. - none or ignore : no frame match check, i.e. two predicates will
  844. match only if they are aligned regardless of
  845. their frame match.
  846. Adjunct labels can be optionally collapsed into one label to ignore
  847. adjunct type.
  848. '''
  849. self.predLabelMatchLevel = pPredLabelMatchLevel.lower()
  850. self.collapseAdjunct = pflgCollapseAdjunct
  851. # list of tuples of propositions whose predicates are aligned
  852. self.alignedProps = []
  853. # list of tuples of arguments whose fillers are aligned
  854. self.alignedArgs = []
  855. @property
  856. def alignedPreds(self):
  857. '''
  858. Returns list of aligned predicates (regardless of label match)
  859. '''
  860. return [(p1.predicate, p2.predicate) for p1, p2 in self.alignedProps]
  861. @property
  862. def frameMatchPreds(self):
  863. '''
  864. Returns the list of aligned predicates whose frame labels match at
  865. the preset level.
  866. See constructor documentation for predicate label matching level.
  867. '''
  868. if self.predLabelMatchLevel.lower() in ["none", "ignore"]:
  869. ## ignoring frame match, i.e. two predicates will match only
  870. ## if they are aligned regardless of their frame match.
  871. return self.alignedPreds
  872. else:
  873. return [(p1.predicate, p2.predicate) for p1, p2 in self.alignedProps if p1.predicate.compareFrameWith(p2.predicate, self.predLabelMatchLevel)]
  874. @property
  875. def alignedPredCount(self):
  876. '''
  877. Returns the number of aligned predicates (regardless of label match)
  878. '''
  879. return len(self.alignedProps)
  880. @property
  881. def frameMatchPredCount(self):
  882. '''
  883. Returns the number of aligned predicates whose frame label match
  884. '''
  885. return len(self.frameMatchPreds)
  886. @property
  887. def roleMatchArgs(self):
  888. '''
  889. Returns the list of aligned arguments whose role labels match,
  890. subject to the adjunct matching setting (original or collapsed).
  891. '''
  892. return [(a1, a2) for a1, a2 in self.alignedArgs if a1.compareRoleWith(a2, pflgCollapseAdjunct = self.collapseAdjunct)]
  893. @property
  894. def alignedArgCount(self):
  895. '''
  896. Returns the number of aligned arguments (regardless of role match)
  897. '''
  898. return len(self.alignedArgs)
  899. @property
  900. def roleMatchArgCount(self):
  901. '''
  902. Returns the number of aligned arguments whose role labels match,
  903. subject to the adjunct matching setting (original or collapsed).
  904. '''
  905. return len(self.roleMatchArgs)
  906. def compare(self, pSRL1, pSRL2):
  907. '''
  908. NOT TESTED
  909. Compare two SRLs
  910. Results are set into the corresponding attributes.
  911. '''
  912. self.alignedProps = self._extractAlignedProps(pSRL1, pSRL2)
  913. for vProp1, vProp2 in self.alignedProps:
  914. self.alignedArgs += self._extractAlignedArgs(vProp1, vProp2)
  915. def _extractAlignedProps(self, pSRL1, pSRL2):
  916. '''
  917. Extracts from two given SRLs pairs of propositions whose predicates
  918. are aligned
  919. '''
  920. ## copying the list of propositions of the second SRL into a new
  921. ## list to be able to delete from it later
  922. vlProps2 = pSRL2.propositions[ : ]
  923. vlAligned = []
  924. for vProp1 in pSRL1.propositions:
  925. vProp2Itr = 0
  926. for vProp2 in vlProps2:
  927. if self._arePredsAligend(vProp1.predicate, vProp2.predicate):
  928. del vlProps2[vProp2Itr]
  929. vlAligned.append((vProp1, vProp2))
  930. break
  931. else:
  932. vProp2Itr += 1
  933. return vlAligned
  934. def _arePredsAligend(self, pPred1, pPred2):
  935. '''
  936. NOT TESTED
  937. Returns true if the given predicates are aligned, i.e. their positions
  938. in the sentence are the same
  939. '''
  940. return pPred1.position == pPred2.position
  941. def _extractAlignedArgs(self, pProp1, pProp2):
  942. '''
  943. NOT TESTED
  944. Extracts from given two propositions, pairs of aligned arguments
  945. '''
  946. ## copying the list of arguments of the second proposition into a
  947. ## new list to be able to delete from it later
  948. vlArgs2 = pProp2.arguments[ : ]
  949. vlMatchPairs = []
  950. for vArg1 in pProp1.arguments:
  951. vArg2Itr = 0
  952. for vArg2 in vlArgs2:
  953. if self._areArgsAligned(vArg1, vArg2):
  954. del vlArgs2[vArg2Itr]
  955. vlMatchPairs.append((vArg1, vArg2))
  956. break
  957. else:
  958. vArg2Itr += 1
  959. return vlMatchPairs
  960. def _areArgsAligned(self, pArg1, pArg2):
  961. '''
  962. NOT TESTED
  963. Returns true if the given arguments are aligned, i.e. their boundaries
  964. match.
  965. '''
  966. return pArg1.span == pArg2.span
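## Illustrative sketch (not part of srl.py): the greedy one-to-one pairing
## that _extractAlignedProps() and _extractAlignedArgs() above both perform,
## written once as a generic helper. greedy_align and the (label, span) tuples
## are assumptions made for the example; the real methods work on Proposition
## and argument objects.

```python
def greedy_align(items1, items2, are_aligned):
    """Pair each element of items1 with the first unused aligned element of items2."""
    remaining = list(items2)    # copy, so the caller's list is left untouched
    pairs = []
    for item1 in items1:
        for idx, item2 in enumerate(remaining):
            if are_aligned(item1, item2):
                del remaining[idx]      # each item2 is paired at most once
                pairs.append((item1, item2))
                break                   # safe: iteration stops right after the delete
    return pairs


# Arguments as (role label, token span) tuples
gold = [("A0", (0, 1)), ("A1", (3, 5)), ("AM-TMP", (6, 6))]
pred = [("A1", (3, 5)), ("A0", (0, 1)), ("AM-LOC", (6, 6))]

aligned = greedy_align(gold, pred, lambda a, b: a[1] == b[1])
role_match = [(a, b) for a, b in aligned if a[0] == b[0]]

print("%d aligned, %d with matching roles" % (len(aligned), len(role_match)))
# 3 aligned, 2 with matching roles
```

## Alignment (same span) and label match are kept as separate steps, which is
## what lets alignedArgCount and roleMatchArgCount differ.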
  967. ## Projection ##########################################################
  968. class SRLProject:
  969. '''
  970. Base class for projecting SRL from the source side of the translation
  971. to the target side
  972. '''
  973. def __init__(self, pSrcSurface, pSrcSRL, pSrcParse, pSrcLang):
  974. '''
  975. Constructor
  976. '''
  977. self.srcSurface = pSrcSurface
  978. self.srcSRL = pSrcSRL
  979. self.srcParse = pSrcParse
  980. self.pSrcLang = pSrcLang
  981. # initializing projection statistics attributes
  982. self._initStat()
  983. def _initStat(self):
  984. '''
  985. Initializes projection statistics
  986. Only many-to-ones are counted in this way. The others are handled
  987. by _initReport().
  988. '''
  989. #self._cntNonAlignedPred = 0 # number of source predicates with no alignment (not necessarily word-alignment) in the target
  990. #self._cntNonProjectedPred = 0 # number of source predicates not finally projected
  991. #self._cntOneToManyAlignedPred = 0 # number of source predicates aligned (not necessarily word-aligned) with non-consecutive target role fillers
  992. #self._cntOneToManyProjectedPred = 0 # number of source predicates projected to non-consecutive target role fillers (not happening in practice)
  993. self._cntManyToOneAlignedPred = 0 # number of times different predicates aligned (not necessarily word-aligned) to same filler
  994. self._cntManyToOneProjectedPred = 0 # number of times different predicates projected to same filler
  995. #self._cntNonAlignedArg = 0 # number of source arguments with no alignment (not necessarily word-alignment) in the target
  996. #self._cntNonProjectedArg = 0 # number of source arguments not finally projected including those due to non-projected predicates
  997. #self._cntOneToManyAlignedArg = 0 # number of source arguments aligned (not necessarily word-aligned) with non-consecutive target role fillers
  998. #self._cntOneToManyProjectedArg = 0 # number of source arguments projected to non-consecutive target role fillers
  999. self._cntManyToOneAlignedArg = 0 # number of times different arguments of same proposition aligned (not necessarily word-aligned) to same filler
  1000. self._cntManyToOneProjectedArg = 0 # number of times different arguments of same proposition projected to same filler
  1001. def _initReport(self):
  1002. '''
  1003. Initializes projection report
  1004. Many-to-ones are not supported yet and handled by _initStat() only
  1005. in terms of counts.
  1006. '''
  1007. self.nonAlignedPred = [] # source predicates with no alignment (not necessarily word-alignment) in the target
  1008. self.nonProjectedPred = [] # source predicates not finally projected
  1009. self.oneToManyAlignedPred = [] # source predicates aligned (not necessarily word-aligned) with non-consecutive target role fillers
  1010. self.oneToManyProjectedPred = [] # source predicates projected to non-consecutive target role fillers (not happening in practice)
  1011. #self.manyToOneAlignedPred = [] # sets of different predicates aligned (not necessarily word-aligned) to same filler
  1012. #self.manyToOneProjectedPred = [] # sets of different predicates projected to same filler
  1013. self.nonAlignedArg = [] # source arguments with no alignment (not necessarily word-alignment) in the target
  1014. self.nonProjectedArg = [] # source arguments not finally projected including those due to non-projected predicates
  1015. self.oneToManyAlignedArg = [] # source arguments aligned (not necessarily word-aligned) with non-consecutive target role fillers
  1016. self.oneToManyProjectedArg = [] # source arguments projected to non-consecutive target role fillers
  1017. #self.manyToOneAlignedArg = [] # sets of different arguments of same proposition aligned (not necessarily word-aligned) to same filler
  1018. #self.manyToOneProjectedArg = [] # sets of different arguments of same proposition projected to same filler
  1019. def getStat(self):
  1020. '''
  1021. Returns the projection statistics in a dictionary
  1022. 4 types of statistics are returned:
  1023. - Counts
  1024. - Aggregation of counts based on various factors
  1025. - Percentages
  1026. - Aggregation of percentages based on various factors
  1027. '''
  1028. vdCount = {"Non-aligned predicate count": len(self.nonAlignedPred), \
  1029. "Non-projected predicate count": len(self.nonProjectedPred), \
  1030. "One-to-many aligned predicate count": len(self.oneToManyAlignedPred), \
  1031. "One-to-many projected predicate count": len(self.oneToManyProjectedPred), \
  1032. "Many-to-one aligned predicate count": self._cntManyToOneAlignedPred, \
  1033. "Many-to-one projected predicate count": self._cntManyToOneProjectedPred, \
  1034. "Non-aligned argument count": len(self.nonAlignedArg), \
  1035. "Non-projected argument count": len(self.nonProjectedArg), \
  1036. "One-to-many aligned argument count": len(self.oneToManyAlignedArg), \
  1037. "One-to-many projected argument count": len(self.oneToManyProjectedArg), \
  1038. "Many-to-one aligned argument count": self._cntManyToOneAlignedArg, \
  1039. "Many-to-one projected argument count": self._cntManyToOneProjectedArg}
  1040. vdAggCount = {"Total non-aligned count": len(self.nonAlignedPred) + len(self.nonAlignedArg), \
  1041. "Total non-projected count": len(self.nonProjectedPred) + len(self.nonProjectedArg), \
  1042. "Total one-to-many alignment count": len(self.oneToManyAlignedPred) + len(self.oneToManyAlignedArg), \
  1043. "Total one-to-many projection count": len(self.oneToManyProjectedPred) + len(self.oneToManyProjectedArg), \
  1044. "Total many-to-one alignment count": self._cntManyToOneAlignedPred + self._cntManyToOneAlignedArg, \
  1045. "Total many-to-one projection count": self._cntManyToOneProjectedPred + self._cntManyToOneProjectedArg, \
  1046. "Total predicate alignment case count": len(self.nonAlignedPred) + len(self.oneToManyAlignedPred) + self._cntManyToOneAlignedPred, \
  1047. "Total predicate projection case count": len(self.nonProjectedPred) + len(self.oneToManyProjectedPred) + self._cntManyToOneProjectedPred, \
  1048. "Total argument alignment case count": len(self.nonAlignedArg) + len(self.oneToManyAlignedArg) + self._cntManyToOneAlignedArg, \
  1049. "Total argument projection case count": len(self.nonProjectedArg) + len(self.oneToManyProjectedArg) + self._cntManyToOneProjectedArg, \
  1050. "Total alignment case count": len(self.nonAlignedPred) + len(self.oneToManyAlignedPred) + self._cntManyToOneAlignedPred + len(self.nonAlignedArg) + len(self.oneToManyAlignedArg) + self._cntManyToOneAlignedArg, \
  1051. "Total projection case count": len(self.nonProjectedPred) + len(self.oneToManyProjectedPred) + self._cntManyToOneProjectedPred + len(self.nonProjectedArg) + len(self.oneToManyProjectedArg) + self._cntManyToOneProjectedArg, \
  1052. "Total predicate case count": len(self.nonAlignedPred) + len(self.nonProjectedPred) + len(self.oneToManyAlignedPred) + len(self.oneToManyProjectedPred) + self._cntManyToOneAlignedPred + self._cntManyToOneProjectedPred, \
  1053. "Total argument case count": len(self.nonAlignedArg) + len(self.nonProjectedArg) + len(self.oneToManyAlignedArg) + len(self.oneToManyProjectedArg) + self._cntManyToOneAlignedArg + self._cntManyToOneProjectedArg, \
  1054. "Total problem count": len(self.nonAlignedPred) + len(self.oneToManyAlignedPred) + self._cntManyToOneAlignedPred + len(self.nonAlignedArg) + len(self.oneToManyAlignedArg) + self._cntManyToOneAlignedArg + \
  1055. len(self.nonProjectedPred) + len(self.oneToManyProjectedPred) + self._cntManyToOneProjectedPred + len(self.nonProjectedArg) + len(self.oneToManyProjectedArg) + self._cntManyToOneProjectedArg}
  1056. vdPercentage = {"Non-aligned predicate percentage": util.percent(len(self.nonAlignedPred), self.srcSRL.propCount), \
  1057. "Non-projected predicate percentage": util.percent(len(self.nonProjectedPred), self.srcSRL.propCount), \
  1058. "One-to-many aligned predicate percentage": util.percent(len(self.oneToManyAlignedPred), self.srcSRL.propCount), \
  1059. "One-to-many projected predicate percentage": util.percent(len(self.oneToManyProjectedPred), self.srcSRL.propCount), \
  1060. "Many-to-one aligned predicate percentage": util.percent(self._cntManyToOneAlignedPred, self.srcSRL.propCount), \
  1061. "Many-to-one projected predicate percentage": util.percent(self._cntManyToOneProjectedPred, self.srcSRL.propCount), \
  1062. "Non-aligned argument percentage": util.percent(len(self.nonAlignedArg), self.srcSRL.globalArgCount), \
  1063. "Non-projected argument percentage": util.percent(len(self.nonProjectedArg), self.srcSRL.globalArgCount), \
  1064. "One-to-many aligned argument percentage": util.percent(len(self.oneToManyAlignedArg), self.srcSRL.globalArgCount), \
  1065. "One-to-many projected argument percentage": util.percent(len(self.oneToManyProjectedArg), self.srcSRL.globalArgCount), \
  1066. "Many-to-one aligned argument percentage": util.percent(self._cntManyToOneAlignedArg, self.srcSRL.globalArgCount), \
  1067. "Many-to-one projected argument percentage": util.percent(self._cntManyToOneProjectedArg, self.srcSRL.globalArgCount)}
  1068. vdAggPercentage = {"Total non-aligned percentage": util.percent(len(self.nonAlignedPred) + len(self.nonAlignedArg), self.srcSRL.propCount + self.srcSRL.globalArgCount), \
  1069. "Total non-projected percentage": util.percent(len(self.nonProjectedPred) + len(self.nonProjectedArg), self.srcSRL.propCount + self.srcSRL.globalArgCount), \
  1070. "Total one-to-many alignment percentage": util.percent(len(self.oneToManyAlignedPred) + len(self.oneToManyAlignedArg), self.srcSRL.propCount + self.srcSRL.globalArgCount), \
  1071. "Total one-to-many projection percentage": util.percent(len(self.oneToManyProjectedPred) + len(self.oneToManyProjectedArg), self.srcSRL.propCount + self.srcSRL.globalArgCount), \
  1072. "Total many-to-one alignment percentage": util.percent(self._cntManyToOneAlignedPred + self._cntManyToOneAlignedArg, self.srcSRL.propCount + self.srcSRL.globalArgCount), \
  1073. "Total many-to-one projection percentage": util.percent(self._cntManyToOneProjectedPred + self._cntManyToOneProjectedArg, self.srcSRL.propCount + self.srcSRL.globalArgCount), \
  1074. "Total predicate alignment case percentage": util.percent(len(self.nonAlignedPred) + len(self.oneToManyAlignedPred) + self._cntManyToOneAlignedPred, self.srcSRL.propCount), \
  1075. "Total predicate projection case percentage": util.percent(len(self.nonProjectedPred) + len(self.oneToManyProjectedPred) + self._cntManyToOneProjectedPred, self.srcSRL.propCount), \
  1076. "Total argument alignment case percentage": util.percent(len(self.nonAlignedArg) + len(self.oneToManyAlignedArg) + self._cntManyToOneAlignedArg, self.srcSRL.globalArgCount), \
  1077. "Total argument projection case percentage": util.percent(len(self.nonProjectedArg) + len(self.oneToManyProjectedArg) + self._cntManyToOneProjectedArg, self.srcSRL.globalArgCount), \
  1078. "Total alignment case percentage": util.percent(len(self.nonAlignedPred) + len(self.oneToManyAlignedPred) + self._cntManyToOneAlignedPred + len(self.nonAlignedArg) + len(self.oneToManyAlignedArg) + self._cntManyToOneAlignedArg, self.srcSRL.propCount + self.srcSRL.globalArgCount), \
  1079. "Total projection case percentage": util.percent(len(self.nonProjectedPred) + len(self.oneToManyProjectedPred) + self._cntManyToOneProjectedPred + len(self.nonProjectedArg) + len(self.oneToManyProjectedArg) + self._cntManyToOneProjectedArg, self.srcSRL.propCount + self.srcSRL.globalArgCount), \
  1080. "Total predicate case percentage": util.percent(len(self.nonAlignedPred) + len(self.nonProjectedPred) + len(self.oneToManyAlignedPred) + len(self.oneToManyProjectedPred) + self._cntManyToOneAlignedPred + self._cntManyToOneProjectedPred, self.srcSRL.propCount), \
  1081. "Total argument case percentage": util.percent(len(self.nonAlignedArg) + len(self.nonProjectedArg) + len(self.oneToManyAlignedArg) + len(self.oneToManyProjectedArg) + self._cntManyToOneAlignedArg + self._cntManyToOneProjectedArg, self.srcSRL.globalArgCount), \
  1082. "Total problem percentage": util.percent(len(self.nonAlignedPred) + len(self.oneToManyAlignedPred) + self._cntManyToOneAlignedPred + len(self.nonAlignedArg) + len(self.oneToManyAlignedArg) + self._cntManyToOneAlignedArg + \
  1083. len(self.nonProjectedPred) + len(self.oneToManyProjectedPred) + self._cntManyToOneProjectedPred + len(self.nonProjectedArg) + len(self.oneToManyProjectedArg) + self._cntManyToOneProjectedArg, self.srcSRL.propCount + self.srcSRL.globalArgCount)}
  1084. return vdCount, vdAggCount, vdPercentage, vdAggPercentage
  1085. def printStat(self, pflgCountStat = True, pflgAggCountStat = False, pflgPercentStat = True, pflgAggPercentStat = True):
  1086. '''
  1087. Prints the projection statistics in a user-friendly format
  1088. 4 types of statistics are covered (the pflg* flags are currently not consulted):
  1089. - Counts
  1090. - Aggregation of counts based on various factors
  1091. - Percentages
  1092. - Aggregation of percentages based on various factors
  1093. '''
  1094. vdCntStat, vdAggCntStat, vdPctStat, vdAggPctStat = self.getStat()
  1095. vPredStatTxt = "Non-aligned predicate count ............. %s \n" % vdCntStat["Non-aligned predicate count"] + \
  1096. "Non-aligned predicate pct. .............. %s \n" % vdPctStat["Non-aligned predicate percentage"] + \
  1097. "Non-projected predicate count ........... %s \n" % vdCntStat["Non-projected predicate count"] + \
  1098. "Non-projected predicate pct. ............ %s \n" % vdPctStat["Non-projected predicate percentage"] + \
  1099. "One-to-many aligned predicate count ..... %s \n" % vdCntStat["One-to-many aligned predicate count"] + \
  1100. "One-to-many aligned predicate pct. ...... %s \n" % vdPctStat["One-to-many aligned predicate percentage"] + \
  1101. "One-to-many projected predicate count ... %s \n" % vdCntStat["One-to-many projected predicate count"] + \
  1102. "One-to-many projected predicate pct. .... %s \n" % vdPctStat["One-to-many projected predicate percentage"] + \
  1103. "Many-to-one aligned predicate count ..... %s \n" % vdCntStat["Many-to-one aligned predicate count"] + \
  1104. "Many-to-one aligned predicate pct. ...... %s \n" % vdPctStat["Many-to-one aligned predicate percentage"] + \
  1105. "Many-to-one projected predicate count ... %s \n" % vdCntStat["Many-to-one projected predicate count"] + \
  1106. "Many-to-one projected predicate pct. .... %s \n" % vdPctStat["Many-to-one projected predicate percentage"]
  1107. vArgStatTxt = "Non-aligned argument count .............. %s \n" % vdCntStat["Non-aligned argument count"] + \
  1108. "Non-aligned argument pct. ............... %s \n" % vdPctStat["Non-aligned argument percentage"] + \
  1109. "Non-projected argument count ............ %s \n" % vdCntStat["Non-projected argument count"] + \
  1110. "Non-projected argument pct. ............. %s \n" % vdPctStat["Non-projected argument percentage"] + \
  1111. "One-to-many aligned argument count ...... %s \n" % vdCntStat["One-to-many aligned argument count"] + \
  1112. "One-to-many aligned argument pct. ....... %s \n" % vdPctStat["One-to-many aligned argument percentage"] + \
  1113. "One-to-many projected argument count .... %s \n" % vdCntStat["One-to-many projected argument count"] + \
  1114. "One-to-many projected argument pct. ..... %s \n" % vdPctStat["One-to-many projected argument percentage"] + \
  1115. "Many-to-one aligned argument count ...... %s \n" % vdCntStat["Many-to-one aligned argument count"] + \
  1116. "Many-to-one aligned argument pct. ....... %s \n" % vdPctStat["Many-to-one aligned argument percentage"] + \
  1117. "Many-to-one projected argument count .... %s \n" % vdCntStat["Many-to-one projected argument count"] + \
  1118. "Many-to-one projected argument pct. ..... %s \n" % vdPctStat["Many-to-one projected argument percentage"]
  1119. vAggStatTxt = "Total non-aligned count .............. %s \n" % vdAggCntStat["Total non-aligned count"] + \
  1120. "Total non-aligned pct. ............... %s \n" % vdAggPctStat["Total non-aligned percentage"] + \
  1121. "Total non-projected count ............ %s \n" % vdAggCntStat["Total non-projected count"] + \
  1122. "Total non-projected pct. ............. %s \n" % vdAggPctStat["Total non-projected percentage"] + \
  1123. "Total one-to-many alignment count .... %s \n" % vdAggCntStat["Total one-to-many alignment count"] + \
  1124. "Total one-to-many alignment pct. ..... %s \n" % vdAggPctStat["Total one-to-many alignment percentage"] + \
  1125. "Total one-to-many projection count ... %s \n" % vdAggCntStat["Total one-to-many projection count"] + \
  1126. "Total one-to-many projection pct. .... %s \n" % vdAggPctStat["Total one-to-many projection percentage"] + \
  1127. "Total many-to-one alignment count .... %s \n" % vdAggCntStat["Total many-to-one alignment count"] + \
  1128. "Total many-to-one alignment pct. ..... %s \n" % vdAggPctStat["Total many-to-one alignment percentage"] + \
  1129. "Total many-to-one projection count ... %s \n" % vdAggCntStat["Total many-to-one projection count"] + \
  1130. "Total many-to-one projection pct. .... %s \n" % vdAggPctStat["Total many-to-one projection percentage"] + \
  1131. "Total alignment case count ........... %s \n" % vdAggCntStat["Total alignment case count"] + \
  1132. "Total alignment case pct. ............ %s \n" % vdAggPctStat["Total alignment case percentage"] + \
  1133. "Total projection case count .......... %s \n" % vdAggCntStat["Total projection case count"] + \
  1134. "Total projection case pct. ........... %s \n" % vdAggPctStat["Total projection case percentage"] + \
  1135. "Total predicate case count ........... %s \n" % vdAggCntStat["Total predicate case count"] + \
  1136. "Total predicate case pct. ............ %s \n" % vdAggPctStat["Total predicate case percentage"] + \
  1137. "Total argument case count ............ %s \n" % vdAggCntStat["Total argument case count"] + \
  1138. "Total argument case pct. ............. %s \n" % vdAggPctStat["Total argument case percentage"] + \
  1139. "Total problem count .................. %s \n" % vdAggCntStat["Total problem count"] + \
  1140. "Total problem pct. ................... %s \n" % vdAggPctStat["Total problem percentage"]
  1141. sys.stdout.write(vPredStatTxt + '\n' + vArgStatTxt + '\n' + vAggStatTxt + '\n')
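## Illustrative sketch (not part of srl.py): how the count and percentage
## dictionaries built in getStat() above relate to each other. percent() is a
## hypothetical stand-in for util.percent, whose exact behaviour is not shown
## in this file, and summarize() is an assumed helper, not an existing method.
## Predicate cases are normalised by the proposition count and argument cases
## by the global argument count, as in the original.

```python
def percent(part, total):
    """Hypothetical stand-in for util.percent; returns 0.0 for an empty total."""
    return 100.0 * part / total if total else 0.0


def summarize(cases, prop_count, arg_count):
    """Build count and percentage dictionaries from per-case counts.

    cases maps (unit, case) to a count, where unit is "predicate" or
    "argument" and case is e.g. "Non-aligned".
    """
    totals = {"predicate": prop_count, "argument": arg_count}
    counts, percentages = {}, {}
    for (unit, case), count in sorted(cases.items()):
        key = "%s %s" % (case, unit)
        counts[key + " count"] = count
        percentages[key + " percentage"] = percent(count, totals[unit])
    return counts, percentages


cases = {("predicate", "Non-aligned"): 1, ("argument", "Non-aligned"): 3}
counts, percentages = summarize(cases, prop_count=4, arg_count=12)

print(counts["Non-aligned predicate count"])            # 1
print(percentages["Non-aligned argument percentage"])   # 25.0
```

## Deriving the keys from (unit, case) pairs keeps the count and percentage
## dictionaries in step without repeating each expression by hand.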
  1142. class DepSRLProject(SRLProject):
  1143. '''
  1144. Class for projecting dependency-based SRL from the source side of the
  1145. translation to the target side
  1146. '''
  1147. def __init__(self, pSrcSurface, pSrcSRL, pSrcDepTree, pSrcLang):
  1148. '''
  1149. Constructor
  1150. '''
  1151. SRLProject.__init__(self, pSrcSurface, pSrcSRL, pSrcParse = pSrcDepTree, pSrcLang = pSrcLang)
  1152. def waProject(self, pTrgSurface, pTrgDepTree, pWordAlignment, pTrgLang, pPredOneToManyMethod = 0, pPredOverlapMethod = 0, pArgOneToManyMethod = 0, pArgOverlapMethod = 0, pflgClassicDRule = False):
  1153. '''
  1154. Projects source side SRL to given target dependency tree using
  1155. given word alignment, loads the projected SRL to the tree and
  1156. returns the projected SRL
  1157. Word alignment is of type mt.WordAlignment
  1158. pPredOneToManyMethod specifies the method to resolve one-to-many
  1159. predicate projection (see _resPredOneToMany()). Default is method
  1160. 0 (take the first aligned target position).
  1161. pPredOverlapMethod specifies the method to resolve many-to-one
  1162. predicate projection (target predicate overlap; see _resPredOverlap()).
  1163. Default is method 0 (perform no action).
  1164. pArgOneToManyMethod specifies the method to resolve one-to-many
  1165. argument projection (see _resArgOneToMany()). Default is project
  1166. to many (do not resolve).
  1167. pArgOverlapMethod specifies the method to resolve many-to-one argument
  1168. projection (target argument overlap; see _resArgOverlap()). Default
  1169. is project to many (do not resolve).
  1170. If pflgClassicDRule is set to true, the D-Rule used in the Classic
  1171. project is applied in argument transfer. Default is false.
  1172. '''
  1173. self._initReport()
  1174. #### reconsider
  1175. self._initStat()
  1176. vProjSRL = SRL(pLanguage = pTrgLang, pType = 'd')
  1177. # projection
  1178. vlPredsAndProjTrgPositions = []
  1179. for vSrcProp in self.srcSRL.propositions:
  1180. # 1. projecting predicate
  1181. vProjPredPos = self._waProjectPred(vSrcProp.predicate, pWordAlignment, pTrgDepTree, pPredOneToManyMethod)
  1182. vlPredsAndProjTrgPositions.append((vSrcProp.predicate, vProjPredPos))
  1183. # 2. projecting arguments
  1184. ## NOTE: we do not create a target proposition if no word
  1185. ## is found to be aligned with source predicate. However, we
  1186. ## proceed with argument projection to extract statistics.
  1187. ## ToDo: find alternative ways to project non-aligned predicates
  1188. vlArgsAndProjTrgPositions = []
  1189. for vSrcArg in vSrcProp.arguments:
  1190. ## When a source argument role filler is aligned with more
  1191. ## than one token in the target, there will be more than
  1192. ## one token assigned the same role in the target except
  1193. ## some cases (see _waProjectArg()).
  1194. vlProjArgPositions = self._waProjectArg(vSrcArg, pWordAlignment, vSrcProp, pTrgDepTree, pTrgLang, pArgOneToManyMethod, pflgClassicDRule)
  1195. vlArgsAndProjTrgPositions.append((vSrcArg, vlProjArgPositions))
  1196. # 3. creating and adding projected propositions (only if predicate is projected)
  1197. # ToDo: Move to a general method for all projection methods (eg. self.project())
  1198. if vProjPredPos is not None:
  1199. # creating predicate
  1200. vProjPred = DepPred(vSrcProp.predicate.getLabel(), vProjPredPos, pTrgDepTree.getNode(vProjPredPos))
  1201. # creating proposition
  1202. vProjProp = Proposition(vProjPred)
  1203. # creating and adding arguments
  1204. vlProjArgs = []
  1205. for vSrcArg, vlProjArgPositions in vlArgsAndProjTrgPositions:
  1206. vProjProp.addArguments([DepArg(pArgLabel = vSrcArg.getLabel(),
  1207. pTokenPos = vProjArgPos,
  1208. pDepNode = pTrgDepTree.getNode(vProjArgPos),
  1209. pProposition = vProjProp,
  1210. pLanguage = pTrgLang)
  1211. for vProjArgPos in vlProjArgPositions])
  1212. # adding proposition
  1213. vProjSRL.addProposition(vProjProp)
  1214. # statistics of argument projection
  1215. # finding many-to-one argument projections for the proposition
  1216. vlProjArgPositions = [vPos for vArgPositions in vlArgsAndProjTrgPositions for vPos in vArgPositions[1]]
  1217. for vPosCnt in util.groupBy(vlProjArgPositions).itervalues():
  1218. if vPosCnt > 1:
  1219. self._cntManyToOneProjectedArg += 1
  1220. # finding many-to-one argument alignments for the proposition (independent of projection)
  1221. vlSrcArgPositions = [a.position for a in vSrcProp.arguments]
  1222. self._cntManyToOneAlignedArg += len(pWordAlignment.subsetBySrcPos(vlSrcArgPositions).subsetManyToOne().getTrgPositions())
  1223. # statistics of predicate projection
  1224. # finding many-to-one predicate projections
  1225. vlTrgPredPositions = [vPredPos[1] for vPredPos in vlPredsAndProjTrgPositions if vPredPos[1] is not None]
  1226. for vPosCnt in util.groupBy(vlTrgPredPositions).itervalues():
  1227. if vPosCnt > 1:
  1228. self._cntManyToOneProjectedPred += 1
  1229. # finding many-to-one predicate alignments (independent of projection)
  1230. vlSrcPredPositions = [p.predicate.position for p in self.srcSRL.propositions]
  1231. self._cntManyToOneAlignedPred += len(pWordAlignment.subsetBySrcPos(vlSrcPredPositions).subsetManyToOne().getTrgPositions())
  1232. pTrgDepTree.loadSRL(vProjSRL)
  1233. return vProjSRL
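## Illustrative sketch (not part of srl.py): the many-to-one counting done at
## the end of waProject() above. collections.Counter is used here as a
## stand-in for util.groupBy, which is assumed (from how it is called) to map
## each position to the number of times it occurs. count_many_to_one and the
## sample data are hypothetical.

```python
from collections import Counter


def count_many_to_one(projected):
    """Count target positions that received more than one projection.

    projected is a list of (source item, [target positions]) pairs,
    shaped like vlArgsAndProjTrgPositions in waProject().
    """
    hits = Counter(pos for _, positions in projected for pos in positions)
    return sum(1 for n in hits.values() if n > 1)


projected = [("A0", [2]), ("A1", [4, 5]), ("AM-TMP", [5]), ("AM-LOC", [])]
print(count_many_to_one(projected))   # 1
```

## Only position 5 is shared by two source arguments, so one many-to-one case
## is counted; an argument with no projection contributes nothing.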
  1234. def _waProjectPred(self, pSrcPred, pWordAlignment, pTrgDepTree, pPredOneToManyMethod):
  1235. '''
  1236. Projects given source predicate to target using given word alignment
  1237. and returns projected target position
  1238. '''
  1239. # A word may be aligned to more than one word on the other side
  1240. vlTrgPredPos = pWordAlignment.getTrgAlignedTo(pSrcPred.position)
  1241. if len(vlTrgPredPos) == 0:
  1242. self.nonAlignedPred.append(pSrcPred)
  1243. self.nonProjectedPred.append(pSrcPred)
  1244. # statistics: remove
  1245. #self._cntNonAlignedPred += 1
  1246. #self._cntNonProjectedPred += 1
  1247. return None
  1248. elif len(vlTrgPredPos) == 1:
  1249. vTrgPredPos = vlTrgPredPos[0]
  1250. elif len(vlTrgPredPos) > 1: # 1-to-n alignment case
  1251. self.oneToManyAlignedPred.append(pSrcPred)
  1252. # statistics: remove
  1253. #self._cntOneToManyAlignedPred += 1
  1254. vTrgPredPos = self._resPredOneToMany(vlTrgPredPos, pPredOneToManyMethod, pTrgDepTree)
  1255. return vTrgPredPos
  1256. def _waProjectArg(self, pSrcArg, pWordAlignment, pProposition, pTrgDepTree, pTrgLang, pArgOneToManyMethod, pflgClassicDRule):
  1257. '''
  1258. Projects given source argument to target using given word alignment
  1259. and returns the list of projected target positions
  1260. '''
  1261. # A word may be aligned to more than one word on the other side
  1262. vlTrgArgPos = pWordAlignment.getTrgAlignedTo(pSrcArg.position)
  1263. if len(vlTrgArgPos) == 0:
  1264. self.nonAlignedArg.append(pSrcArg)
  1265. # statistics: remove
  1266. #self._cntNonAlignedArg += 1
  1267. # applying Classic D-Rule
  1268. if pflgClassicDRule:
  1269. vlTrgArgPos = self._applyClassicDRule(pSrcArg, pWordAlignment)
  1270. else:
  1271. vlTrgArgPos = []
  1272. if len(vlTrgArgPos) == 0:
  1273. self.nonProjectedArg.append(pSrcArg)
  1274. # statistics: remove commented
  1275. #self._cntNonProjectedArg += 1
  1276. return []
  1277. elif len(vlTrgArgPos) > 1: # 1-to-n alignment
  1278. vlTrgArgPos = self._resArgOneToMany(vlTrgArgPos, pArgOneToManyMethod, pTrgDepTree)
  1279. self.oneToManyAlignedArg.append(pSrcArg)
  1280. # statistics: remove commented
  1281. #self._cntOneToManyAlignedArg += 1
  1282. if len(vlTrgArgPos) > 1:
  1283. self.oneToManyProjectedArg.append(pSrcArg)
  1284. #self._cntOneToManyProjectedArg += 1
  1285. return vlTrgArgPos

    def _resPredOneToMany(self, plProjPositions, pMethod, pTrgDepTree):
        '''
        Resolves the one-to-many predicate projection case using the given
        method. Method 0 (the default) returns the first position in the
        list; method 1 prefers the first verb.
        '''

        def _resPredOneToMany1(plProjPositions, pTrgDepTree):
            '''
            Resolves the one-to-many predicate projection case using method 1.

            This method chooses the first verb among the Many target words.
            If no verb is found, it chooses the first word.

            ToDo: improve tie-breaking
            '''

            vlVerbPos = [pos for pos in plProjPositions if pTrgDepTree.getNode(pos).isVerb()]

            if len(vlVerbPos) == 0:
                return plProjPositions[0]
            else:
                return vlVerbPos[0]

        if pMethod == 0:
            return plProjPositions[0]
        elif pMethod == 1:
            return _resPredOneToMany1(plProjPositions, pTrgDepTree)

    def _resPredOverlap(self, pMethod):
        '''
        Resolves many-to-one predicate projection case (target predicate
        overlap) using given method
        '''

        def _resPredOverlap1(self):
            '''
            Resolves many-to-one predicate projection case (target predicate
            overlap) using method 1

            This method checks whether there is other alignment for f
            '''
            pass

    def _resArgOneToMany(self, plProjPositions, pMethod, pTrgDepTree):
        '''
        Resolves the one-to-many argument projection case using the given
        method. Method 0 keeps all positions; method 1 keeps only heads.
        '''

        def _resArgOneToMany1(plProjPositions, pTrgDepTree):
            '''
            Resolves the one-to-many argument projection case using method 1.

            This method does the following:
            1- remove all the words on the Many side which are dependents
               of any other word in Many.
            2- create an argument for each remaining word in Many.

            The purpose is to assign the role only to the head when both a
            head and its dependent are among the words in Many.

            ToDo: improve the tie-breaking
            '''

            def __isHeadIn(pPos, plPos, pDepTree):
                '''
                Returns True if the position of any head of the node at
                pPos (looked up in pDepTree) is in plPos.
                '''

                # A node may have multiple heads. Use the pPos parameter here
                # rather than relying on the caller's loop variable.
                for vHead in pDepTree.getNode(pPos).getHeadPos():
                    if vHead in plPos:
                        return True
                return False

            vlHeadPos = [pos for pos in plProjPositions if not __isHeadIn(pos, plProjPositions, pTrgDepTree)]

            if len(vlHeadPos) == 0:
                # a rare situation!
                raise Exception("All-circular head/dependent in 1-to-n alignment of argument role filler!")
            else:
                return vlHeadPos

        if pMethod == 0:
            return plProjPositions
        elif pMethod == 1:
            return _resArgOneToMany1(plProjPositions, pTrgDepTree)

    def _resArgOverlap(self, pMethod):
        '''
        Resolves many-to-one argument projection case (target argument
        overlap) using given method
        '''
        pass

    def _applyClassicDRule(self, pSrcArg, pWordAlignment):
        '''
        Applies the D-Rule used by the Classic project and returns the
        position(s) of the target argument if successful, otherwise an
        empty list.

        This is the quote from Classic delivery 2 report:

            For any pair of translated sentences E and F and a semantic
            relationship R(xE, yE) in E, if there exists a word-alignment
            between predicates xE and xF but not between roles yE and any yF,
            and if the POS of yE is 'IN' or 'TO', then we find the dependent
            y'E of yE and if there exists a word-alignment between y'E and
            some y'F, we transfer the semantic relationship R(xE, yE) to
            R(xF, y'F).
        '''

        # The rule only applies to prepositions ('IN') and 'TO'.
        if pSrcArg.getPOSTag() not in ["IN", "TO"]:
            return []

        vlTrgArgPos = []

        # Return the alignment of the first dependent that is aligned.
        for vDepPosition in pSrcArg.getNode().getDepPositions():
            vlTrgArgPos = pWordAlignment.getTrgAlignedTo(vDepPosition)
            if len(vlTrgArgPos) > 0:
                return vlTrgArgPos

        return vlTrgArgPos
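
# ---------------------------------------------------------------------------
# Illustrative sketch only, not part of the original module.
# The D-Rule fallback in _applyClassicDRule can be tried out on plain data
# structures. Every name below (apply_d_rule_sketch, pos_tags, dependents,
# alignment and the example_* values) is a hypothetical stand-in for the
# argument, dependency tree and word alignment objects used above, whose
# real interfaces may differ.
# ---------------------------------------------------------------------------
def apply_d_rule_sketch(arg_pos, pos_tags, dependents, alignment):
    """
    Returns the target positions aligned to the first aligned dependent of
    the source word at arg_pos, provided that word is tagged 'IN' or 'TO'.
    Returns an empty list otherwise.
    """
    # The rule only fires for prepositions ('IN') and 'TO'.
    if pos_tags.get(arg_pos) not in ("IN", "TO"):
        return []
    # Walk the dependents in order and stop at the first aligned one.
    for dep_pos in dependents.get(arg_pos, []):
        trg_positions = alignment.get(dep_pos, [])
        if trg_positions:
            return trg_positions
    return []


# Assumed toy input: source word 3 is an unaligned preposition whose
# dependent, source word 4, is aligned to target word 2.
example_tags = {3: "IN", 4: "NN"}
example_dependents = {3: [4]}
example_alignment = {4: [2]}
d_rule_result = apply_d_rule_sketch(3, example_tags, example_dependents, example_alignment)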