- ## List of Non-breaking tokens containing period for German
- ## 28-Sep-2012
- ##
- ## Adapted from Moses tokenizer
- ##
- # Roman Numerals. A dot after one of these is not a sentence break in German.
- I.
- II.
- III.
- IV.
- V.
- VI.
- VII.
- VIII.
- IX.
- X.
- XI.
- XII.
- XIII.
- XIV.
- XV.
- XVI.
- XVII.
- XVIII.
- XIX.
- XX.
- i.
- ii.
- iii.
- iv.
- v.
- vi.
- vii.
- viii.
- ix.
- x.
- xi.
- xii.
- xiii.
- xiv.
- xv.
- xvi.
- xvii.
- xviii.
- xix.
- xx.
- # Titles and Honorifics
- Adj.
- Adm.
- Adv.
- Asst.
- Bart.
- Bldg.
- Brig.
- Bros.
- Capt.
- Cmdr.
- Col.
- Comdr.
- Con.
- Corp.
- Cpl.
- DR.
- #Dr. handled by ignoring case
- Ens.
- Gen.
- Gov.
- Hon.
- Hosp.
- Insp.
- Lt.
- MM.
- MR.
- MRS.
- MS.
- Maj.
- Messrs.
- Mlle.
- Mme.
- #Mr. handled by ignoring case
- #Mrs. handled by ignoring case
- #Ms. handled by ignoring case
- Msgr.
- Op.
- Ord.
- Pfc.
- Ph.
- Prof.
- Pvt.
- Rep.
- Reps.
- Res.
- Rev.
- Rt.
- Sen.
- Sens.
- Sfc.
- Sgt.
- Sr.
- St.
- Supt.
- Surg.
- # Misc
- Mio.
- Mrd.
- bzw.
- v.
- vs.
- usw.
- d.h.
- z.B.
- u.a.
- etc.
- MwSt.
- ggf.
- d.J.
- D.h.
- m.E.
- vgl.
- I.F.
- z.T.
- sogen.
- ff.
- u.E.
- g.U.
- g.g.A.
- c.-à-d.
- Buchst.
- u.s.w.
- sog.
- u.ä.
- Std.
- evtl.
- Zt.
- Chr.
- u.U.
- o.ä.
- Ltd.
- b.A.
- z.Zt.
- spp.
- sen.
- SA.
- k.o.
- jun.
- i.H.v.
- dgl.
- dergl.
- Co.
- zzt.
- usf.
- s.p.a.
- Dkr.
- Corp.
- bzgl.
- BSE.
- # Ordinals are written with a trailing period in German: "1." corresponds to "1st" in English
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.
- 93.
- 94.
- 95.
- 96.
- 97.
- 98.
- 99.
- 999.
- # Added from Symantec data
- Ver.
- ca.
- i.d.R.
- inkl.
- d.
- h.
- i.
- R.
- z.
- B.
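
As a rough illustration of how a list like this might be consumed, the sketch below loads the entries and suppresses a sentence break after any listed token. It is a minimal example, not the Moses implementation: the function names `load_prefixes` and `split_sentences` are hypothetical, entries are assumed to include their trailing period (as they do in this list), and lines starting with `#` are assumed to be comments.

```python
def load_prefixes(lines):
    """Collect non-breaking tokens, skipping blank lines and '#' comments."""
    prefixes = set()
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        prefixes.add(entry)
    return prefixes


def split_sentences(text, prefixes):
    """Break after a token ending in '.' unless that token is listed."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        # Exact-match lookup (an assumption of this sketch).
        if token.endswith(".") and token not in prefixes:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep trailing text that does not end in a period.
        sentences.append(" ".join(current))
    return sentences


# A few entries taken from the list above, plus one comment line.
sample = ["# Misc", "z.B.", "ca.", "1."]
prefixes = load_prefixes(sample)
print(split_sentences("Das dauert ca. 3 Tage. Wir kommen am 1. Mai.", prefixes))
# → ['Das dauert ca. 3 Tage.', 'Wir kommen am 1. Mai.']
```

The comments in the list ("handled by ignoring case") suggest the original consumer matches some entries case-insensitively; this sketch uses exact matching only, so a real consumer would likely need to normalize case before the lookup.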