I used pdfplumber to extract text from a PDF, but the output contains escape sequences such as "\n", "\t", and "\u2019", along with many stray spaces. I need to pass this text to an LLM to extract specific fields. Any idea how to do that? This is an example of the extracted text:
CURRICULUM CURRICULUMVITAEVITAE"\n"examplename"\n"E-mail:[email protected]"\n"\uf03f\uf030+91-examplenumber"\n"CAREER CAREEROBJECTIVE:- OBJECTIVE:-"\n"Toworkindynamicenvironmentwhichprovideamplescopetoenrichmylearning"\n"curvebyutilizingmyprofessionalknowledge.Ultimatelycontributingto"\n"organizationalandpersonaldevelopment."\n"EXPERIENCE:-:-"\n"Workedasan\u201cExecutive\u201dwith\u2018examplecompany2\u2019Pvt.Ltd.UdyogVihar,Phase-4,,"\n"Gurgaon,Haryana.Since24thSept2014to25thJan2016."\n"WorkingasSMELeadwithexamplecompanysinceNov2017totilldate,handlingcredit"\n"cardsoutboundprocess."\n"COMPANY&JOBPROFILE:-:-"\n"(OnePointOne)"\n"OnepointOneSolutionisaglobalbusinessservicedproviderintheareaofexperience"\n"management.Weprovideasuiteofsolutionsforoutclients-fromstrategyanddesignto"\n"implementationandexecutionthathelpglobalbrandsdelivermemorableandcustomer"\n"experiences"\n"Jobprofile:IhaveworkedinoutboundprocessandmyprocessnamewasAirtel"\n"Digital.Inthisprocesswehavetocallthecustomerregardingnewchannelsandnew"\n"offerslaunchbyAirtel.Wealsosolvethecustomerqueriesandcomplaintlike-"\n"deductionofamountandchannelactivation."\n"ACADEMIC ACADEMICPROFILE:- PROFILE:-"\n"BABAfromfromUtkalUtkalUniversity, University,Bhubaneswar, Bhubaneswar,OrissaOrissain2012.in2012."\n"Council CouncilofofHigherHigherSecondary SecondaryEducation, Education,OrissaOrissainin2009.2009."\n"BoardBoardofofSecondary SecondaryEducation, Education,OrissaOrissaBoardBoardinin2006.2006."\n"PERSONAL PERSONALDETAILS DETAILS"\n"NameName :: SameerSameerRanjanRanjanRoutRout"\n"DateDateofofbirthbirth :: 0n0n2222ththMarchMarch19891989"\n"GenderGender :: MaleMale"\n"Address Address :: D-2/1,D-2/1,Chattarpur Chattarpur"\n"NewNewDelhi-110068 Delhi-110068FatherFather\u2019\u2019ssnamename :: Mr.Mr.examplefathername"\n"MaritalMaritalstatusstatus :: Unmarried Unmarried"\n"Nationality Nationality :: IndianIndian"\n"LeisureLeisuredoingdoing :: PlayingPlaying&&Watching WatchingCricketCricket"\n"Language Languageknownknown 
:: EnglishEnglish,,Hindi,Hindi,OriyaOriya"\n"DECLARATION DECLARATION"\n"Ifindmyselfasanenthusiasticandambitiouspersonality.Asmyhardworkingnature"\n"anddeterminationaremybigassets.Managingskillsandcreativemindaremyadded"\n"qualities."\n"Iherebydeclarethattheinformationfurnishedabovebymeistruetothebestof"\n"myknowledge."\n"DateDate:: Signature Signature((\u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026 \u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026))"\n"PlacePlace::NewNewDelhiDelhi Name:Name:exampleexamplenamename"
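One possible cleanup pass before the text reaches the LLM is sketched below. It uses only the standard library and assumes the escapes appear as literal character sequences (a quoted `"\n"` token, `\t`, `\uXXXX`), as in the sample above; the function name `clean_extracted_text` is hypothetical. It does not repair the doubled words or the missing spaces between words, which appear to come from the extraction step itself rather than from escaping.

```python
import re
import unicodedata


def clean_extracted_text(raw: str) -> str:
    """Normalize PDF-extracted text before sending it to an LLM (sketch)."""
    # Assumption: line breaks show up as a literal quoted token ("\n"), and
    # tabs as a literal backslash-t, exactly as in the sample text.
    text = raw.replace('"\\n"', "\n").replace("\\n", "\n").replace("\\t", " ")
    # Decode literal \uXXXX escapes (e.g. \u2019) into real characters.
    text = re.sub(r"\\u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), text)
    # Drop Private Use Area glyphs (icon-font symbols such as \uf03f).
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Co")
    # Fold compatibility characters (e.g. the ellipsis) into plain forms.
    text = unicodedata.normalize("NFKC", text)
    # Replace curly quotes with their ASCII equivalents.
    text = text.translate(str.maketrans({
        "\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"',
    }))
    # Collapse runs of spaces/tabs, trim each line, and drop empty lines.
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n"))
    return "\n".join(line for line in lines if line)
```

If the string already contains real newlines and decoded characters instead of literal escapes, the first two steps simply do nothing and the rest still applies.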
I tried passing the whole text to a local LLM, expecting structured output, but the LLM started hallucinating. I prompted it to return structured output like this, but it wasn't able to:
```json
{
  "name": "examplename",
  "email": "[email protected]",
  "phone": "9718127215",
  "location": "New Delhi",
  "highest_qualification": "BA",
  "gender": "Male",
  "marital_status": "Unmarried",
  "current_company": [
    {
      "company_name": "companyexample",
      "designation": "SME Lead",
      "duration": "2017-present"
    }
  ],
  "education": ["BA", "HSC"],
  "skills": ["Customer Support", "Credit Cards"],
  "experience": [
    {
      "position": "Executive",
      "company": "companyexample2",
      "duration": "2014-2016"
    }
  ]
}
```
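Since a local model may wrap the JSON in extra prose or omit fields, one option is to check each reply against the shape above and retry when the check fails. The following is a minimal sketch of such a check, not a full schema validator. The key names come from the JSON above; `parse_llm_reply`, `EXPECTED_KEYS`, and `LIST_KEYS` are hypothetical names introduced for illustration.

```python
import json

# Top-level keys taken from the target JSON shown above.
EXPECTED_KEYS = {
    "name", "email", "phone", "location", "highest_qualification",
    "gender", "marital_status", "current_company", "education",
    "skills", "experience",
}
# Keys whose values are expected to be JSON arrays.
LIST_KEYS = {"current_company", "education", "skills", "experience"}


def parse_llm_reply(reply: str) -> dict:
    """Pull the outermost JSON object out of a reply and check its shape."""
    # Models often add text around the JSON, so slice from the first "{"
    # to the last "}" before parsing.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply[start:end + 1])  # raises ValueError on bad JSON
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    for key in LIST_KEYS:
        if not isinstance(data[key], list):
            raise ValueError(f"{key!r} should be a list")
    return data
```

A caller could catch `ValueError` and re-prompt the model, optionally including the error message in the retry prompt. This only verifies structure; it cannot tell whether the values themselves were hallucinated.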
I'm new to this so any help would be appreciated.