I used pdfplumber to extract text from a PDF, but the output contains several escape sequences such as "\n", "\t" and "\u2019", and the spacing between words is irregular. I need to pass this text to an LLM to extract specific fields. Any idea how to do that? Here is an example of the extracted text:

CURRICULUM CURRICULUMVITAEVITAE"\n"examplename"\n"E-mail:[email protected]"\n"\uf03f\uf030+91-examplenumber"\n"CAREER CAREEROBJECTIVE:- OBJECTIVE:-"\n"Toworkindynamicenvironmentwhichprovideamplescopetoenrichmylearning"\n"curvebyutilizingmyprofessionalknowledge.Ultimatelycontributingto"\n"organizationalandpersonaldevelopment."\n"EXPERIENCE:-:-"\n"Workedasan\u201cExecutive\u201dwith\u2018examplecompany2\u2019Pvt.Ltd.UdyogVihar,Phase-4,,"\n"Gurgaon,Haryana.Since24thSept2014to25thJan2016."\n"WorkingasSMELeadwithexamplecompanysinceNov2017totilldate,handlingcredit"\n"cardsoutboundprocess."\n"COMPANY&JOBPROFILE:-:-"\n"(OnePointOne)"\n"OnepointOneSolutionisaglobalbusinessservicedproviderintheareaofexperience"\n"management.Weprovideasuiteofsolutionsforoutclients-fromstrategyanddesignto"\n"implementationandexecutionthathelpglobalbrandsdelivermemorableandcustomer"\n"experiences"\n"Jobprofile:IhaveworkedinoutboundprocessandmyprocessnamewasAirtel"\n"Digital.Inthisprocesswehavetocallthecustomerregardingnewchannelsandnew"\n"offerslaunchbyAirtel.Wealsosolvethecustomerqueriesandcomplaintlike-"\n"deductionofamountandchannelactivation."\n"ACADEMIC ACADEMICPROFILE:- PROFILE:-"\n"BABAfromfromUtkalUtkalUniversity, University,Bhubaneswar, Bhubaneswar,OrissaOrissain2012.in2012."\n"Council CouncilofofHigherHigherSecondary SecondaryEducation, Education,OrissaOrissainin2009.2009."\n"BoardBoardofofSecondary SecondaryEducation, Education,OrissaOrissaBoardBoardinin2006.2006."\n"PERSONAL PERSONALDETAILS DETAILS"\n"NameName :: SameerSameerRanjanRanjanRoutRout"\n"DateDateofofbirthbirth :: 0n0n2222ththMarchMarch19891989"\n"GenderGender :: MaleMale"\n"Address Address :: D-2/1,D-2/1,Chattarpur Chattarpur"\n"NewNewDelhi-110068 Delhi-110068FatherFather\u2019\u2019ssnamename :: Mr.Mr.examplefathername"\n"MaritalMaritalstatusstatus :: Unmarried Unmarried"\n"Nationality Nationality :: IndianIndian"\n"LeisureLeisuredoingdoing :: PlayingPlaying&&Watching WatchingCricketCricket"\n"Language Languageknownknown 
:: EnglishEnglish,,Hindi,Hindi,OriyaOriya"\n"DECLARATION DECLARATION"\n"Ifindmyselfasanenthusiasticandambitiouspersonality.Asmyhardworkingnature"\n"anddeterminationaremybigassets.Managingskillsandcreativemindaremyadded"\n"qualities."\n"Iherebydeclarethattheinformationfurnishedabovebymeistruetothebestof"\n"myknowledge."\n"DateDate:: Signature Signature((\u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026 \u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026\u2026))"\n"PlacePlace::NewNewDelhiDelhi Name:Name:exampleexamplenamename"
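
For reference, the escape debris in the sample above can be cleaned with the standard library before the text reaches the LLM. This is only a minimal sketch: `clean_extracted_text` is a hypothetical helper name, and it assumes the `\n` and `\uXXXX` markers are present as literal backslash sequences in the string (as they appear in the dump). It does not restore the missing spaces or remove the doubled words.

```python
import re
import unicodedata

# Literal escape text such as \u2019 or \uf03f (backslash, "u", four hex digits).
_LITERAL_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

# Curly quotes mapped to their plain ASCII equivalents.
_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


def clean_extracted_text(raw: str) -> str:
    """Normalise a raw PDF text dump before sending it to an LLM."""
    # Turn literal \uXXXX sequences into the characters they stand for.
    text = _LITERAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), raw)
    # Turn literal \n and \t markers (optionally wrapped in quotes) into real whitespace.
    text = re.sub(r'"?\\n"?', "\n", text)
    text = re.sub(r'"?\\t"?', "\t", text)
    # NFKC folds compatibility characters, e.g. the ellipsis becomes three dots.
    text = unicodedata.normalize("NFKC", text)
    # Drop private-use code points (icon-font glyphs such as U+F03F).
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Co")
    text = text.translate(_QUOTES)
    # Collapse runs of spaces and tabs, trim each line, drop empty lines.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

Calling `clean_extracted_text(dump)` on the dump above should leave one trimmed line per original text line, with the icon glyphs in front of the phone number removed.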

I tried passing the whole text to a local LLM, expecting structured output, but the LLM started hallucinating. I prompted it to return structured output like the following, but it wasn't able to:

```json
{
    "name": "examplename",
    "email": "[email protected]",
    "phone": "9718127215",
    "location": "New Delhi",
    "highest_qualification": "BA",
    "gender": "Male",
    "marital_status": "Unmarried",
    "current_company": [
        {
            "company_name": "companyexample",
            "designation": "SME Lead",
            "duration": "2017-present"
        }
    ],
    "education": ["BA", "HSC"],
    "skills": ["Customer Support", "Credit Cards"],
    "experience": [
        {
            "position": "Executive",
            "company": "companyexample2",
            "duration": "2014-2016"
        }
    ]
}
```
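
Whatever model is used, it may help to check its reply in code rather than trust it. The sketch below is an assumption about how that could look, not part of any library: `parse_llm_reply` and `ungrounded_fields` are hypothetical helpers, and `EXPECTED_FIELDS` is simply the key list from the JSON above. The first function pulls the first JSON object out of the reply and checks keys and types. The second flags values that never occur in the source text, which is one cheap way to spot hallucinated names, emails or phone numbers.

```python
import json
import re

# Keys taken from the target JSON above, mapped to the expected Python type.
EXPECTED_FIELDS = {
    "name": str, "email": str, "phone": str, "location": str,
    "highest_qualification": str, "gender": str, "marital_status": str,
    "current_company": list, "education": list, "skills": list,
    "experience": list,
}


def parse_llm_reply(reply: str) -> dict:
    """Extract the first JSON object from an LLM reply and validate its shape."""
    start = reply.find("{")
    if start == -1:
        raise ValueError("no JSON object found in reply")
    # raw_decode parses one JSON value and ignores any trailing chatter.
    data, _ = json.JSONDecoder().raw_decode(reply[start:])
    problems = []
    for key, expected_type in EXPECTED_FIELDS.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif data[key] is not None and not isinstance(data[key], expected_type):
            problems.append(f"wrong type for {key}")
    if problems:
        raise ValueError("; ".join(problems))
    return data


def ungrounded_fields(data: dict, source_text: str,
                      keys=("name", "email", "phone")) -> list:
    """Return the keys whose string value does not occur in the source text."""
    def squash(s: str) -> str:
        # Ignore case and whitespace, since the PDF dump has unreliable spacing.
        return re.sub(r"\s+", "", s).lower()

    haystack = squash(source_text)
    return [k for k in keys
            if isinstance(data.get(k), str) and squash(data[k]) not in haystack]
```

If `parse_llm_reply` raises, or `ungrounded_fields` returns a non-empty list, the reply can be rejected or the model re-prompted. Inferred fields such as `skills` are left out of the grounding check on purpose, because they are not expected to appear verbatim in the document.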

I'm new to this so any help would be appreciated.
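
For context, the doubled words and missing spaces in the dump look like extraction settings rather than something the LLM can be expected to repair. Below is a minimal sketch of how the extraction step might be adjusted. `extract_pdf_text` is a hypothetical wrapper, and the tolerance values are guesses that would need tuning per document. It relies on pdfplumber's `dedupe_chars()`, which drops characters drawn more than once at nearly the same position, and on the `x_tolerance` argument of `extract_text()`, where a lower value makes pdfplumber more willing to insert a space between neighbouring characters.

```python
def extract_pdf_text(path: str, x_tolerance: float = 1.5,
                     y_tolerance: float = 3.0) -> str:
    """Extract text page by page, removing duplicated glyphs first."""
    # Imported here so the module loads even if pdfplumber is not installed.
    import pdfplumber  # third-party: pip install pdfplumber

    pages = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            # Bold or shadowed text is sometimes drawn twice; keep one copy.
            deduped = page.dedupe_chars(tolerance=1)
            text = deduped.extract_text(x_tolerance=x_tolerance,
                                        y_tolerance=y_tolerance)
            # extract_text can return an empty result for image-only pages.
            pages.append(text or "")
    return "\n".join(pages)
```

Whether this fixes a given file depends on how the PDF was produced, so it is worth printing the result for one page and adjusting `x_tolerance` before involving the LLM at all.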

Tags: python - How can I extract specific fields from a document? - Stack Overflow