University of La Rochelle

Post-doc

Post-doctoral position: Historical Handwritten Text Recognition & Information Extraction in Documents

Accepting applications: 2025.07.05~2025.07.18

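Job Information

  • Application period

    2025.07.05 00:00~2025.07.18 09:00

  • How to apply

    Apply via website

  • Recruitment type

    Experienced

  • Employment type

    Contract

  • Qualification

    Doctorate (PhD)

  • Fields

    Control and instrumentation engineering, information and communication engineering, electronic engineering, computer science and computer engineering, electrical engineering, biomedical engineering, applied software engineering, optical engineering

  • Institution type

    University

  • Work location

    Overseas (France)

  • Salary information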

La Rochelle Université calls for applications for a post-doctoral position in computer science, in the field of handwritten text recognition and machine learning, for historical census documents.

  • Duration: 12 months (with the possibility of a 12-month renewal)
  • Desired hiring date: 1 October 2025
  • Salary: 2570 € gross/month
  • Workplace: L3i laboratory in La Rochelle, France
  • Specialities: Computer Science / Machine Learning / Handwritten Text Recognition


Job Summary
We are seeking a highly motivated and skilled Post-Doc in Machine Learning to join our research team focused on advancing Historical Text Recognition (HTR) systems.
This role will involve developing and optimizing end-to-end solutions for extracting meaningful information from degraded historical documents (circa 1690-1790). The ideal candidate will work with a variety of recent machine learning models and compare their performance against classical HTR and Named Entity Recognition (NER) systems. The role also involves working with multimodal architectures such as Vision-Language Learning Models (VLLM) to improve the explainability, performance, and usability of these systems.


Context and Description of the Project
The DAI-CRéTDHI project aims to mobilize and adapt the tools of digital and data sciences, demography, and anthroponymy to contribute to a better understanding of the population of France from the 16th to the 19th century. It deploys two approaches. The first is a retrospective approach on a national scale, based on aggregated data from old civil status records. The second, applied to a few selected corpora, is an "individual approach" that collects and attributes to each actor a set of demographic characteristics (sex, age, marital status), family characteristics (fertility, household composition and position within it, etc.), relational characteristics (neighborhood of relatives, etc.), socio-professional characteristics (job, income level, etc.), and geographic characteristics (migrant or native, home address).

The multiplicity of sources likely to provide detailed individual information on a significant number of clearly identified actors is well known to historians. Advances in data processing, in terms of engineering and visualization techniques, now make it possible to process very large volumes of data, provided that they are correctly structured and allow individuals and families to be tracked by name. It is also possible to carry out more or less automated matching and to enrich these large corpora with contextual information (e.g. geographic environment) to broaden these analyses. In addition to the data already held by the partners, collaborative indexing (Geneanet) will make it possible to extend these corpora in time and space.


Job description:
The main objectives of this Post-Doc position are:

  • Propose End-to-End Systems for Historical Text Recognition (HTR)
    • Design and implement HTR systems to process 17th and 18th-century documents (circa 1690-1790), which may include a variety of scripts and degraded text conditions
    • Use multimodal Vision-Language Learning Models (VLLM) to extract information from historical documents, using in-context learning to enhance extraction
  • Compare with classical HTR + NER systems
    • Evaluate and benchmark the performance of modern Transformer-based models against classical HTR systems such as PyLaia, Transkribus, etc.
    • Analyze differences in accuracy, speed, explainability, and readability between classical and Transformer-based systems

Particular focus will be placed on the analysis and explainability steps in order to assess how acceptable the proposed systems are to scholars in the Humanities.

In addition to this topic, the candidate will build on terminology extraction methods. The main objective of this young researcher grant is to make French scientific documents accessible to a broader audience, and thus improve the international visibility of publications in French-language scientific journals, by automatically translating keywords and entities into English. To this end, the project aims to develop and adapt recent advances in deep learning for terminology extraction and for cross-lingual, cross-domain information extraction.

The use case is based on the journal Sciences Eaux & Territoires (SET): Sciences Eaux & Territoires is a scientific and technical journal freely available online, published by Irstea since 2010. Its target audience includes public and private stakeholders and decision-makers involved in territorial development and environmental issues.

Concretely, using the terminology and abstracts of scientific articles that have been translated into English, on the one hand by their (French-speaking) authors and on the other by professional translators, the postdoctoral researcher will aim to develop tools that provide more effective machine translation. This will be validated both qualitatively (through comparative evaluations by professional translators) and quantitatively (by tracking changes in the number of accesses to the translated articles).

The project’s goal is to develop a prototype that can be generalized to other journals and other languages.

Place of work

Main: University of La Rochelle (overseas): 23 Av. Albert Einstein, 17000 La Rochelle

Overseas (France): France, La Rochelle Université, La Rochelle, 17000, Nouvelle Aquitaine

Related keywords

Computer science

Institution information

University of La Rochelle

  • Institution type

    University (overseas)

  • Main phone

    -

  • Main address

    23 Av. Albert Einstein, 17000 La Rochelle