<?xml version="1.0" encoding="utf-8"?>
<journal>
<title>Multidisciplinary Cancer Investigation</title>
<title_fa>نشریه بین المللی چند تخصصی سرطان</title_fa>
<short_title>Multidiscip Cancer Investig</short_title>
<subject>Medical Sciences</subject>
<web_url>http://mcijournal.com</web_url>
<journal_hbi_system_id>1</journal_hbi_system_id>
<journal_hbi_system_user>admin</journal_hbi_system_user>
<journal_id_issn>2476-4922</journal_id_issn>
<journal_id_issn_online>2538-1911</journal_id_issn_online>
<journal_id_pii></journal_id_pii>
<journal_id_doi>10.61882/mci</journal_id_doi>
<journal_id_iranmedex></journal_id_iranmedex>
<journal_id_magiran></journal_id_magiran>
<journal_id_sid></journal_id_sid>
<journal_id_nlai></journal_id_nlai>
<journal_id_science></journal_id_science>
<language>en</language>
<pubdate>
	<type>jalali</type>
	<year>1404</year>
	<month>7</month>
	<day>1</day>
</pubdate>
<pubdate>
	<type>gregorian</type>
	<year>2025</year>
	<month>10</month>
	<day>1</day>
</pubdate>
<volume>0</volume>
<number>Multidisciplinary Cancer Investigation</number>
<publish_type>online</publish_type>
<publish_edition>1</publish_edition>
<article_type>fulltext</article_type>
<articleset>
	<article>


	<language>en</language>
	<article_id_doi></article_id_doi>
	<title_fa></title_fa>
	<title>Language Model–Based Representation Learning for Venom Protein Identification and Therapeutic Target Discovery in Cancer</title>
	<subject_fa></subject_fa>
	<subject>Prevention, Early Detection and Screening</subject>
	<content_type_fa></content_type_fa>
	<content_type>Original/Research Article</content_type>
	<abstract_fa></abstract_fa>
	<abstract>&lt;span style=&quot;font-size:12pt&quot;&gt;&lt;span style=&quot;line-height:normal&quot;&gt;&lt;span style=&quot;text-autospace:none&quot;&gt;&lt;span style=&quot;font-family:Calibri,sans-serif&quot;&gt;&lt;span style=&quot;font-size:9.0pt&quot;&gt;&lt;span style=&quot;font-family:NimbusSanL-Regu&quot;&gt;Venom is a complex mixture of bioactive molecules produced by venomous organisms for predation, defense, or intraspecific competition, often leading to specific physiological responses in target organisms. Venom-derived peptides and proteins have recently attracted attention in biomedical research for their potential therapeutic applications, including anticancer drug discovery. However, venom sequences constitute a highly divergent class of proteins, making their identification by machine learning and homology-based methods particularly challenging. To address this, we propose ToxVec, a transfer-learning-based framework for automatic representation learning of protein sequences aimed at improving venom identification. Our approach leverages pre-trained protein language models to capture sequence-level information without manual feature engineering. ToxVec outperforms existing feature-based models, achieving a macro-F1 score of 0.89. Furthermore, an ensemble model trained on multiple balanced subsets improves performance to a macro-F1 of 0.93, representing a 7% improvement over the state of the art. Beyond benchmark performance, screening of experimentally validated anticancer peptides from the CancerPPD2 dataset revealed that many exhibit high venom-like signatures according to ToxVec, supporting the notion that toxin-inspired molecular architectures may underlie anticancer bioactivity. We further discuss how language model&amp;ndash;based representation learning embodies a Cognitive Mind&amp;ndash;Body&amp;ndash;Inspired interpretation, linking abstract sequence semantics (the &amp;ldquo;mind&amp;rdquo;) to biological function (the &amp;ldquo;body&amp;rdquo;). By enabling more accurate large-scale identification of venom proteins, ToxVec provides a foundation for systematically exploring venom-derived bioactive peptides as potential therapeutic candidates, including those targeting pathways implicated in breast cancer progression and metastasis. This automated approach thus bridges computational protein informatics with translational oncology, supporting future efforts in bioactive-peptide-based anticancer research.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;</abstract>
	<keyword_fa></keyword_fa>
	<keyword>Venom protein identification,Protein language model,Transfer learning,Representation learning,Anticancer peptides</keyword>
	<start_page>20</start_page>
	<end_page>30</end_page>
	<web_url>http://mcijournal.com/browse.php?a_code=A-10-717-1&amp;slc_lang=en&amp;sid=1</web_url>


<author_list>
	<author>
	<first_name>Meisam</first_name>
	<middle_name></middle_name>
	<last_name>Ahmadi</last_name>
	<suffix></suffix>
	<first_name_fa></first_name_fa>
	<middle_name_fa></middle_name_fa>
	<last_name_fa></last_name_fa>
	<suffix_fa></suffix_fa>
	<email>me.ahmadi@gmail.com</email>
	<code>10031947532846004198</code>
	<orcid>10031947532846004198</orcid>
	<coreauthor>No</coreauthor>
	<affiliation>Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran</affiliation>
	<affiliation_fa></affiliation_fa>
	 </author>


	<author>
	<first_name>Mohammad Reza</first_name>
	<middle_name></middle_name>
	<last_name>Jahed-Motlagh</last_name>
	<suffix></suffix>
	<first_name_fa></first_name_fa>
	<middle_name_fa></middle_name_fa>
	<last_name_fa></last_name_fa>
	<suffix_fa></suffix_fa>
	<email>jahedmr@iust.ac.ir</email>
	<code>10031947532846004199</code>
	<orcid>000000017025141X</orcid>
	<coreauthor>Yes</coreauthor>
	<affiliation>Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran</affiliation>
	<affiliation_fa></affiliation_fa>
	 </author>


	<author>
	<first_name>Ehsaneddin</first_name>
	<middle_name></middle_name>
	<last_name>Asgari</last_name>
	<suffix></suffix>
	<first_name_fa></first_name_fa>
	<middle_name_fa></middle_name_fa>
	<last_name_fa></last_name_fa>
	<suffix_fa></suffix_fa>
	<email>me.ahmadi@gmail.com</email>
	<code>10031947532846004200</code>
	<orcid>10031947532846004200</orcid>
	<coreauthor>No</coreauthor>
	<affiliation>Qatar Computing Research Institute</affiliation>
	<affiliation_fa></affiliation_fa>
	 </author>


	<author>
	<first_name>Adel Torkaman</first_name>
	<middle_name></middle_name>
	<last_name>Rahmani</last_name>
	<suffix></suffix>
	<first_name_fa></first_name_fa>
	<middle_name_fa></middle_name_fa>
	<last_name_fa></last_name_fa>
	<suffix_fa></suffix_fa>
	<email>me.ahmadi@gmail.com</email>
	<code>10031947532846004201</code>
	<orcid>10031947532846004201</orcid>
	<coreauthor>No</coreauthor>
	<affiliation>Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran</affiliation>
	<affiliation_fa></affiliation_fa>
	 </author>


</author_list>


	</article>
</articleset>
</journal>
